The story begins when I saw a post on Numenta’s forum: make an AI (or rather, an ML algorithm) play rock, paper, scissors against a human player, learn the human’s pattern, and thereby dominate the human. Then I thought to myself: why not write two AIs, one using HTM and one using LSTM, and have them compete?
The idea is to have two AI players, one implemented with HTM and the other with an RNN (or LSTM/GRU). Let them learn and predict their opponent’s next move, then act accordingly. Sounds fun and straightforward enough.
The RNN Agent
Unless specified otherwise, I’ll be using the broad definition of RNN in this post: a NN layer that carries a hidden state from previous timesteps.
Agents in this game are expected to do two things: first, learn the opponent’s pattern; second, predict the opponent’s next move based on the opponent’s past move history. To keep things simple, I’ll be using my favorite DL framework – tiny-dnn. The RNN agent will be a simple 2-layer network receiving a one-hot encoded vector as input.
So, let’s define the network.
Then the tricky part… The RNN is required to predict and learn at the same time. Yet because tiny-dnn is a static-graph library, the network can only be trained every 3 steps (the
seq_len parameter). So for each step, we save the input to a
std::vector, predict the opponent’s next move with the network using the current input, then train the network every 3 steps.
The HTM Agent
The HTM agent is a lot more straightforward. To recap from my previous project: all HTM layers receive a sparse binary tensor, or Sparse Distributed Representation (SDR), as input and generate an SDR representing what the algorithm has learned. And TemporalMemory is an algorithm in HTM that learns to predict the next input based on input sequences observed in the past; exactly what I need.
So now, let’s create a TemporalMemory object and set up the hyperparameters.
3*ENCODE_WIDTH is the length of the SDR the layer will receive, and
TP_DEPTH is, in broad terms, how many different sequences can potentially trigger an output.
To train and make use of the TemporalMemory layer, simply call the compute() function. It performs the learning and prediction automatically.
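Per step, the HTM agent’s loop is then roughly the following (pseudocode; the exact compute() signature and how predictions are decoded back to a move depend on the HTM implementation):

```
for each round:
    sdr        = encode(opponent_last_move)     # 3*ENCODE_WIDTH bits
    prediction = tm.compute(sdr)                # learns and predicts in one call
    next_move  = counter(decode(prediction))    # play the move that beats it
```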
Playing the game
Now the two agents are ready. It’s time for them to play!
Set the two algorithms to play against each other 200K times. Compile and run… Voilà! Here come the results…. There are far fewer draws than wins/losses? I have tried multiple times with different parameters, and it seems to be a consistent trend. Interesting.
The code that lets the algorithms play against each other is quite boring, so I didn’t show it. If you are interested, the source code is available here.
I don’t know what conclusion I can draw from this experiment… Theoretically both algorithms should win 33% of the time. Yet in fact both LSTM and HTM win around 38% of the time. I can’t find any explanation for this. Nevertheless, TemporalMemory is definitely a valid algorithm for learning and predicting from sequences.