Building a bot that plays Rock, Paper, Scissors using Hierarchical Temporal Memory, and is as good as an LSTM

The story begins when I saw a post on Numenta’s forum: make an AI (or, an ML algorithm) play rock, paper, scissors against a human player, learn the human player’s pattern, and by that dominate the human. Then I thought to myself: why not write two AIs, one in HTM and one using LSTM, and have them compete?

The idea is to have two AI players, one implemented in HTM and the other in an RNN (or LSTM/GRU). Let them learn and predict their opponent’s next move, then act accordingly. Sounds fun and straightforward enough.

The RNN Agent

In this post, unless specified otherwise, I’ll be using the broad definition of RNN – a NN layer that carries a hidden state from previous timesteps.

Agents in this game are expected to do two things: first, learn the opponent’s pattern; second, predict the opponent’s next move based on the opponent’s past move history. To keep things simple, I’ll be using my favorite DL framework – tiny-dnn. The RNN agent will be a simple 2-layer network receiving a one-hot encoded vector as input.
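For concreteness, here is a minimal sketch of the kind of one-hot encoding I have in mind. The encode_move name and the 0 = rock, 1 = paper, 2 = scissors mapping are my own assumptions for illustration, not part of the actual project.

#include <xtensor/xarray.hpp>
#include <xtensor/xbuilder.hpp>

// Hypothetical helper: turn a move (0 = rock, 1 = paper, 2 = scissors)
// into a one-hot vector of length 3 for the network input.
xt::xarray<float> encode_move(int move)
{
    xt::xarray<float> v = xt::zeros<float>({3});
    v[move] = 1.0f; // only the entry for the played move is set
    return v;
}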

So, let’s define the network.

size_t hidden_size = 30;
size_t seq_len = 3; // The length of the training sequence
network<sequential> nn;
nn << recurrent(lstm(3, hidden_size), seq_len); // one-hot move in, hidden state out
nn << leaky_relu() << fc(hidden_size, 3) << softmax(); // 3-way distribution over the next move

Then the tricky part… The RNN is required to predict and learn at the same time, yet since tiny-dnn is a static-graph library, the network can only be trained every 3 steps (the seq_len parameter). So for each step, we save the input to a std::vector, predict the opponent’s next move with the network using the current input, then train the network every 3 steps.

xt::xarray<float> compute(xt::xarray<float> input)
{
    assert(input.size() == 3);
    // Save data for training
    if(last_input_.size() != 0) {
        for(auto v : last_input_)
            input_.push_back(v);
        for(auto v : input)
            output_.push_back(v);
    }
    last_input_ = vec_t(input.begin(), input.end());
    // Train once all the needed data has been collected
    if(input_.size() == RNN_DATA_PER_EPOCH) {
        assert(input_.size() == output_.size());
        // Put the network into "training mode"
        nn.at<recurrent_layer>(0).seq_len(RNN_DATA_PER_EPOCH);
        nn.set_netphase(net_phase::train);
        nn.fit<cross_entropy_multiclass>(optimizer_, std::vector<vec_t>({input_}),
            std::vector<vec_t>({output_}), 1, 1, [](){}, [](){});
        // Leave "training mode" and keep on predicting
        nn.set_netphase(net_phase::test);
        nn.at<recurrent_layer>(0).seq_len(1);
        input_.clear();
        output_.clear();
    }
    // Predict the opponent's next move
    vec_t out = nn.predict(vec_t(input.begin(), input.end()));
    assert(out.size() == 3);
    // Convert the prediction to an xarray
    xt::xarray<float> r = xt::zeros<float>({3});
    for(size_t i = 0; i < out.size(); i++)
        r[i] = out[i];
    return r;
}

The HTM Agent

The HTM agent is a lot more straightforward. To recap from my previous project: all HTM layers receive a sparse binary tensor, or Sparse Distributed Representation (SDR), as input and generate an SDR representing what the algorithm has learned. TemporalMemory is an algorithm in HTM that learns to predict the next input based on input sequences observed in the past; exactly what I need.

So now, let’s create a TemporalMemory object and set up the hyperparameters. 3*ENCODE_WIDTH is the length of the SDR the layer will receive, and TP_DEPTH is, in broad terms, how many different sequences can potentially trigger the same output.

TemporalMemory tm(3*ENCODE_WIDTH, TP_DEPTH);
tm.setMaxNewSynapseCount(64);
tm.setPermanenceIncrement(0.1);
tm.setPermanenceDecrement(0.045);
tm.setConnectedPermanence(0.4);
tm.setPredictedSegmentDecrement(0.3*2.0f*tm.getPermanenceIncrement());

To train and make use of the TemporalMemory layer, simply call the compute() function. It performs the learning and prediction automatically.

xt::xarray<float> compute(int last_oppo_move, bool learn = true)
{
    auto out = tm.compute(encode(last_oppo_move), learn); // second argument enables learning
    return categroize(3, ENCODE_WIDTH, out); // Convert from SDR to a probability distribution
}
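The encode() and categroize() helpers used above come from the project and aren’t shown in the post. For the curious, here is a rough sketch of what they could look like under my own assumptions about the encoding scheme (one block of ENCODE_WIDTH bits per move, predictions scored by counting active bits per block); the real implementations may differ.

// Sketch only: encode a move as an SDR of length 3*ENCODE_WIDTH by turning on
// the ENCODE_WIDTH bits of the block belonging to that move.
xt::xarray<bool> encode(int move)
{
    xt::xarray<bool> sdr = xt::zeros<bool>({3*ENCODE_WIDTH});
    for(size_t i = 0; i < ENCODE_WIDTH; i++)
        sdr[move*ENCODE_WIDTH + i] = true;
    return sdr;
}

// Sketch only: collapse a predicted SDR back into a probability distribution by
// counting how many bits are active in each category's block and normalizing.
xt::xarray<float> categroize(size_t num_categories, size_t width, const xt::xarray<bool>& sdr)
{
    xt::xarray<float> prob = xt::zeros<float>({num_categories});
    float total = 0;
    for(size_t i = 0; i < num_categories; i++) {
        for(size_t j = 0; j < width; j++)
            prob[i] += sdr[i*width + j] ? 1.0f : 0.0f;
        total += prob[i];
    }
    if(total != 0) {
        for(size_t i = 0; i < num_categories; i++)
            prob[i] /= total;
    }
    return prob;
}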

Playing the game

Now I have the two agents ready. It’s time for them to play games!
Set the two algorithms to play against each other 200K times. Compile and run… Voilà! Here come the results… There are way fewer draws than wins/losses? I have tried multiple times with different parameters and it seems to be a consistent trend. Interesting.

The final results

The code that lets the algorithms play against each other is quite boring, so I didn’t show it. If you are interested, the source code is available here.
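That said, a rough, hypothetical sketch of such a game loop might look like the following. The argmax, counterOf, beats, and one_hot helpers and the agent object names are illustrative assumptions of mine, not the actual code from the repo.

// Hypothetical sketch of the game loop, not the actual code from the repo.
// argmax() picks the most likely move from a probability distribution,
// counterOf(m) returns the move that beats m, beats(a, b) tells whether a beats b,
// and one_hot() encodes a move as a 3-element vector.
int rnn_last = 0, htm_last = 0;             // previous moves, arbitrarily initialized
int htm_wins = 0, rnn_wins = 0, draws = 0;  // tallies over all rounds
for(int round = 0; round < 200000; round++) {
    // Each agent predicts its opponent's next move from the opponent's last move,
    // then plays the counter of that prediction.
    int rnn_move = counterOf(argmax(rnn_agent.compute(one_hot(htm_last))));
    int htm_move = counterOf(argmax(htm_agent.compute(rnn_last)));
    if(htm_move == rnn_move)            draws++;
    else if(beats(htm_move, rnn_move))  htm_wins++;
    else                                rnn_wins++;
    rnn_last = rnn_move;
    htm_last = htm_move;
}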

Conclusion

I don’t know what conclusion I can draw from this experiment… Theoretically, both algorithms should be winning about 33% of the time: if the two agents ended up behaving like uniformly random players, wins, losses, and draws would each occur roughly a third of the time. In fact, both LSTM and HTM win around 38% of the time, and I can’t find an explanation for this. Nevertheless, TemporalMemory is definitely a valid algorithm for learning and predicting from sequences.
