The story begins with a post I saw on Numenta’s forum: make an AI (or, an ML algorithm) play rock, paper, scissors against a human player, learn the human’s pattern, and by doing so dominate the human. Then I thought to myself: why not write two AIs, one in HTM and one using LSTM, and have them compete?
The idea is to have two AI players, one implemented with HTM and the other with an RNN (or LSTM/GRU). Let them learn and predict their opponent’s next move, then act accordingly. Sounds fun and straightforward enough.
The RNN Agent
In this post I’ll be using the broad definition of RNN unless specified otherwise: a NN layer that carries a hidden state from previous timesteps.
Agents in this game are expected to do two things: first, learn the opponent’s pattern; second, predict the opponent’s next move based on the opponent’s past move history. To keep things simple, I’ll be using my favorite DL framework – tiny-dnn. The RNN agent will be a simple 2-layer network receiving a one-hot encoded vector as input (see the sketch below for the encoding).
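For reference, the encoding really is just standard one-hot; a minimal sketch (the helper name one_hot is mine, not from the repo):

#include <xtensor/xarray.hpp>
#include <xtensor/xbuilder.hpp>

// One-hot encode a move (0 = rock, 1 = paper, 2 = scissors)
// into the 3-element vector the RNN consumes.
xt::xarray<float> one_hot(int move)
{
    xt::xarray<float> v = xt::zeros<float>({3});
    v[move] = 1.0f;
    return v;
}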
So, let’s define the network.
#include <tiny_dnn/tiny_dnn.h>
using namespace tiny_dnn;
using namespace tiny_dnn::layers;
using namespace tiny_dnn::activation;

size_t hidden_size = 30;
size_t seq_len = 3; // The length of the training sequence
network<sequential> nn;
nn << recurrent(lstm(3, hidden_size), seq_len);
nn << leaky_relu() << fc(hidden_size, 3) << softmax();
Then the tricky part: the RNN needs to predict and learn at the same time, but tiny-dnn is a static-graph library, so the network can only be trained every 3 steps (the seq_len parameter). So for each step we save the input into a std::vector, predict the opponent’s next move from the current input, and train the network every 3 steps.
// Member variables used below: nn (the network defined above), optimizer_,
// last_input_, and the vec_t buffers input_ and output_ for training data.
xt::xarray<float> compute(xt::xarray<float> input)
{
    assert(input.size() == 3);
    // Save data for training
    if(last_input_.size() != 0) {
        for(auto v : last_input_)
            input_.push_back(v);
        for(auto v : input)
            output_.push_back(v);
    }
    last_input_ = vec_t(input.begin(), input.end());
    // Train once all the needed data has been collected
    if(input_.size() == RNN_DATA_PER_EPOCH) {
        assert(input_.size() == output_.size());
        // Put the network into "training mode"
        nn.at<recurrent_layer>(0).seq_len(RNN_DATA_PER_EPOCH);
        nn.set_netphase(net_phase::train);
        nn.fit<cross_entropy_multiclass>(optimizer_, std::vector<vec_t>({input_}),
            std::vector<vec_t>({output_}), 1, 1, [](){}, [](){});
        // Leave "training mode" and keep on predicting
        nn.set_netphase(net_phase::test);
        nn.at<recurrent_layer>(0).seq_len(1);
        input_.clear();
        output_.clear();
    }
    // Predict the opponent's next move
    vec_t out = nn.predict(vec_t(input.begin(), input.end()));
    assert(out.size() == 3);
    // Convert the prediction to an xarray
    xt::xarray<float> r = xt::zeros<float>({3});
    for(size_t i = 0; i < out.size(); i++)
        r[i] = out[i];
    return r;
}
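With the prediction in hand, the agent still has to choose what to throw. The decision logic isn’t shown here; a minimal sketch of the obvious strategy (pick the opponent’s most probable move and counter it; the function name is mine, not from the repo):

#include <algorithm>
#include <xtensor/xarray.hpp>

// Moves are indexed 0 = rock, 1 = paper, 2 = scissors, so the move that
// beats move m is (m + 1) % 3: paper beats rock, scissors beats paper,
// rock beats scissors.
int decide_move(const xt::xarray<float>& prediction)
{
    // Index of the opponent's most probable next move
    int predicted = std::max_element(prediction.begin(), prediction.end())
                    - prediction.begin();
    // Play the move that beats it
    return (predicted + 1) % 3;
}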
The HTM Agent
The HTM agent is a lot more straightforward. To recap from my previous project: all HTM layers receive a sparse binary tensor, or Sparse Distributed Representation (SDR), as input and generate an SDR representing what the algorithm has learned. And TemporalMemory is the HTM algorithm that learns to predict the next input based on the input sequences it has observed in the past; exactly what I need.
So now, let’s create a TemporalMemory object and set up the hyperparameters. 3*ENCODE_WIDTH is the length of the SDR the layer will receive, and TP_DEPTH is, in broad terms, how many different sequences can potentially trigger an output.
TemporalMemory tm(3*ENCODE_WIDTH, TP_DEPTH);
tm.setMaxNewSynapseCount(64);
tm.setPermanenceIncrement(0.1);
tm.setPermanenceDecrement(0.045);
tm.setConnectedPermanence(0.4);
tm.setPredictedSegmentDecrement(0.3*2.0f*tm.getPermanenceIncrement());
To train and make use of the TemporalMemory layer, simply call the compute() function; it performs learning and prediction automatically.
xt::xarray<float> compute(int last_oppo_move, bool learn = true)
{
    auto out = tm.compute(encode(last_oppo_move), learn); // second argument enables learning
    return categroize(3, ENCODE_WIDTH, out); // Convert from SDR to a probability distribution
}
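encode() and categroize() handle the conversion between moves and SDRs, but aren’t shown above. Here is a minimal sketch of what they plausibly look like, assuming a block-wise encoding where each move owns ENCODE_WIDTH bits (my reconstruction, not the exact code from the repo):

#include <xtensor/xarray.hpp>
#include <xtensor/xbuilder.hpp>

// Map a move (0 = rock, 1 = paper, 2 = scissors) to an SDR of
// 3*ENCODE_WIDTH bits, setting the block that belongs to that move.
xt::xarray<bool> encode(int move)
{
    xt::xarray<bool> sdr = xt::zeros<bool>({3*ENCODE_WIDTH});
    for(size_t i = 0; i < ENCODE_WIDTH; i++)
        sdr[move*ENCODE_WIDTH + i] = true;
    return sdr;
}

// Reverse the process: count the active predicted bits in each block
// and normalize the counts into a probability distribution.
xt::xarray<float> categroize(size_t num_classes, size_t width, const xt::xarray<bool>& sdr)
{
    xt::xarray<float> prob = xt::zeros<float>({num_classes});
    for(size_t i = 0; i < num_classes; i++)
        for(size_t j = 0; j < width; j++)
            prob[i] += sdr[i*width + j] ? 1.0f : 0.0f;
    float total = 0;
    for(size_t i = 0; i < num_classes; i++)
        total += prob[i];
    if(total != 0)
        prob /= total;
    return prob;
}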
Playing the game
Now I have the two agents ready. It’s time for them to play games!
I set the two algorithms to play against each other 200K times. Compile and run… Voilà! Here come the results… There are way fewer draws than wins/losses? I have tried multiple times with different parameters, and it seems to be a consistent trend. Interesting!
The code that lets the algorithms play against each other is quite boring, so I didn’t show it (though a rough sketch of its shape follows below). If you are interested, the source code is available here.
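For completeness, the match loop is roughly this shape (a sketch, not the repo code; act() is a hypothetical wrapper around each agent’s compute() plus the counter-move logic):

// 0 = rock, 1 = paper, 2 = scissors; (a - b + 3) % 3 scores a against b:
// 0 means a draw, 1 means a wins, 2 means b wins.
int rnn_move = 0, htm_move = 0; // arbitrary opening moves
size_t rnn_wins = 0, htm_wins = 0, draws = 0;
for(size_t i = 0; i < 200000; i++) {
    int next_rnn = rnn_agent.act(htm_move); // predict the HTM's move and counter it
    int next_htm = htm_agent.act(rnn_move); // predict the RNN's move and counter it
    rnn_move = next_rnn;
    htm_move = next_htm;
    int result = (rnn_move - htm_move + 3) % 3;
    if(result == 0) draws++;
    else if(result == 1) rnn_wins++;
    else htm_wins++;
}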
Conclusion
I don’t know what conclusion I can draw from this experiment… Theoretically both algorithms should be winning 33% of the time, yet in fact both the LSTM and HTM win around 38% of the time. I can’t find any explanation for this. Nevertheless, TemporalMemory is definitely a valid algorithm for learning and predicting sequences.