Here is the next part of our series “Reinforcement Learning in practice”!
We could not skip one of our favourite Atari games: Breakout! As before, the agent learns directly from the video output.
- Frozen Lake
- Lunar Lander
All the details explained HERE
The game looks like this:
The rules are super simple. Our agent controls the paddle (moving it left and right) and has to destroy the bricks at the top without letting the ball touch the bottom. It’s one of the classics, so you surely know it :)
Just as before with Enduro, we use a CNN with our powerful combo, Double DQN + Dueling, again processing 4 images stacked together. Of course using PyTorch.
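To make that setup a bit more concrete, here is a minimal PyTorch sketch of a dueling Q-network fed with 4 stacked frames, together with the Double DQN target. It is only an illustration, not the exact code behind the results in this post: the 84x84 grayscale input, the layer sizes, `gamma=0.99` and the names `DuelingDQN` and `double_dqn_target` are all assumptions.

```python
import torch
import torch.nn as nn


class DuelingDQN(nn.Module):
    """Dueling Q-network over a stack of frames (assumed: 4 grayscale 84x84 images)."""

    def __init__(self, n_actions: int, in_frames: int = 4):
        super().__init__()
        # Convolutional feature extractor; the stacked frames are the input channels.
        self.features = nn.Sequential(
            nn.Conv2d(in_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = 64 * 7 * 7  # only valid for the assumed 84x84 input
        # Dueling heads: one scalar state value, one advantage per action.
        self.value = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, 1)
        )
        self.advantage = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, n_actions)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)
        v = self.value(h)
        a = self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)


def double_dqn_target(online, target, next_states, rewards, dones, gamma=0.99):
    """Double DQN: the online net selects the action, the target net evaluates it."""
    with torch.no_grad():
        best_actions = online(next_states).argmax(dim=1, keepdim=True)
        next_q = target(next_states).gather(1, best_actions).squeeze(1)
        # No bootstrapping from terminal states (dones == 1.0).
        return rewards + gamma * next_q * (1.0 - dones)
```

The dueling head splits the estimate into a state value and per-action advantages, while the Double DQN target lets the online network choose the next action and the target network score it, which reduces the overestimation seen with plain DQN.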
Important to note: for this game our agent definitely needs time… we gave ourselves a maximum of 24h of training.
Within that time our agent got through a little more than 50k episodes, around 11.5M frames (on a single Nvidia GeForce GTX 1080 GPU). It needed quite a solid warmup, so nice progress only became visible after around 5000 episodes (45 minutes):
Later its performance, although not super stable, was really fine, and a definite upward trend could be seen.
Info: the X axis is the number of episodes, the Y axis is the score (rewards received from the environment per episode).
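As a quick back-of-the-envelope check, the rounded training figures mentioned above (about 50k episodes and 11.5M frames in 24h) work out to roughly 230 frames per episode and about 133 frames per second on average. The snippet below just redoes that arithmetic; the constants are the rounded numbers from this post, not exact logged values.

```python
# Rounded figures quoted in the post; the real counts were only approximate.
EPISODES = 50_000
FRAMES = 11_500_000
HOURS = 24

# Average episode length and average throughput over the whole run.
frames_per_episode = FRAMES / EPISODES
frames_per_second = FRAMES / (HOURS * 3600)

print(f"~{frames_per_episode:.0f} frames per episode")
print(f"~{frames_per_second:.0f} frames per second")
```

These are averages over the whole run, so early episodes (when the agent loses quickly) were shorter and later ones longer.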
Before training we were mainly wondering about 3 things:
- If it can reach human-level performance
According to DeepMind’s paper, the human level is 31.8. Our graph is not perfect: even though we can see the agent regularly reaching 300 points, the average during training was much lower. Over those 24h the average across all episodes reached 42.3 points, and over the last 100 episodes it was close to 100 points. So it is better than humans, but more training time would be needed to see that very clearly.
- If it can figure out the strategy of using the “magic” tunnel in the wall
By tunnel we of course mean the perfect strategy: drilling a tunnel at the side of the screen and using it to score many points while just waiting for the ball…
And well, it was a success: eventually this became our agent’s favourite game plan. It is also worth noting that it used it multiple times, even within a single episode.
- If it can accomplish both within 24h of training
Breakout is not an easy game for AI agents because the upper part of the screen changes all the time: disappearing bricks, the ball in different positions etc. These changes can differ significantly between episodes. Since we are learning only from pixels and their values change all the time, our agent needs a lot of time to generalize to all scenarios.
In the end this was a success as well, and we met our goals within 24h.
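The two averages quoted for the first question (over all episodes and over the last 100) can be tracked with a few lines of Python. This is a minimal sketch: the function name `running_averages` and the toy scores in the usage line are made up for illustration and are not taken from our training logs.

```python
from collections import deque


def running_averages(scores, window=100):
    """After each episode, return (mean over all episodes, mean over the last `window`)."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    total = 0.0
    history = []
    for episode, score in enumerate(scores, start=1):
        total += score
        recent.append(score)
        history.append((total / episode, sum(recent) / len(recent)))
    return history


# Toy usage with invented scores and a tiny window.
print(running_averages([0, 10, 20, 30], window=2)[-1])  # (15.0, 25.0)
```

Because the agent scores almost nothing during warmup, the all-episode mean lags far behind the recent-window mean, which is why the two numbers above differ so much.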
Below, the agent’s performance captured on video:
Like it? Want to do similar things?
Have a look at our course! We teach everything from scratch!