Another part of our series called “Reinforcement Learning in practice”!
Today we are again learning directly from video output, but with a slightly different approach (details below). This time it is also a different game: Enduro!

Previous parts of the series:
- Frozen Lake
- Lunar Lander

All the details are explained HERE.
The game looks like this:
This is a racing game: our agent controls the white car and needs to be faster than all its opponents. The objective is to pass a certain number of cars on each level, for example 200 cars on the first one. What's important here is that the environment changes periodically: there is day, night, summer, winter, etc., which affects the handling of the vehicle.
As mentioned, today we will use a slightly different approach. It will still be a CNN with Double DQN and Dueling DQN (as these work really well), but the inputs will be treated in a different way.
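As a quick reminder of the two techniques: Double DQN lets the online network *select* the next action while the target network *evaluates* it (which reduces overestimation), and the dueling head splits Q-values into a state value and per-action advantages. A minimal NumPy sketch of both ideas, with toy placeholder Q-values rather than our actual network outputs:

```python
import numpy as np

def dueling_q(value, advantage):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Subtracting the mean advantage keeps the V/A decomposition identifiable.
    """
    return value + advantage - advantage.mean(axis=1, keepdims=True)

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN: the online net selects the next action, the target net evaluates it."""
    best_actions = np.argmax(q_online_next, axis=1)                      # selection
    next_q = q_target_next[np.arange(len(best_actions)), best_actions]   # evaluation
    return rewards + gamma * next_q * (1.0 - dones)                      # no bootstrap past terminal states

# Toy batch of 2 transitions with 2 actions (illustrative numbers only)
q_online = np.array([[1.0, 2.0], [0.5, 0.1]])
q_target = np.array([[0.3, 0.7], [0.9, 0.4]])
targets = double_dqn_targets(q_online, q_target,
                             rewards=np.array([1.0, 0.0]),
                             dones=np.array([0.0, 1.0]))

q = dueling_q(np.array([[1.0]]), np.array([[1.0, 3.0]]))
```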
So far, on each step we have been processing a single image from the video output; now we will use 4 images stacked together. This way our agent should understand the environment better, because instead of a single static image it sees a short sequence. In theory, while learning it should also take into account dynamic information like speed, direction of movement, etc. The disadvantage is that it is more demanding resource-wise, because we simply have 4 times more data to process.
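Frame stacking can be implemented with a fixed-length deque that always holds the last 4 preprocessed frames. A minimal sketch, assuming the common 84×84 grayscale Atari preprocessing (the exact frame size is an assumption, not a requirement):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep the last `k` preprocessed frames and expose them as one array."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # On a new episode, fill the stack by repeating the first frame.
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.state()

    def step(self, frame):
        # The oldest frame drops out automatically (deque maxlen).
        self.frames.append(frame)
        return self.state()

    def state(self):
        # Shape (k, H, W): a channel-first stack fed to the CNN.
        return np.stack(self.frames, axis=0)

stack = FrameStack(k=4)
s = stack.reset(np.zeros((84, 84), dtype=np.float32))
s = stack.step(np.ones((84, 84), dtype=np.float32))
```

Because consecutive frames differ, the network can infer motion (speed and direction) that a single frame cannot convey.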
Info: the X axis is the number of episodes, the Y axis is the score (total reward received from the environment per episode).
Again, as in the last example, we simply waited until we got satisfactory results, so this time we stopped after the 200th episode, which did not take that long: around 170 minutes (on a single Nvidia GeForce GTX 1080 GPU).
As we can see, progress is pretty quick: already around the 25th episode there is a visible difference, and then the score grows dynamically, with some nice big peaks from time to time (which are great signs of learning, because it is not that easy to score this many points in this game). We can definitely identify an upward trend.
As mentioned before, the environment changes periodically (we have day, night, summer or winter). This is almost nothing for a human, but it is huge for our agent, as it learns from the video output (so in fact from the pixels of images). Green grass or snow would not matter that much to you, but in the image those are completely different numbers (the color values of each pixel), so it is like a totally new situation for our agent. And what we can observe is that even though it is doing great in "perfect" conditions, it often struggles, for example on snow. To become good in all conditions, our learning process would of course need to be much longer.
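To make the pixel-value point concrete: even after the usual grayscale conversion, grass and snow land at very different intensities, so to the network a snowy level really is a different input distribution. A small illustration with hypothetical colors (the RGB values below are made up, not sampled from Enduro):

```python
import numpy as np

def to_grayscale(rgb):
    # Standard luminance weights for RGB -> grayscale conversion.
    return float(rgb @ np.array([0.299, 0.587, 0.114]))

# Illustrative colors only (not actual Enduro pixels):
grass = np.array([48.0, 120.0, 48.0])    # a darker green
snow = np.array([236.0, 236.0, 236.0])   # near-white

grass_gray = to_grayscale(grass)  # 90.264
snow_gray = to_grayscale(snow)    # 236.0
```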
Anyway, Enduro is one of the most interesting games in which we can clearly see superhuman performance from our agent. It is really quick and has amazing reaction times, which would be really hard for any person to match.
Again comparing to DeepMind's paper, where human-level performance is 309.6 points, we got better results, which was our main goal. We first exceeded that level (>310 points) in the 129th episode. At the end, the average reward (over the last 100 episodes) was 467.65 points.
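The metric used above, the average reward over the last 100 episodes, is easy to track with a fixed-length deque; a sketch with made-up per-episode totals:

```python
from collections import deque

episode_rewards = deque(maxlen=100)  # only the last 100 episodes are kept

for total_reward in [300.0, 400.0, 500.0]:  # illustrative episode totals
    episode_rewards.append(total_reward)
    running_avg = sum(episode_rewards) / len(episode_rewards)
```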
Below is the agent's performance captured on video:
Like it? Want to do similar things?
Have a look at our course! We teach everything from scratch!