Another part of our series “Reinforcement Learning in practice”: AI playing games!
Today a different text-based game, this time a little more complicated: Taxi. Other games in the series:
- Frozen Lake
- Lunar Lander
All the details are explained HERE, but in short…
The game looks like this:
There are 4 main locations (R,G,B,Y).
When an episode starts we are dropped in a random place, and our task is to pick up the passenger from one of the main locations (the one marked in blue) and drop him off at the one marked in magenta.
An important fact: the vertical lines “|” are walls, so our driver needs to go around them.
Each step costs 1 point.
And quoting OpenAI:
“You receive +20 points for a successful dropoff, and lose 1 point for every timestep it takes. There is also a 10 point penalty for illegal pick-up and drop-off actions.”
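To make the scoring concrete, here is a small sketch of how an episode's total score adds up. It assumes (this is a reading of the Gym implementation, not something the quote states) that an illegal pick-up or drop-off yields -10 in place of the usual -1 for that step, and that the final successful drop-off yields +20 in place of -1. The helper name `episode_score` is hypothetical, for illustration only.

```python
STEP_REWARD = -1      # every ordinary timestep
ILLEGAL_REWARD = -10  # illegal pick-up / drop-off (assumed to replace the -1)
DROPOFF_REWARD = 20   # successful drop-off (assumed to replace the -1)

def episode_score(steps, illegal_actions=0, success=False):
    """Total reward for an episode of `steps` timesteps (hypothetical helper)."""
    special = illegal_actions + (1 if success else 0)
    if special > steps:
        raise ValueError("more special actions than steps")
    ordinary = steps - special
    return (ordinary * STEP_REWARD
            + illegal_actions * ILLEGAL_REWARD
            + (DROPOFF_REWARD if success else 0))

print(episode_score(12, success=True))         # 11 * -1 + 20 = 9
print(episode_score(200, illegal_actions=55))  # 145 * -1 + 55 * -10 = -695
```

Under these assumptions a clean 12-step ride scores 9 points, while an episode that burns all 200 steps and makes 55 illegal actions ends at -695.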
As with Frozen Lake, we again focused on tabular methods, but this time using Q-learning. Again we used PyTorch.
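For readers who want to see what one tabular Q-learning step looks like, here is a minimal sketch in plain Python. It is not the PyTorch code used for the results in this post. The table size follows Taxi's 500 discrete states and 6 actions; the learning rate `alpha`, discount `gamma`, exploration rate `epsilon`, and all function names are illustrative assumptions.

```python
import random

N_STATES, N_ACTIONS = 500, 6  # Taxi: 25 taxi cells x 5 passenger positions x 4 destinations

def make_q_table():
    """One row per state, one value per action, all starting at zero."""
    return [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    row = q[state]
    return max(range(N_ACTIONS), key=row.__getitem__)

def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); no bootstrap at episode end."""
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])

q = make_q_table()
q_update(q, state=0, action=1, reward=-1, next_state=2, done=False)
print(q[0][1])  # moved a fraction alpha of the way toward the target of -1
```

In a training loop, `choose_action` would pick each move and `q_update` would be called once per step with the reward the environment returns.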
Info: X axis is the number of episodes, Y axis is the score (total reward received from the environment per episode).
Info: X axis is the number of episodes, Y axis is the number of steps executed by the agent.
At the beginning our driver was really struggling and (of course) didn't know what to do, so it consistently used all 200 steps (the limit for the game) and got terrible rewards from the environment, as low as almost -700 points (as mentioned, an illegal drop-off or pick-up costs 10 points and each step costs 1).
Relatively quickly we can see nice progress, and by around the 200th episode the agent was already doing a pretty good job.
Finally, in the last part of training, the average reward was over 8 points (using around 12 steps). This is a really good result for this environment!
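An average over the last part of training can be computed as a simple trailing mean over the per-episode logs. The sketch below shows the idea; the helper name, the window size, and the numbers in the two lists are made up for illustration and are not our real training data.

```python
def trailing_mean(values, window):
    """Mean of the last `window` entries (or of all entries if fewer exist)."""
    if not values:
        raise ValueError("no episodes recorded")
    tail = values[-window:]
    return sum(tail) / len(tail)

# Hypothetical per-episode logs, for illustration only.
scores = [-650, -200, -40, 5, 9, 8, 7, 10]
steps = [200, 200, 60, 16, 12, 13, 14, 11]

print(trailing_mean(scores, 4))  # 8.5
print(trailing_mean(steps, 4))   # 12.5
```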
Below, the agent's performance captured on video:
Like it? Want to do similar things?
Have a look at our course! We teach everything from scratch!