One of the potential optimizations for our RL algorithms is stacking multiple images together and feeding our network not a single frame but several stacked frames.
As already mentioned in one of our previous posts LINK: “this way our agent should be able to understand the environment better, because it won’t use a single static image but a sequence. So in theory, while learning, it should also take into consideration dynamic information like speed, direction of movement, etc.”
Of course this is nothing new - this approach was already used in DeepMind’s famous DQN paper.
So let’s try to figure out how to benefit from this idea in our implementations.
One of the easiest ways is to use the wrappers from OpenAI’s baselines - here goes LINK.
In that file there is even a method called wrap_deepmind which, according to the authors:
Configures environment for DeepMind-style Atari
and, among other things, does exactly what we want here, as it has an option to stack multiple images together.
It is disabled by default, but we can easily enable it using:
env = wrap_deepmind(env, frame_stack=True)
It will stack 4 images together, and the final result looks like this (using the Pong game as an example):
And this is what we can feed to our Neural Network.
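A quick way to see the effect is to print the observation shape. The snippet below is a small sanity check, assuming the baselines atari_wrappers module is on your path (it can also be imported as baselines.common.atari_wrappers) and the standard DeepMind preprocessing down to 84x84 grayscale frames:

import numpy as np
from atari_wrappers import make_atari, wrap_deepmind

env = make_atari("PongNoFrameskip-v4")
env = wrap_deepmind(env, frame_stack=True)
obs = env.reset()
print(np.asarray(obs).shape)  # (84, 84, 4): 84x84 pixels, 4 stacked frames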
As mentioned multiple times, our main and favourite tool for Deep Learning and Reinforcement Learning is PyTorch.
And this complicates the situation a little bit. By default, images from OpenAI Gym come in HWC shape (height x width x channels), but PyTorch expects the CHW format (and it has good, performance-related reasons for that).
So to meet this requirement we need a little “trick”. In the same atari_wrappers.py file we added our own ImageToPyTorch wrapper:
import gym
import numpy as np

class ImageToPyTorch(gym.ObservationWrapper):
    def __init__(self, env):
        super(ImageToPyTorch, self).__init__(env)
        current_shape = self.observation_space.shape  # HWC
        self.observation_space = gym.spaces.Box(low=0.0, high=1.0,
            shape=(current_shape[-1], current_shape[0], current_shape[1]), dtype=np.float32)

    def observation(self, observation):
        return np.swapaxes(observation, 2, 0)  # HWC -> CHW (frames are square here)
What it does is very simple - it adjusts the observation space of the Gym environment and swaps the axes of each single frame, so the HWC shape becomes CHW and PyTorch is very happy :)
We also adjusted the wrap_deepmind method a bit so that our ImageToPyTorch wrapper is optional (this way it won’t break implementations using different frameworks, like TensorFlow). Now, to use this new approach, we just need to define our environment like this:
env = wrap_deepmind(env, frame_stack=True, pytorch_img=True)
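For reference, the change to wrap_deepmind can be as small as one extra flag. Here is a minimal sketch of the idea; the other wrapper calls follow the baselines version and the exact body may differ in your copy of the file:

def wrap_deepmind(env, episode_life=True, clip_rewards=True,
                  frame_stack=False, scale=False, pytorch_img=False):
    ...  # original DeepMind-style wrappers (WarpFrame, ClipRewardEnv, ...)
    if frame_stack:
        env = FrameStack(env, 4)
    if pytorch_img:
        env = ImageToPyTorch(env)  # our HWC -> CHW wrapper, applied last
    return env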
Our adjusted atari_wrappers.py file is available HERE.
It is also worth noting that when using those wrappers, the output from Gym will be so-called LazyFrames - a memory-optimized structure that keeps the stacked uint8 frames without copying the ones shared between consecutive observations. If you’re interested in deeper details, have a look at the FrameStack wrapper.
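To give a rough idea of what LazyFrames does, here is a simplified sketch (not the real baselines code, which also caches the result): it only holds references to the individual frames and materializes an actual array on demand:

import numpy as np

class LazyFrames:
    def __init__(self, frames):
        self._frames = frames  # references to single frames, nothing is copied

    def __array__(self, dtype=None):
        # build the actual stacked array only when somebody asks for one
        out = np.concatenate(self._frames, axis=-1)
        return out if dtype is None else out.astype(dtype)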
Anyway, if we want to let PyTorch work with these, one of the ways is:
import numpy as np
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
frame = torch.from_numpy(np.asarray(frame))  # LazyFrames -> ndarray -> tensor
frame = frame.to(device, dtype=torch.float32)
So first we convert the LazyFrames object to a NumPy array and wrap it in a PyTorch tensor, then we move it to the chosen device (CPU or GPU) and, along the way, cast it to float32 - which, at least in our implementations, is the type mostly used when calculating data in Neural Networks.
Following the above recommendations, you will be able to stack 4 images from OpenAI Gym and feed them to your Neural Network - exactly the same way DeepMind did in their DQN paper - and to use this approach in PyTorch. This way our agent will understand the environment better and in many cases should learn more efficiently!
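Putting it all together, a minimal interaction loop could look like the sketch below; the random action is just a placeholder for your network’s policy, and the environment name is only an example:

import numpy as np
import torch
from atari_wrappers import make_atari, wrap_deepmind

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
env = make_atari("PongNoFrameskip-v4")
env = wrap_deepmind(env, frame_stack=True, pytorch_img=True)

obs, done = env.reset(), False
while not done:
    # convert the observation to a float32 CHW tensor with a batch dimension of 1
    state = torch.from_numpy(np.asarray(obs)).to(device, dtype=torch.float32).unsqueeze(0)
    action = env.action_space.sample()  # here your network would pick an action from state
    obs, reward, done, info = env.step(action)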
Next step for you - experiment with it!
And if you still need more details, have a look at our Udemy course!