Thursday, July 12, 2018

What’s New in Deep Learning Research: Understanding DeepMind’s IMPALA

https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-deepmind-s-impala-4fbfa5d0ad0c

Deep reinforcement learning has rapidly become one of the hottest research areas in the deep learning ecosystem. The fascination with reinforcement learning is related to the fact that, of all the deep learning modalities, it is the one that most closely resembles how humans learn. In the last few years, no company in the world has done more to advance the state of deep reinforcement learning than Alphabet's subsidiary DeepMind.
Since the launch of its famous AlphaGo agent, DeepMind has been at the forefront of reinforcement learning research. A few days ago, they published new research that attempts to tackle one of the most challenging aspects of reinforcement learning solutions: multi-tasking.
From infancy, multi-tasking is an intrinsic element of our cognition. The ability to perform and learn similar tasks concurrently is essential to the development of the human mind. From a neuroscientific standpoint, multi-tasking remains largely a mystery, so it is not surprising that we have had a heck of a hard time implementing artificial intelligence (AI) agents that can efficiently learn multiple domains without requiring a disproportionate amount of resources. This challenge is even more evident in deep reinforcement learning models, which are based on trial-and-error exercises that can easily cross the boundaries of a single domain. Biologically speaking, you could argue that all learning is a multi-tasking exercise.
Let's take a classic deep reinforcement learning scenario such as self-driving vehicles. In that scenario, AI agents need to concurrently learn different aspects such as distance, memory or navigation while operating under rapidly changing parameters such as vision quality or speed. Most reinforcement learning methods today focus on learning a single task, and the models that tackle multi-task learning are too difficult to scale to be practical.
In their recent research, the DeepMind team proposed a new architecture for deep reinforcement multi-task learning called the Importance Weighted Actor-Learner Architecture (IMPALA). Inspired by another popular reinforcement learning architecture called A3C, IMPALA leverages a topology of different actors and learners that collaborate to build knowledge across different domains. Traditionally, deep reinforcement learning models use an architecture based on a single learner combined with multiple actors. In that model, each actor generates trajectories and sends them via a queue to the learner; before starting the next trajectory, the actor retrieves the latest policy parameters from the learner. IMPALA instead uses an architecture in which actors collect experience that is passed to a central learner that computes gradients, resulting in a model with completely independent actors and learners. This simple architecture enables the learner(s) to be accelerated using GPUs and the actors to be easily distributed across many machines.
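The actor/learner split described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not DeepMind's implementation: the function names (`run_actor`, `learner_step`) and the single-process queue are placeholders for what, in IMPALA, would be many actor machines feeding a GPU learner.

```python
import queue

# Hypothetical sketch of the IMPALA-style split: actors generate whole
# trajectories with a (possibly stale) copy of the policy and enqueue
# them; a central learner dequeues batches and computes gradients.
trajectory_queue = queue.Queue()

def run_actor(policy_params, env_step, unroll_length=5):
    """An actor rolls out one fixed-length trajectory using its local
    copy of the policy, then sends it to the learner via the queue."""
    trajectory = [env_step(policy_params) for _ in range(unroll_length)]
    # The policy version is sent along so the learner can correct for lag.
    trajectory_queue.put((policy_params, trajectory))

def learner_step(batch_size=2):
    """The learner consumes trajectories from many actors at once; the
    gradient computation itself is omitted in this sketch."""
    return [trajectory_queue.get() for _ in range(batch_size)]

# Usage: two actors produce experience; the learner consumes both.
run_actor(policy_params=0, env_step=lambda p: ("obs", "action", 1.0))
run_actor(policy_params=0, env_step=lambda p: ("obs", "action", 1.0))
batch = learner_step(batch_size=2)
```

Because actors never block waiting for fresh parameters between steps, they can run on cheap CPU machines at full speed; the price is that their trajectories are slightly off-policy by the time the learner sees them, which is exactly the problem V-Trace addresses.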
In addition to the multi-actor architecture model, the IMPALA research also introduces a new algorithm called V-Trace that focuses on off-policy learning. The idea behind V-Trace is to mitigate the lag between when actions are generated by the actors and when the learner estimates the gradient.
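The correction works by weighting temporal-difference errors with truncated importance-sampling ratios between the learner's current policy and the actor's behavior policy. A rough NumPy sketch of computing V-trace value targets for a single trajectory, under my reading of the paper's recursion (variable names and defaults are illustrative, not DeepMind's code):

```python
import numpy as np

def vtrace_targets(values, rewards, next_value, log_rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Sketch of V-trace targets v_s for one trajectory of length T.

    values:     V(x_s) predicted by the learner, shape [T]
    rewards:    r_s from the actor's trajectory, shape [T]
    next_value: bootstrap value V(x_T)
    log_rhos:   log(pi(a_s|x_s) / mu(a_s|x_s)), learner vs. actor policy
    """
    rhos = np.minimum(rho_bar, np.exp(log_rhos))  # truncated IS weights
    cs = np.minimum(c_bar, np.exp(log_rhos))      # trace-cutting coefficients
    values_ext = np.append(values, next_value)
    # TD errors, each scaled by its truncated importance ratio.
    deltas = rhos * (rewards + gamma * values_ext[1:] - values_ext[:-1])
    # Backward recursion: v_s = V(x_s) + delta_s + gamma*c_s*(v_{s+1} - V(x_{s+1}))
    vs = np.zeros(len(rewards))
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs
```

When actor and learner policies coincide (`log_rhos` all zero), the ratios equal one and the targets reduce to ordinary n-step returns; as the actors' data becomes more stale, the truncated weights shrink the contribution of off-policy steps instead of letting their variance blow up.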
The DeepMind team tested IMPALA on different scenarios using its famous DMLab-30 task suite, and the results were impressive. IMPALA achieved better performance than A3C variants in terms of data efficiency, stability and final performance. This might be the first deep reinforcement learning model that has been able to efficiently operate in multi-task environments.
