Deep Reinforcement Learning (deep RL) has had many significant successes, including superhuman performance at Dota and Go. However, several challenges remain before we can apply it in the real world, including sample efficiency, task specification, and exploration. We believe that addressing these challenges will require an open-world environment along with human data.
Current methods in deep RL are sample inefficient, especially as we move to more and more complex domains: OpenAI Five collected 900 years of experience per day, and AlphaGo Zero played 4.9 million games of Go.
Furthermore, there has been recent success in leveraging imitation learning to solve older benchmarks like Atari, as well as real-world problems such as robotic manipulation and self-driving cars. More recently, AlphaStar was able to achieve Gold/Platinum MMR (roughly the 50th percentile of human performance) using pretraining alone. We believe that leveraging human data will be an important piece of the puzzle as we tackle sample efficiency in more and more complex problems.
The Diamond competition is particularly focused on this challenge.
Difficulty of specifying tasks
It is hard to specify reward functions for many realistic tasks: for example, how would you define a reward function for washing the dishes, when all you have access to are pixel inputs? Recent research has proposed alternative task specifications, such as learning from demonstrations, comparisons, reward signals, advantage signals, etc.
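As one illustration of specifying a task through comparisons instead of a hand-written reward, a learned reward model can be trained so that the behavior a human prefers receives the higher predicted return. The sketch below uses the Bradley-Terry model, a common choice in preference-based reward learning. It is a minimal sketch under stated assumptions: the function names and the scalar "predicted returns" are hypothetical and are not part of MineRL or of any specific method cited here.

```python
import math


def preference_probability(return_a: float, return_b: float) -> float:
    """Bradley-Terry probability that segment A is preferred over segment B.

    return_a and return_b are hypothetical predicted returns that a learned
    reward model assigns to two behavior segments.
    """
    diff = return_a - return_b
    # Numerically stable logistic function of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def comparison_loss(return_a: float, return_b: float, a_preferred: bool) -> float:
    """Cross-entropy loss for a single human comparison label."""
    p = preference_probability(return_a, return_b)
    # Clamp to avoid log(0) when the model is extremely confident.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -math.log(p if a_preferred else 1.0 - p)
```

Minimizing this loss over many labeled pairs pushes the reward model to rank segments the way the human labeler does, which is useful when no explicit reward function is available.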
To properly evaluate the use of such techniques for task specification, we need environments in which tasks are hard to specify. This happens in open-world environments, where there are many realistic goals that an agent could pursue. Current benchmarks do not meet this standard: for example, you can reach a third of expert performance on the MuJoCo Hopper with a constant reward, and on Atari purely curious agents perform quite well despite having no access to the true reward function.
The BASALT competition is particularly focused on this challenge.
Minecraft is a rich environment for RL: it is an open-world environment, has sparse rewards, and has many innate task hierarchies and subgoals. People can have many different goals within the game: perhaps you want to defeat the Ender Dragon while others try to stop you, or build a giant floating island chained to the ground, or produce more stuff than you will ever need. In addition, Minecraft has more than 90 million monthly active users, making it a good environment in which to collect a large-scale dataset.
To spur research on open-world environments with human data, we release MineRL: a suite of environments within Minecraft, alongside a large-scale dataset of human gameplay within those environments.
Besides the challenges discussed above, these environments also highlight a variety of other research challenges, including open-world multi-agent interactions, long-term planning, vision, control, navigation, and explicit and implicit subtask hierarchies. We also release a flexible framework to define new Minecraft tasks.
William H Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, Ruslan Salakhutdinov
Twenty-Eighth International Joint Conference on Artificial Intelligence
William H. Guss, Cayden Codel, Katja Hofmann, Brandon Houghton, Noboru Kuno, Stephanie Milani, Sharada Mohanty, Diego Perez Liebana, Ruslan Salakhutdinov, Nicholay Topin, Manuela Veloso, Phillip Wang
NeurIPS 2019 Competition Track