We are holding a competition on sample-efficient reinforcement learning using human priors. Standard methods require months to years of game time to attain human performance in complex games such as Go and StarCraft. In our competition, participants develop a system to obtain a diamond in Minecraft using only four days of training time. To facilitate solving this hard task with few samples, we provide a dataset of human demonstrations.
This competition uses a set of Gym environments based on Malmo. To improve the experience for competition participants, we have extended Malmo to support many new features, including synchronous ticking. The environments and dataset loader will be available as a pip package when the competition begins.
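For readers unfamiliar with the Gym convention mentioned above, the following is a minimal sketch of the reset/step interaction loop. It does not use the competition package: `ToyEnv`, its `"dig"` action, and the `run_with_budget` helper are hypothetical stand-ins, and the step budget is an illustrative assumption (the competition description states a four-day training limit, not a step count).

```python
class ToyEnv:
    """Hypothetical stand-in that follows the Gym reset/step convention."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.t = 0
        return {"tick": self.t}

    def step(self, action):
        # With synchronous ticking, the world advances exactly one tick per call.
        self.t += 1
        reward = 1.0 if action == "dig" else 0.0
        done = self.t >= self.episode_length
        return {"tick": self.t}, reward, done, {}


def run_with_budget(env, policy, max_samples):
    """Step the environment until the sample budget is spent."""
    samples, total_reward = 0, 0.0
    obs = env.reset()
    while samples < max_samples:
        obs, reward, done, _ = env.step(policy(obs))
        samples += 1
        total_reward += reward
        if done:
            obs = env.reset()  # begin the next episode
    return samples, total_reward


samples, total = run_with_budget(ToyEnv(), lambda obs: "dig", max_samples=12)
print(samples, total)
```

Counting every call to `step` is one simple way to track how many environment samples an agent has consumed during training.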
Sample snippets of the dataset.
The contest runs from June 5th to October 25th. All submissions are made through AIcrowd, where you can find the detailed rules as well as the leaderboard. Additionally, Preferred Networks has released reference RL implementations on GitHub.
- Participants train their agents to play Minecraft. During the round, they submit trained models for evaluation to determine leaderboard ranks.
- At the end of the round, participants submit source code. The models at the top of the leaderboard are re-trained (from scratch) for four days to compute the final score used for ranking.
- The top 10 scores move on to Round 2.
- The top 10 participants from the previous round receive Azure credits.
- Participants may submit code up to four times. Each submission is trained for four days to compute its score. The final ranking is based on the best submission for each participant.
- The top participants will present their work at a workshop at NeurIPS 2019.
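The "best submission per participant" rule above can be sketched as follows. This is only an illustration: the function name, the team names, the scores, and the alphabetical tie-break are assumptions, not part of the official rules.

```python
def rank_participants(submissions, top_k=10):
    """Rank participants by their best score.

    submissions: iterable of (participant, score) pairs, one per submission.
    Returns the top_k (participant, best_score) pairs, highest score first.
    """
    best = {}
    for participant, score in submissions:
        # Keep only the highest score seen for each participant.
        if participant not in best or score > best[participant]:
            best[participant] = score
    # Sort by descending score; ties are broken by name (an assumption).
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


example = [("team_a", 12.0), ("team_b", 30.5), ("team_a", 41.0), ("team_c", 30.5)]
ranking = rank_participants(example, top_k=2)
print(ranking)
```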
The Task: Obtain Diamond in Minecraft
Minecraft is a 3D, first-person, open-world game centered around the gathering of resources and creation of structures and items. These structures and items have prerequisite tools and materials required for their creation. As a result, many items require the completion of a series of natural subtasks.
The procedurally generated world is composed of discrete blocks that allow modification. Over the course of gameplay, players change their surroundings by gathering resources and constructing structures.
In this competition, the goal is to obtain a diamond. The agent begins in a random starting location without any items, and receives rewards for obtaining items which are prerequisites for diamond.
The stages of obtaining a diamond.
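To make the idea of rewards for prerequisite items concrete, here is a minimal sketch of a milestone-based return. The item names, the reward values, and the choice to pay each milestone only once per episode are illustrative assumptions, not the competition's official reward table.

```python
# Hypothetical milestone chain and reward values, for illustration only.
MILESTONE_REWARDS = {
    "log": 1,
    "planks": 2,
    "wooden_pickaxe": 8,
    "stone_pickaxe": 32,
    "iron_pickaxe": 256,
    "diamond": 1024,
}


def episode_return(inventory_events):
    """Sum rewards, paying each milestone only the first time it is obtained."""
    rewarded = set()
    total = 0
    for item in inventory_events:
        if item in MILESTONE_REWARDS and item not in rewarded:
            rewarded.add(item)
            total += MILESTONE_REWARDS[item]
    return total


# Repeated items and non-milestone items ("dirt") add nothing.
events = ["log", "log", "planks", "dirt", "wooden_pickaxe"]
print(episode_return(events))
```

Under these assumed values, later milestones are worth far more than early ones, so an agent is rewarded for progressing along the chain rather than for repeating easy steps.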
Through our generous sponsor, Microsoft, we were able to provide 25 compute grants to participants competing in Round 1. We will also provide additional compute grants for groups who move on to Round 2.
The competition organizers are committed to increasing the participation of groups traditionally underrepresented in reinforcement learning and, more generally, in machine learning. To that end, we will offer Inclusion@NeurIPS scholarships/travel grants for some number of Round 1 participants who are traditionally underrepresented at NeurIPS to attend the conference. We also plan to provide travel grants to enable all of the top participants from Round 2 to attend our NeurIPS workshop.
The applications for the Inclusion@NeurIPS travel grants can be found here.
Top-ranking teams in Round 2 will receive rewards from our sponsors. Details will be announced as we finalize agreements.
The organizing team consists of:
- William H. Guss (Carnegie Mellon University)
- Mario Ynocente Castro (Preferred Networks)
- Cayden Codel (Carnegie Mellon University)
- Katja Hofmann (Microsoft Research)
- Brandon Houghton (Carnegie Mellon University)
- Noboru Kuno (Microsoft Research)
- Crissman Loomis (Preferred Networks)
- Keisuke Nakata (Preferred Networks)
- Stephanie Milani (University of Maryland, Baltimore County and Carnegie Mellon University)
- Sharada Mohanty (AIcrowd)
- Diego Perez Liebana (Queen Mary University of London)
- Ruslan Salakhutdinov (Carnegie Mellon University)
- Shinya Shiroshita (Preferred Networks)
- Nicholay Topin (Carnegie Mellon University)
- Avinash Ummadisingu (Preferred Networks)
- Manuela Veloso (Carnegie Mellon University)
- Phillip Wang (Carnegie Mellon University)
The advisory committee consists of:
- Chelsea Finn (Google Brain and UC Berkeley)
- Sergey Levine (UC Berkeley)
- Harm van Seijen (Microsoft Research)
- Oriol Vinyals (Google DeepMind)
If you have any questions, please feel free to contact us:
William H. Guss, Cayden Codel, Katja Hofmann, Brandon Houghton, Noboru Kuno, Stephanie Milani, Sharada Mohanty, Diego Perez Liebana, Ruslan Salakhutdinov, Nicholay Topin, Manuela Veloso, Phillip Wang
NeurIPS 2019 Competition Track