Starting June 1st, we are holding a competition on sample-efficient reinforcement learning using human priors. Standard methods require months to years of game time to attain human performance in complex games such as Go and StarCraft. In our competition, participants develop a system to obtain a diamond in Minecraft using only four days of training time. To facilitate solving this hard task with few samples, we provide a dataset of human demonstrations.
This competition uses a set of Gym environments based on Malmo. To improve the experience for competition participants, we have extended Malmo to support many new features, including synchronous ticking. The environments and dataset loader will be available as a pip package when the competition begins.
Sample snippets of the dataset.
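The environments follow the standard Gym interface. The loop below is a minimal sketch of how a participant's agent would interact with one; the stand-in environment, its observation contents, and the action names are assumptions for illustration (the real environments, IDs, and action/observation spaces ship with the pip package).

```python
import random

class StubMineRLEnv:
    """Stand-in with the Gym-style interface the competition environments expose.
    The real environments are created via gym.make(...) once the package is installed."""
    def __init__(self, horizon=100):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"pov": [0, 0, 0]}  # placeholder observation (e.g. a pixel array)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == "attack" else 0.0  # placeholder reward signal
        done = self.t >= self.horizon
        return {"pov": [0, 0, 0]}, reward, done, {}

# Standard Gym interaction loop: reset, then step until the episode ends.
env = StubMineRLEnv(horizon=10)
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.choice(["attack", "forward", "jump"])  # agent policy goes here
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

With the actual package, only the environment construction changes; the `reset`/`step` loop is identical.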
The contest will run from June 1st to October 25th. All submissions will be through CrowdAI. Detailed rules and leaderboard will be available on the CrowdAI competition webpage. Additionally, we will release reference RL implementations, courtesy of Preferred Networks.
Round 1:

- Participants train their agents to play Minecraft. During the round, they submit trained models for evaluation to determine leaderboard ranks.
- At the end of the round, participants submit their source code. The models at the top of the leaderboard are re-trained from scratch for four days to compute the final score used for ranking.
- The top 10 participants move on to Round 2.

Round 2:

- The top 10 participants from Round 1 receive Azure credits.
- Participants may submit code up to four times. Each submission is trained for four days to compute its score; final ranking is based on each participant's best submission.
- The top participants will present their work at a workshop at NeurIPS 2019.
The Task: Obtain Diamond in Minecraft
Minecraft is a 3D, first-person, open-world game centered on gathering resources and creating structures and items. Creating an item typically requires prerequisite tools and materials, so many items can only be obtained by completing a series of natural subtasks.
The procedurally generated world is composed of discrete blocks that allow modification. Over the course of gameplay, players change their surroundings by gathering resources and constructing structures.
In this competition, the goal is to obtain a diamond. The agent begins in a random starting location without any items, and receives rewards for obtaining items that are prerequisites for the diamond.
The stages of obtaining a diamond.
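The reward structure above can be sketched as a set of one-time milestone rewards along the prerequisite chain. The item names and reward values below are illustrative assumptions, not the official competition reward schedule.

```python
# Prerequisite chain toward a diamond, with an illustrative reward for each item.
# NOTE: these names and values are assumptions, not the official schedule.
MILESTONES = {
    "log": 1, "planks": 2, "stick": 4, "crafting_table": 4,
    "wooden_pickaxe": 8, "cobblestone": 16, "stone_pickaxe": 32,
    "iron_ore": 64, "iron_ingot": 128, "iron_pickaxe": 256,
    "diamond": 1024,
}

def milestone_reward(obtained, item):
    """Reward the agent the first time it obtains each prerequisite item."""
    if item in MILESTONES and item not in obtained:
        obtained.add(item)
        return MILESTONES[item]
    return 0

obtained = set()
r_first = milestone_reward(obtained, "log")   # first log is rewarded
r_repeat = milestone_reward(obtained, "log")  # repeats of an item are not
```

Rewarding each prerequisite only once gives the agent a dense learning signal early in training while keeping the diamond itself as the dominant objective.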
Through the generosity of our sponsor, Microsoft, we will provide compute grants for teams that self-identify as lacking access to the compute resources needed to participate in the competition. We will also provide participants with the evaluation resources for their experiments in Round 2.
The competition organizers are committed to increasing the participation of groups traditionally underrepresented in reinforcement learning and, more generally, in machine learning. To that end, we will offer Inclusion@NeurIPS scholarships/travel grants for a number of Round 1 participants from groups traditionally underrepresented at NeurIPS to attend the conference. We also plan to provide travel grants so that all of the top participants from Round 2 can attend our NeurIPS workshop.
Top-ranking teams in Round 2 will receive prizes from our sponsors; details will be announced as agreements are finalized. Currently, Nvidia will be distributing three GPUs among the top teams.
The organizing team consists of:
- William H. Guss (Carnegie Mellon University)
- Cayden Codel (Carnegie Mellon University)
- Katja Hofmann (Microsoft Research)
- Brandon Houghton (Carnegie Mellon University)
- Noboru Kuno (Microsoft Research)
- Stephanie Milani (University of Maryland, Baltimore County and Carnegie Mellon University)
- Sharada Mohanty (AIcrowd)
- Diego Perez Liebana (Queen Mary University of London)
- Ruslan Salakhutdinov (Carnegie Mellon University)
- Nicholay Topin (Carnegie Mellon University)
- Manuela Veloso (Carnegie Mellon University)
- Phillip Wang (Carnegie Mellon University)
The advisory committee consists of:
- Chelsea Finn (Google Brain and UC Berkeley)
- Sergey Levine (UC Berkeley)
- Harm van Seijen (Microsoft Research)
- Oriol Vinyals (Google DeepMind)
If you have any questions, please feel free to contact us.