Sign up to participate [here]!

Help build the dataset by playing Minecraft [here]!

Starting June 1st, we are holding a competition on sample-efficient reinforcement learning using human priors. Standard methods require months to years of game time to attain human performance in complex games such as Go and StarCraft. In our competition, participants develop a system to obtain a diamond in Minecraft using only four days of training time. To facilitate solving this hard task with few samples, we provide a dataset of human demonstrations.

This competition uses a set of Gym environments based on Malmo. To improve the experience for competition participants, we have extended Malmo to support many new features, including synchronous ticking. The environments and dataset loader will be available as a pip package when the competition begins.

Sample snippets of the dataset.

Competition Overview

The contest will run from June 1st to October 25th. All submissions will be made through CrowdAI. Detailed rules and the leaderboard will be available on the CrowdAI competition webpage. Additionally, we will release reference RL implementations, courtesy of Preferred Networks.

Round 1

  1. Participants train their agents to play Minecraft. During the round, they submit trained models for evaluation to determine leaderboard ranks.
  2. At the end of the round, participants submit source code. The models at the top of the leaderboard are re-trained (from scratch) for four days to compute the final score used for ranking.
  3. The top 10 participants move on to Round 2.

Round 2

  1. The top 10 participants from the previous round receive Azure credits.
  2. Participants may submit code up to four times. Each submission is trained for four days to compute its score. The final ranking is based on the best submission for each participant.
  3. The top participants will present their work at a workshop at NeurIPS 2019.
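
The Round 2 ranking rule above (at most four submissions per participant, ranked by each participant's best score) can be sketched as follows. This is a minimal illustration, not the organizers' evaluation code: the function name, the input format, the handling of submissions beyond the fourth, and the tie-breaking order are all assumptions.

```python
MAX_SUBMISSIONS = 4  # from the Round 2 rules: code may be submitted up to four times


def final_ranking(submissions):
    """Rank participants by their best score.

    `submissions` is an iterable of (participant, score) pairs in submission
    order. Assumption: submissions beyond the fourth are ignored. Assumption:
    ties are broken alphabetically by participant name.
    """
    counts = {}
    best = {}
    for participant, score in submissions:
        counts[participant] = counts.get(participant, 0) + 1
        if counts[participant] > MAX_SUBMISSIONS:
            continue  # hypothetical policy: extra submissions do not count
        if participant not in best or score > best[participant]:
            best[participant] = score
    # Highest best score first; name as a deterministic tie-breaker.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

For example, a participant who scores 3.0 and then 7.0 is ranked by the 7.0 result, ahead of a participant whose single submission scored 5.0.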

The Task: Obtain Diamond in Minecraft

Minecraft is a 3D, first-person, open-world game centered around the gathering of resources and creation of structures and items. These structures and items have prerequisite tools and materials required for their creation. As a result, many items require the completion of a series of natural subtasks.

The procedurally generated world is composed of discrete blocks that allow modification. Over the course of gameplay, players change their surroundings by gathering resources and constructing structures.

In this competition, the goal is to obtain a diamond. The agent begins in a random starting location without any items, and receives rewards for obtaining items which are prerequisites for diamond.
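
The idea of rewarding prerequisite items can be sketched as a chain of milestones. The item names and reward values below are purely illustrative assumptions, not the competition's official reward table, and paying each milestone only the first time it is reached is also an assumption.

```python
# Hypothetical milestone chain leading to a diamond.
# Names and values are illustrative assumptions only.
MILESTONES = [
    ("log", 1),
    ("wooden_pickaxe", 2),
    ("cobblestone", 4),
    ("stone_pickaxe", 8),
    ("iron_ore", 16),
    ("iron_ingot", 32),
    ("iron_pickaxe", 64),
    ("diamond", 128),
]


def episode_return(items_obtained):
    """Sum milestone rewards over an episode.

    `items_obtained` is the sequence of item names the agent acquired, in
    order. Each milestone pays out at most once (an assumption); items that
    are not prerequisites for diamond pay nothing.
    """
    reward_for = dict(MILESTONES)
    already_rewarded = set()
    total = 0
    for item in items_obtained:
        if item in reward_for and item not in already_rewarded:
            already_rewarded.add(item)
            total += reward_for[item]
    return total
```

Under these assumed values, an agent that collects two logs, some dirt, and a wooden pickaxe earns 1 + 2 = 3, since the second log and the dirt add nothing.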

The stages of obtaining a diamond: Wood Pickaxe; Mine Stone and Create Stone Pickaxe; Iron Ore; Smelt Iron and Create Iron Pickaxe; Search Mine.


Through our generous sponsor, Microsoft, we will provide some compute grants for teams that self-identify as lacking access to the compute power needed to participate in the competition. We will also provide groups with the evaluation resources for their experiments in Round 2.

The competition organizers are committed to increasing the participation of groups traditionally underrepresented in reinforcement learning and, more generally, in machine learning. To that end, we will offer Inclusion@NeurIPS scholarships/travel grants for some number of Round 1 participants who are traditionally underrepresented at NeurIPS to attend the conference. We also plan to provide travel grants to enable all of the top participants from Round 2 to attend our NeurIPS workshop.

The applications for the compute grants and Inclusion@NeurIPS travel grants can be found here and here, respectively.


Top-ranking teams in Round 2 will receive rewards from our sponsors. Details will be announced as we finalize agreements. Currently, Nvidia will be distributing three GPUs among the top teams.


The organizing team consists of:

The advisory committee consists of:


If you have any questions, please feel free to contact us: