This year, we are adding a new competition to the MineRL family: BASALT, a competition on solving human-judged tasks, with $11,000 in prizes. The tasks in this competition do not have a pre-defined reward function: the goal is to produce trajectories that are judged by real humans to be effective at solving a given task.
We realize this is somewhat uncharted territory for the ML community, and that it will require a different set of norms and training procedures - perhaps integrating demonstrations with sources of live human ranking, rating, or comparison to guide agents in the right direction. Our hope is that this competition can provide an impetus for the research community to build these new procedures, which we expect will become increasingly relevant as we want artificially intelligent systems to integrate into more areas of our lives.
Like the Diamond competition, BASALT provides a set of Gym environments paired with human demonstrations, since methods based on imitation are an important building block for solving hard-to-specify tasks.
The four tasks are:
- FindCave: The agent should search for a cave, and terminate the episode when it is inside one.
- MakeWaterfall: After spawning in a mountainous area, the agent should build a beautiful waterfall and then reposition itself to take a scenic picture of that waterfall.
- CreateVillageAnimalPen: After spawning in a village, the agent should build an animal pen containing two of the same kind of animal next to one of the houses in the village.
- BuildVillageHouse: Using items in its starting inventory, the agent should build a new house in the style of the village, in an appropriate location (e.g. next to the path through the village), without harming the village in the process.
All submissions are through AIcrowd. There you can find detailed rules as well as the leaderboard.
Submission: Submit Trained Agents
- Participants train agents to solve BASALT tasks. Participants submit both the training code as well as already-trained models for evaluation.
Evaluation 1: Leaderboard
- During the competition, leaderboard ranks will be determined by generating videos from the already-trained models on new environment seeds, and having visitors to the competition site compare pairs of videos and judge which is better.
- 50 teams, chosen according to a combination of overall and per-task score, move on to the next evaluation round.
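The post does not specify how pairwise comparisons are aggregated into leaderboard ranks. As a purely illustrative sketch (all function names here are hypothetical), one common approach is an Elo-style rating system, where each comparison nudges the winner's rating up and the loser's down:

```python
# Hypothetical sketch: aggregating pairwise human comparisons into a
# ranking via Elo-style updates. The actual BASALT aggregation method
# may differ; this only illustrates the general idea.

def elo_update(ratings, winner, loser, k=32.0):
    """Update ratings in place after `winner` beat `loser` in one comparison."""
    # Expected probability that `winner` wins, given current ratings.
    expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
    # Move both ratings toward the observed outcome.
    ratings[winner] += k * (1.0 - expected_win)
    ratings[loser] -= k * (1.0 - expected_win)

def rank_teams(comparisons, teams):
    """comparisons: list of (winner, loser) pairs from human judgments."""
    ratings = {team: 1000.0 for team in teams}
    for winner, loser in comparisons:
        elo_update(ratings, winner, loser)
    # Highest rating first.
    return sorted(teams, key=lambda team: -ratings[team])
```

For example, if judges preferred A over B, A over C, and B over C, `rank_teams` would order the teams A, B, C.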
Evaluation 2: Final Scores
- Submissions will be shown to Mechanical Turk workers or contractors via the same leaderboard mechanism. The difference from the previous round is that these workers will spend more time understanding the tasks and providing good comparisons, and, unlike the leaderboard raters, will not be other participants.
- The top 10 teams will advance to Round 2.
- Competition organizers inspect the remaining teams' training code to ensure it follows competition rules.
- Models will be retrained on our hardware, with paid contractors providing human feedback if required. If the resulting models are significantly worse than the initial agents submitted by the team, that team is disqualified.
- Winners are chosen from the remaining teams, according to their scores from the second evaluation round.
Our baseline is a simple behavioral cloning algorithm trained for a couple of hours. We hope to see participants improve upon it significantly!
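To make the behavioral-cloning idea behind the baseline concrete: the policy is fit by supervised learning to predict the demonstrator's action from the observation. The actual baseline operates on Minecraft pixel observations; the sketch below uses a tiny synthetic dataset and a linear softmax policy purely for illustration, and every name in it is ours, not the competition's.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_bc(obs, acts, n_actions, lr=0.5, epochs=200):
    """Behavioral cloning: minimize cross-entropy between the policy's
    action distribution and the demonstrated actions."""
    W = np.zeros((obs.shape[1], n_actions))
    onehot = np.eye(n_actions)[acts]
    for _ in range(epochs):
        probs = softmax(obs @ W)
        grad = obs.T @ (probs - onehot) / len(obs)  # cross-entropy gradient
        W -= lr * grad
    return W

# Synthetic "demonstrations": the expert picks action 1 when feature 1
# exceeds feature 0, else action 0.
obs = rng.normal(size=(500, 2))
acts = (obs[:, 1] > obs[:, 0]).astype(int)

W = train_bc(obs, acts, n_actions=2)
policy_actions = softmax(obs @ W).argmax(axis=1)
accuracy = (policy_actions == acts).mean()
```

On this linearly separable toy problem the cloned policy recovers the expert almost exactly; on real BASALT tasks, closing the gap between imitation and human judgment is exactly where participants can improve.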
Thanks to the generosity of our sponsors, there will be $11,000 worth of cash prizes:
- First place: $5,000
- Second place: $3,000
- Third place: $2,000
- Most human-like: $500
- Creativity of research: $500
In addition, the top three teams will be invited to coauthor the competition report.
Note that as we expect to be unable to evaluate all submissions, prizes may be restricted to entries that reach the second evaluation phase, or the validation phase, at the organizers’ discretion. Prize winners are expected to present their solutions at NeurIPS.
We also have an additional $1,000 worth of prizes for participants who provide support for the competition:
- Community support: $500 (may be split across participants at the organizers’ discretion)
- Lottery for leaderboard ratings (above and beyond those used to “pay” for submissions): 5 prizes each worth $100
The organizing team consists of:
- Rohin Shah (UC Berkeley)
- Cody Wild (UC Berkeley)
- Steven H. Wang (UC Berkeley)
- Neel Alex (UC Berkeley)
- Brandon Houghton (OpenAI)
- William Guss (OpenAI)
- Sharada Mohanty (AIcrowd)
- Anssi Kanervisto (University of Eastern Finland)
- Stephanie Milani (Carnegie Mellon University)
- Nicholay Topin (Carnegie Mellon University)
- Pieter Abbeel (UC Berkeley)
- Stuart Russell (UC Berkeley)
- Anca Dragan (UC Berkeley)
- Sergio Guadarrama (Google Brain)
- Katja Hofmann (Microsoft Research)
- Andrew Critch (UC Berkeley)
- Open Philanthropy
If you have any questions, please feel free to contact us at rohinmshah AT berkeley DOT edu.
NeurIPS 2021 Competition Track