General Information

The minerl package includes the environments listed below. This page describes each environment, provides usage samples, and documents the exact action and observation space each provides!

Caution

In the MineRL Competition, many environments are provided for training; however, competition agents will only be evaluated in MineRLObtainDiamond-v0, which has sparse rewards. See MineRLObtainDiamond-v0.

Note

All environments offer a default no-op action via env.action_space.noop() and a random action via env.action_space.sample().

Environment Handlers

Minecraft is an extremely complex environment which provides players with visual, auditory, and informational observation of many complex data types. Furthermore, players interact with Minecraft using more than just embodied actions: players can craft, build, destroy, smelt, enchant, manage their inventory, and even communicate with other players via a text chat.

To provide a unified interface through which agents can obtain observations and perform actions much as players do, we provide first-class support for this multi-modality in the environment: the observation and action spaces of environments are gym.spaces.Dict spaces. These observation and action dictionaries are composed of individual fields we call handlers.

Note

In the documentation of every environment we provide a listing of the exact gym.space of the observations returned by, and the actions expected by, the environment’s step function. We are slowly building documentation for these handlers; you can click those highlighted in blue for more information!
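Concretely, an observation drawn from one of these Dict spaces is just a nested dictionary of handler values. The sketch below mocks up a Navigate-style observation with plain numpy arrays (the field names and shapes come from the spaces documented below; the values are invented):

```python
import numpy as np

# A sketch of a Navigate-style observation: each handler is one field of
# the gym.spaces.Dict, addressed by its key.
obs = {
    "pov": np.zeros((64, 64, 3), dtype=np.uint8),  # raw pixel observation
    "compassAngle": np.array([0.0]),               # compass handler
    "inventory": {"dirt": np.array([0])},          # per-item counts
}

# Handlers are accessed by key, nesting included.
pov = obs["pov"]
dirt = int(obs["inventory"]["dirt"][0])
```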

MineRLTreechop-v0

In Treechop, the agent must collect 64 units of minecraft:log. This replicates a common scenario in Minecraft, as logs are necessary to craft a large number of items in the game and are a key resource.

The agent begins in a forest biome (near many trees) with an iron axe for cutting trees. The agent is given +1 reward for obtaining each unit of wood, and the episode terminates once the agent obtains 64 units.

Observation Space

Dict({
    "pov": "Box(64, 64, 3)"
})

Action Space

Dict({
    "attack": "Discrete(2)",
    "back": "Discrete(2)",
    "camera": "Box(2,)",
    "forward": "Discrete(2)",
    "jump": "Discrete(2)",
    "left": "Discrete(2)",
    "right": "Discrete(2)",
    "sneak": "Discrete(2)",
    "sprint": "Discrete(2)"
})
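A typical chopping action can be built by copying the no-op and overriding a few fields. The sketch below uses a plain dict standing in for the real action-space object, with keys matching the action space above (the [pitch, yaw] ordering of the camera deltas is an assumption):

```python
import copy

# A plain dict standing in for env.action_space.noop(); the real no-op
# has the same keys as the Treechop action space.
noop = {
    "attack": 0, "back": 0, "camera": [0.0, 0.0], "forward": 0,
    "jump": 0, "left": 0, "right": 0, "sneak": 0, "sprint": 0,
}

# Walk forward while holding attack, pitching the camera down slightly.
chop = copy.deepcopy(noop)
chop["forward"] = 1
chop["attack"] = 1
chop["camera"] = [5.0, 0.0]  # [pitch, yaw] deltas in degrees (assumed order)
```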

Usage

import gym
import minerl

# Step an agent through the environment, taking no-op actions
env = gym.make("MineRLTreechop-v0") # A MineRLTreechop-v0 env

obs = env.reset()
done = False

while not done:
    # Take a no-op through the environment.
    obs, rew, done, _ = env.step(env.action_space.noop())
    # Do something

######################################

# Sample some data from the dataset!
data = minerl.data.make("MineRLTreechop-v0")

# Iterate through a single epoch using sequences of at most 32 steps
for obs, rew, done, act in data.seq_iter(num_epochs=1, max_sequence_len=32):
    pass  # Do something with the sampled sequence

MineRLNavigateDense-v0

In this task, the agent must move to a goal location denoted by a diamond block. This represents a basic primitive used in many tasks throughout Minecraft. In addition to standard observations, the agent has access to a “compass” observation, which points near the goal location, 64 meters from the start location. The goal has a small random horizontal offset from the compass location and may be slightly below surface level. At the goal location is a unique block, so the agent must locate the final goal by searching based on local visual features.

The episode terminates with a +100 reward when the agent reaches the goal. In addition, this dense variant is reward-shaped: every tick the agent receives a reward equal to how much closer it moved to the target (negative if it moved farther away).
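The exact functional form of the shaping is not spelled out here; a minimal sketch, under the assumption that the per-tick reward is the decrease in distance to the goal:

```python
# Dense shaping sketch (an assumed form): each tick the agent receives
# the decrease in its distance to the goal since the previous tick.
def shaped_reward(prev_dist, curr_dist):
    """Positive when the agent moved closer, negative when farther."""
    return prev_dist - curr_dist

r_closer = shaped_reward(10.0, 9.5)   # moved 0.5 m closer
r_farther = shaped_reward(9.5, 10.0)  # moved 0.5 m away
```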

In this environment, the agent spawns on a random survival map.

Observation Space

Dict({
    "compassAngle": "Box()",
    "inventory": {
            "dirt": "Box()"
    },
    "pov": "Box(64, 64, 3)"
})

Action Space

Dict({
    "attack": "Discrete(2)",
    "back": "Discrete(2)",
    "camera": "Box(2,)",
    "forward": "Discrete(2)",
    "jump": "Discrete(2)",
    "left": "Discrete(2)",
    "place": "Enum(none,dirt)",
    "right": "Discrete(2)",
    "sneak": "Discrete(2)",
    "sprint": "Discrete(2)"
})
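The compassAngle observation pairs naturally with the camera action. The hypothetical heuristic below runs forward while steering toward the compass heading; it assumes compassAngle is in degrees with positive meaning the goal is to the right, and that camera takes [pitch, yaw] deltas in degrees (plain dicts stand in for the real space objects):

```python
def compass_step(noop_action, compass_angle, turn_rate=10.0):
    """Run forward and steer toward the compass heading.

    Assumes compass_angle is in degrees (positive = goal to the right)
    and camera is a [pitch, yaw] delta in degrees.
    """
    act = dict(noop_action)
    act["forward"] = 1
    act["jump"] = 1  # hop over one-block obstacles
    # Clamp the yaw correction to a fixed turn rate per tick.
    yaw = max(-turn_rate, min(turn_rate, compass_angle))
    act["camera"] = [0.0, yaw]
    return act

noop = {"forward": 0, "jump": 0, "camera": [0.0, 0.0]}
act = compass_step(noop, compass_angle=45.0)
```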

Usage

import gym
import minerl

# Step an agent through the environment, taking no-op actions
env = gym.make("MineRLNavigateDense-v0") # A MineRLNavigateDense-v0 env

obs = env.reset()
done = False

while not done:
    # Take a no-op through the environment.
    obs, rew, done, _ = env.step(env.action_space.noop())
    # Do something

######################################

# Sample some data from the dataset!
data = minerl.data.make("MineRLNavigateDense-v0")

# Iterate through a single epoch using sequences of at most 32 steps
for obs, rew, done, act in data.seq_iter(num_epochs=1, max_sequence_len=32):
    pass  # Do something with the sampled sequence

MineRLNavigate-v0

In this task, the agent must move to a goal location denoted by a diamond block. This represents a basic primitive used in many tasks throughout Minecraft. In addition to standard observations, the agent has access to a “compass” observation, which points near the goal location, 64 meters from the start location. The goal has a small random horizontal offset from the compass location and may be slightly below surface level. At the goal location is a unique block, so the agent must locate the final goal by searching based on local visual features.

This variant of the environment is sparse: the agent is given a single +100 reward upon reaching the goal, at which point the episode terminates.

In this environment, the agent spawns on a random survival map.

Observation Space

Dict({
    "compassAngle": "Box()",
    "inventory": {
            "dirt": "Box()"
    },
    "pov": "Box(64, 64, 3)"
})

Action Space

Dict({
    "attack": "Discrete(2)",
    "back": "Discrete(2)",
    "camera": "Box(2,)",
    "forward": "Discrete(2)",
    "jump": "Discrete(2)",
    "left": "Discrete(2)",
    "place": "Enum(none,dirt)",
    "right": "Discrete(2)",
    "sneak": "Discrete(2)",
    "sprint": "Discrete(2)"
})

Usage

import gym
import minerl

# Step an agent through the environment, taking no-op actions
env = gym.make("MineRLNavigate-v0") # A MineRLNavigate-v0 env

obs = env.reset()
done = False

while not done:
    # Take a no-op through the environment.
    obs, rew, done, _ = env.step(env.action_space.noop())
    # Do something

######################################

# Sample some data from the dataset!
data = minerl.data.make("MineRLNavigate-v0")

# Iterate through a single epoch using sequences of at most 32 steps
for obs, rew, done, act in data.seq_iter(num_epochs=1, max_sequence_len=32):
    pass  # Do something with the sampled sequence

MineRLNavigateExtremeDense-v0

In this task, the agent must move to a goal location denoted by a diamond block. This represents a basic primitive used in many tasks throughout Minecraft. In addition to standard observations, the agent has access to a “compass” observation, which points near the goal location, 64 meters from the start location. The goal has a small random horizontal offset from the compass location and may be slightly below surface level. At the goal location is a unique block, so the agent must locate the final goal by searching based on local visual features.

The episode terminates with a +100 reward when the agent reaches the goal. In addition, this dense variant is reward-shaped: every tick the agent receives a reward equal to how much closer it moved to the target (negative if it moved farther away).

In this environment, the agent spawns in an extreme hills biome.

Observation Space

Dict({
    "compassAngle": "Box()",
    "inventory": {
            "dirt": "Box()"
    },
    "pov": "Box(64, 64, 3)"
})

Action Space

Dict({
    "attack": "Discrete(2)",
    "back": "Discrete(2)",
    "camera": "Box(2,)",
    "forward": "Discrete(2)",
    "jump": "Discrete(2)",
    "left": "Discrete(2)",
    "place": "Enum(none,dirt)",
    "right": "Discrete(2)",
    "sneak": "Discrete(2)",
    "sprint": "Discrete(2)"
})

Usage

import gym
import minerl

# Step an agent through the environment, taking no-op actions
env = gym.make("MineRLNavigateExtremeDense-v0") # A MineRLNavigateExtremeDense-v0 env

obs = env.reset()
done = False

while not done:
    # Take a no-op through the environment.
    obs, rew, done, _ = env.step(env.action_space.noop())
    # Do something

######################################

# Sample some data from the dataset!
data = minerl.data.make("MineRLNavigateExtremeDense-v0")

# Iterate through a single epoch using sequences of at most 32 steps
for obs, rew, done, act in data.seq_iter(num_epochs=1, max_sequence_len=32):
    pass  # Do something with the sampled sequence

MineRLNavigateExtreme-v0

In this task, the agent must move to a goal location denoted by a diamond block. This represents a basic primitive used in many tasks throughout Minecraft. In addition to standard observations, the agent has access to a “compass” observation, which points near the goal location, 64 meters from the start location. The goal has a small random horizontal offset from the compass location and may be slightly below surface level. At the goal location is a unique block, so the agent must locate the final goal by searching based on local visual features.

This variant of the environment is sparse: the agent is given a single +100 reward upon reaching the goal, at which point the episode terminates.

In this environment, the agent spawns in an extreme hills biome.

Observation Space

Dict({
    "compassAngle": "Box()",
    "inventory": {
            "dirt": "Box()"
    },
    "pov": "Box(64, 64, 3)"
})

Action Space

Dict({
    "attack": "Discrete(2)",
    "back": "Discrete(2)",
    "camera": "Box(2,)",
    "forward": "Discrete(2)",
    "jump": "Discrete(2)",
    "left": "Discrete(2)",
    "place": "Enum(none,dirt)",
    "right": "Discrete(2)",
    "sneak": "Discrete(2)",
    "sprint": "Discrete(2)"
})

Usage

import gym
import minerl

# Step an agent through the environment, taking no-op actions
env = gym.make("MineRLNavigateExtreme-v0") # A MineRLNavigateExtreme-v0 env

obs = env.reset()
done = False

while not done:
    # Take a no-op through the environment.
    obs, rew, done, _ = env.step(env.action_space.noop())
    # Do something

######################################

# Sample some data from the dataset!
data = minerl.data.make("MineRLNavigateExtreme-v0")

# Iterate through a single epoch using sequences of at most 32 steps
for obs, rew, done, act in data.seq_iter(num_epochs=1, max_sequence_len=32):
    pass  # Do something with the sampled sequence

MineRLObtainIronPickaxe-v0

In this environment the agent is required to obtain an iron pickaxe. The agent begins at a random starting location, on a random survival map, without any items, matching the normal starting conditions for human players in Minecraft. The agent is given access to a selected view of its inventory, as well as GUI-free crafting, smelting, and inventory management actions.

During an episode the agent is rewarded only once per item: the first time it obtains each item in the requisite item hierarchy for obtaining an iron pickaxe. The reward for each item is given here:

<Item amount="1" reward="1" type="log" />
<Item amount="1" reward="2" type="planks" />
<Item amount="1" reward="4" type="stick" />
<Item amount="1" reward="4" type="crafting_table" />
<Item amount="1" reward="8" type="wooden_pickaxe" />
<Item amount="1" reward="16" type="cobblestone" />
<Item amount="1" reward="32" type="furnace" />
<Item amount="1" reward="32" type="stone_pickaxe" />
<Item amount="1" reward="64" type="iron_ore" />
<Item amount="1" reward="128" type="iron_ingot" />
<Item amount="1" reward="256" type="iron_pickaxe" />
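Because each milestone pays out only once, the episode return is bounded by the sum of the table above. A small sketch that transcribes the table and computes the return for a set of obtained items:

```python
# First-time milestone rewards, transcribed from the table above.
MILESTONE_REWARDS = {
    "log": 1, "planks": 2, "stick": 4, "crafting_table": 4,
    "wooden_pickaxe": 8, "cobblestone": 16, "furnace": 32,
    "stone_pickaxe": 32, "iron_ore": 64, "iron_ingot": 128,
    "iron_pickaxe": 256,
}

def episode_reward(obtained_items):
    """Total reward for an episode: one payout per distinct item obtained."""
    return sum(MILESTONE_REWARDS[item] for item in set(obtained_items))

# Return for completing the full hierarchy:
max_reward = episode_reward(MILESTONE_REWARDS)
```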

Observation Space

Dict({
    "equipped_items": {
            "mainhand": {
                    "damage": "Box()",
                    "maxDamage": "Box()",
                    "type": "Enum(none,air,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe,other)"
            }
    },
    "inventory": {
            "coal": "Box()",
            "cobblestone": "Box()",
            "crafting_table": "Box()",
            "dirt": "Box()",
            "furnace": "Box()",
            "iron_axe": "Box()",
            "iron_ingot": "Box()",
            "iron_ore": "Box()",
            "iron_pickaxe": "Box()",
            "log": "Box()",
            "planks": "Box()",
            "stick": "Box()",
            "stone": "Box()",
            "stone_axe": "Box()",
            "stone_pickaxe": "Box()",
            "torch": "Box()",
            "wooden_axe": "Box()",
            "wooden_pickaxe": "Box()"
    },
    "pov": "Box(64, 64, 3)"
})

Action Space

Dict({
    "attack": "Discrete(2)",
    "back": "Discrete(2)",
    "camera": "Box(2,)",
    "craft": "Enum(none,torch,stick,planks,crafting_table)",
    "equip": "Enum(none,air,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe)",
    "forward": "Discrete(2)",
    "jump": "Discrete(2)",
    "left": "Discrete(2)",
    "nearbyCraft": "Enum(none,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe,furnace)",
    "nearbySmelt": "Enum(none,iron_ingot,coal)",
    "place": "Enum(none,dirt,stone,cobblestone,crafting_table,furnace,torch)",
    "right": "Discrete(2)",
    "sneak": "Discrete(2)",
    "sprint": "Discrete(2)"
})

Usage

import gym
import minerl

# Step an agent through the environment, taking no-op actions
env = gym.make("MineRLObtainIronPickaxe-v0") # A MineRLObtainIronPickaxe-v0 env

obs = env.reset()
done = False

while not done:
    # Take a no-op through the environment.
    obs, rew, done, _ = env.step(env.action_space.noop())
    # Do something

######################################

# Sample some data from the dataset!
data = minerl.data.make("MineRLObtainIronPickaxe-v0")

# Iterate through a single epoch using sequences of at most 32 steps
for obs, rew, done, act in data.seq_iter(num_epochs=1, max_sequence_len=32):
    pass  # Do something with the sampled sequence

MineRLObtainIronPickaxeDense-v0

In this environment the agent is required to obtain an iron pickaxe. The agent begins at a random starting location, on a random survival map, without any items, matching the normal starting conditions for human players in Minecraft. The agent is given access to a selected view of its inventory, as well as GUI-free crafting, smelting, and inventory management actions.

During an episode the agent is rewarded every time it obtains an item in the requisite item hierarchy for obtaining an iron pickaxe. The rewards for each item are given here:

<Item amount="1" reward="1" type="log" />
<Item amount="1" reward="2" type="planks" />
<Item amount="1" reward="4" type="stick" />
<Item amount="1" reward="4" type="crafting_table" />
<Item amount="1" reward="8" type="wooden_pickaxe" />
<Item amount="1" reward="16" type="cobblestone" />
<Item amount="1" reward="32" type="furnace" />
<Item amount="1" reward="32" type="stone_pickaxe" />
<Item amount="1" reward="64" type="iron_ore" />
<Item amount="1" reward="128" type="iron_ingot" />
<Item amount="1" reward="256" type="iron_pickaxe" />

Observation Space

Dict({
    "equipped_items": {
            "mainhand": {
                    "damage": "Box()",
                    "maxDamage": "Box()",
                    "type": "Enum(none,air,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe,other)"
            }
    },
    "inventory": {
            "coal": "Box()",
            "cobblestone": "Box()",
            "crafting_table": "Box()",
            "dirt": "Box()",
            "furnace": "Box()",
            "iron_axe": "Box()",
            "iron_ingot": "Box()",
            "iron_ore": "Box()",
            "iron_pickaxe": "Box()",
            "log": "Box()",
            "planks": "Box()",
            "stick": "Box()",
            "stone": "Box()",
            "stone_axe": "Box()",
            "stone_pickaxe": "Box()",
            "torch": "Box()",
            "wooden_axe": "Box()",
            "wooden_pickaxe": "Box()"
    },
    "pov": "Box(64, 64, 3)"
})

Action Space

Dict({
    "attack": "Discrete(2)",
    "back": "Discrete(2)",
    "camera": "Box(2,)",
    "craft": "Enum(none,torch,stick,planks,crafting_table)",
    "equip": "Enum(none,air,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe)",
    "forward": "Discrete(2)",
    "jump": "Discrete(2)",
    "left": "Discrete(2)",
    "nearbyCraft": "Enum(none,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe,furnace)",
    "nearbySmelt": "Enum(none,iron_ingot,coal)",
    "place": "Enum(none,dirt,stone,cobblestone,crafting_table,furnace,torch)",
    "right": "Discrete(2)",
    "sneak": "Discrete(2)",
    "sprint": "Discrete(2)"
})

Usage

import gym
import minerl

# Step an agent through the environment, taking no-op actions
env = gym.make("MineRLObtainIronPickaxeDense-v0") # A MineRLObtainIronPickaxeDense-v0 env

obs = env.reset()
done = False

while not done:
    # Take a no-op through the environment.
    obs, rew, done, _ = env.step(env.action_space.noop())
    # Do something

######################################

# Sample some data from the dataset!
data = minerl.data.make("MineRLObtainIronPickaxeDense-v0")

# Iterate through a single epoch using sequences of at most 32 steps
for obs, rew, done, act in data.seq_iter(num_epochs=1, max_sequence_len=32):
    pass  # Do something with the sampled sequence

MineRLObtainDiamond-v0

Caution

This is the evaluation environment of the MineRL Competition! You may train your agents on any environment (including MineRLObtainDiamondDense-v0); however, your agent will only be evaluated on this environment.

In this environment the agent is required to obtain a diamond within 18000 steps. The agent begins at a random starting location, on a random survival map, without any items, matching the normal starting conditions for human players in Minecraft. The agent is given access to a selected view of its inventory, as well as GUI-free crafting, smelting, and inventory management actions.

During an episode the agent is rewarded only once per item: the first time it obtains each item in the requisite item hierarchy for obtaining a diamond. The reward for each item is given here:

<Item reward="1" type="log" />
<Item reward="2" type="planks" />
<Item reward="4" type="stick" />
<Item reward="4" type="crafting_table" />
<Item reward="8" type="wooden_pickaxe" />
<Item reward="16" type="cobblestone" />
<Item reward="32" type="furnace" />
<Item reward="32" type="stone_pickaxe" />
<Item reward="64" type="iron_ore" />
<Item reward="128" type="iron_ingot" />
<Item reward="256" type="iron_pickaxe" />
<Item reward="1024" type="diamond" />

Observation Space

Dict({
    "equipped_items": {
            "mainhand": {
                    "damage": "Box()",
                    "maxDamage": "Box()",
                    "type": "Enum(none,air,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe,other)"
            }
    },
    "inventory": {
            "coal": "Box()",
            "cobblestone": "Box()",
            "crafting_table": "Box()",
            "dirt": "Box()",
            "furnace": "Box()",
            "iron_axe": "Box()",
            "iron_ingot": "Box()",
            "iron_ore": "Box()",
            "iron_pickaxe": "Box()",
            "log": "Box()",
            "planks": "Box()",
            "stick": "Box()",
            "stone": "Box()",
            "stone_axe": "Box()",
            "stone_pickaxe": "Box()",
            "torch": "Box()",
            "wooden_axe": "Box()",
            "wooden_pickaxe": "Box()"
    },
    "pov": "Box(64, 64, 3)"
})

Action Space

Dict({
    "attack": "Discrete(2)",
    "back": "Discrete(2)",
    "camera": "Box(2,)",
    "craft": "Enum(none,torch,stick,planks,crafting_table)",
    "equip": "Enum(none,air,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe)",
    "forward": "Discrete(2)",
    "jump": "Discrete(2)",
    "left": "Discrete(2)",
    "nearbyCraft": "Enum(none,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe,furnace)",
    "nearbySmelt": "Enum(none,iron_ingot,coal)",
    "place": "Enum(none,dirt,stone,cobblestone,crafting_table,furnace,torch)",
    "right": "Discrete(2)",
    "sneak": "Discrete(2)",
    "sprint": "Discrete(2)"
})
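The enum-valued actions above (craft, place, nearbyCraft, nearbySmelt) drive progress through the item hierarchy. A hypothetical scripted opening for the obtain tasks might issue the crafting actions below after gathering logs; field names and enum values come from the action space above, and plain dicts stand in for the real space objects:

```python
# A hypothetical crafting script: one enum action per step after gathering
# logs, ending with a wooden pickaxe crafted at a placed crafting table.
CRAFT_SCRIPT = [
    {"craft": "planks"},
    {"craft": "stick"},
    {"craft": "crafting_table"},
    {"place": "crafting_table"},
    {"nearbyCraft": "wooden_pickaxe"},
]

def scripted_actions(noop):
    """Yield one full action dict per scripted step."""
    for overrides in CRAFT_SCRIPT:
        act = dict(noop)
        act.update(overrides)
        yield act

noop = {"craft": "none", "place": "none", "nearbyCraft": "none", "attack": 0}
acts = list(scripted_actions(noop))
```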

Usage

import gym
import minerl

# Step an agent through the environment, taking no-op actions
env = gym.make("MineRLObtainDiamond-v0") # A MineRLObtainDiamond-v0 env

obs = env.reset()
done = False

while not done:
    # Take a no-op through the environment.
    obs, rew, done, _ = env.step(env.action_space.noop())
    # Do something

######################################

# Sample some data from the dataset!
data = minerl.data.make("MineRLObtainDiamond-v0")

# Iterate through a single epoch using sequences of at most 32 steps
for obs, rew, done, act in data.seq_iter(num_epochs=1, max_sequence_len=32):
    pass  # Do something with the sampled sequence

MineRLObtainDiamondDense-v0

In this environment the agent is required to obtain a diamond. The agent begins at a random starting location on a random survival map, without any items, matching the normal starting conditions for human players in Minecraft. The agent is given access to a selected summary of its inventory, as well as GUI-free crafting, smelting, and inventory management actions.

During an episode the agent is rewarded every time it obtains an item in the requisite item hierarchy for obtaining a diamond. The rewards for each item are given here:

<Item reward="1" type="log" />
<Item reward="2" type="planks" />
<Item reward="4" type="stick" />
<Item reward="4" type="crafting_table" />
<Item reward="8" type="wooden_pickaxe" />
<Item reward="16" type="cobblestone" />
<Item reward="32" type="furnace" />
<Item reward="32" type="stone_pickaxe" />
<Item reward="64" type="iron_ore" />
<Item reward="128" type="iron_ingot" />
<Item reward="256" type="iron_pickaxe" />
<Item reward="1024" type="diamond" />

Observation Space

Dict({
    "equipped_items": {
            "mainhand": {
                    "damage": "Box()",
                    "maxDamage": "Box()",
                    "type": "Enum(none,air,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe,other)"
            }
    },
    "inventory": {
            "coal": "Box()",
            "cobblestone": "Box()",
            "crafting_table": "Box()",
            "dirt": "Box()",
            "furnace": "Box()",
            "iron_axe": "Box()",
            "iron_ingot": "Box()",
            "iron_ore": "Box()",
            "iron_pickaxe": "Box()",
            "log": "Box()",
            "planks": "Box()",
            "stick": "Box()",
            "stone": "Box()",
            "stone_axe": "Box()",
            "stone_pickaxe": "Box()",
            "torch": "Box()",
            "wooden_axe": "Box()",
            "wooden_pickaxe": "Box()"
    },
    "pov": "Box(64, 64, 3)"
})

Action Space

Dict({
    "attack": "Discrete(2)",
    "back": "Discrete(2)",
    "camera": "Box(2,)",
    "craft": "Enum(none,torch,stick,planks,crafting_table)",
    "equip": "Enum(none,air,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe)",
    "forward": "Discrete(2)",
    "jump": "Discrete(2)",
    "left": "Discrete(2)",
    "nearbyCraft": "Enum(none,wooden_axe,wooden_pickaxe,stone_axe,stone_pickaxe,iron_axe,iron_pickaxe,furnace)",
    "nearbySmelt": "Enum(none,iron_ingot,coal)",
    "place": "Enum(none,dirt,stone,cobblestone,crafting_table,furnace,torch)",
    "right": "Discrete(2)",
    "sneak": "Discrete(2)",
    "sprint": "Discrete(2)"
})

Usage

import gym
import minerl

# Step an agent through the environment, taking no-op actions
env = gym.make("MineRLObtainDiamondDense-v0") # A MineRLObtainDiamondDense-v0 env

obs = env.reset()
done = False

while not done:
    # Take a no-op through the environment.
    obs, rew, done, _ = env.step(env.action_space.noop())
    # Do something

######################################

# Sample some data from the dataset!
data = minerl.data.make("MineRLObtainDiamondDense-v0")

# Iterate through a single epoch using sequences of at most 32 steps
for obs, rew, done, act in data.seq_iter(num_epochs=1, max_sequence_len=32):
    pass  # Do something with the sampled sequence