Hello World: Your First Agent

With the minerl package installed on your system you can now make your first agent in Minecraft!

To get started, let’s first import the necessary packages

import gym
import minerl

Creating an environment

Now we can choose any one of the many environments included in the minerl package. To learn more about the environments checkout the environment documentation.

For this tutorial we’ll choose the MineRLNavigateDense-v0 environment. In this task, the agent is challenged with using a first-person perspective of a random Minecraft map and navigating to a target.

To create the environment, simply invoke gym.make

env = gym.make('MineRLNavigateDense-v0')

Caution

Currently minerl only supports environment rendering in headed environments (servers with monitors attached).

In order to run minerl environments without a head use a software renderer such as xvfb:

xvfb-run python3 <your_script.py>

Note

The first time you run this command to complete, it will take a while as it is recompiling Minecraft with the MineRL simulator mod!

If you’re worried and want to make sure something is happening behind the scenes install a logger before you create the envrionment.

import logging
logging.basicConfig(level=logging.DEBUG)

env = gym.make('MineRLNavigateDense-v0')

Taking actions

As a warm up let’s create a random agent. 🧠

Now we can reset this environment to its first position and get our first observation from the agent by resetting the environment.

obs = env.reset()

The obs variable will be a dictionary containing the following observations returned by the environment. In the case of the MineRLNavigate-v0 environment, three observations are returned: pov, an RGB image of the agent’s first person perspective; compassAngle, a float giving the angle of the agent to its (approximate) target; and inventory, a dictionary containing the amount of 'dirt' blocks in the agent’s inventory (this is useful for climbing steep inclines).

{
    'pov': array([[[ 63,  63,  68],
        [ 63,  63,  68],
        [ 63,  63,  68],
        ...,
        [ 92,  92, 100],
        [ 92,  92, 100],
        [ 92,  92, 100]],,

        ...,


        [[ 95, 118, 176],
        [ 95, 119, 177],
        [ 96, 119, 178],
        ...,
        [ 93, 116, 172],
        [ 93, 115, 171],
        [ 92, 115, 170]]], dtype=uint8),
    'compassAngle': -63.48639,
    'inventory': {'dirt': 0}
}

Note

To see the exact format of observations returned from and the exact action format expected by env.step for any environment refer to the environment reference documentation!

Now let’s take actions through the environment until time runs out or the agent dies. To do this, we will use the normal OpenAI Gym env.step method.

done = False

while not done:
    action = env.action_space.sample()
    obs, reward, done, _ = env.step(action)

After running this code you should see your agent move sporadically until the done flag is set to true. To confirm that our agent is at least qualitatively acting randomly, on the right is a plot of the compass angle over the course of the experiment.

../_images/compass_angle.png

No-op actions and a better policy

Now let’s make a hard-coded agent that actually runs towards the target. 🧠🧠🧠

To do this at every step of the environment we will take the noop action with a few modifications; in particular, we will only move forward, jump, attack, and changw the agent’s direction to minimize the angle between the agent’s movement direction and it’s target, compassAngle.

import minerl
import gym
env = gym.make('MineRLNavigateDense-v0')


obs  = env.reset()
done = False
net_reward = 0

while not done:
    action = env.action_space.noop()

    action['camera'] = [0, 0.03*obs["compassAngle"]]
    action['back'] = 0
    action['forward'] = 1
    action['jump'] = 1
    action['attack'] = 1

    obs, reward, done, info = env.step(
        action)

    net_reward += reward
    print("Total reward: ", net_reward)

After running this agent, you should notice marekedly less sporadic behaviour. Plotting both the compassAngle and the net reward over the episode confirm that this policy performs better than our random policy.

../_images/compass_angle_better.png ../_images/net_reward.png

Congratulations! You’ve just made your first agent using the minerl framework!