minerl.data

The minerl.data package provides a unified interface for sampling data from the MineRL-v0 Dataset. Data is accessed by making a dataset from one of the MineRL environments and iterating over it using one of the iterators provided by minerl.data.DataPipeline.

The following is a description of the various methods included within the package as well as some basic usage examples. To see more detailed descriptions and tutorials on how to use the data API, please take a look at our numerous getting started manuals.

MineRLv0

class minerl.data.DataPipeline(data_directory: os.path, environment: str, num_workers: int, worker_batch_size: int, min_size_to_dequeue: int, random_seed=42)

Bases: object

Creates a data pipeline object used to iterate through the MineRL-v0 dataset.

property action_space

The action space of the current MineRL environment.

batch_iter(batch_size: int, seq_len: int, num_epochs: int = -1, preload_buffer_size: int = 2, seed: int = None, include_metadata: bool = False)

Returns batches of size BATCH_SIZE in which each element is a sequence of length SEQ_LEN. The iterator produces batches sequentially. If an element of a batch reaches the end of its

Parameters
  • batch_size (int) – The batch size.

  • seq_len (int) – The size of sequences to produce.

  • num_epochs (int, optional) – The number of epochs to iterate over the data. Defaults to -1.

  • preload_buffer_size (int, optional) – Increase to improve performance. The data iterator uses a queue to prevent blocking; the queue size is the number of trajectories to load into the buffer. Adjust based on memory constraints. Defaults to 2.

  • seed (int, optional) – Not implemented. Defaults to None.

  • include_metadata (bool, optional) – Include metadata on the source trajectory. Defaults to False.

Returns

A generator that yields batches of (state, action, reward, next_state, done) tuples, abbreviated sarsd.

Return type

Generator
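A minimal sketch of consuming batch_iter, assuming each yielded batch unpacks into the five sarsd elements and that the reward element holds one row of seq_len rewards per batch element. The helper name sum_batch_rewards and the environment name in the commented usage are illustrative and not part of the API described here.

```python
def sum_batch_rewards(data, batch_size=4, seq_len=8, num_epochs=1):
    """Sum the rewards in every batch yielded by data.batch_iter.

    `data` is assumed to be a pipeline returned by minerl.data.make().
    Hypothetical helper, for illustration only.
    """
    totals = []
    for state, action, reward, next_state, done in data.batch_iter(
            batch_size=batch_size, seq_len=seq_len, num_epochs=num_epochs):
        # Assumption: reward is indexable as [batch element][time step].
        totals.append(float(sum(sum(row) for row in reward)))
    return totals


# Usage with the real library (requires a downloaded dataset):
#   import minerl.data
#   data = minerl.data.make("MineRLObtainDiamond-v0")  # example environment name
#   print(sum_batch_rewards(data)[:3])
```

Passing a finite num_epochs keeps the loop from running forever, since the default of -1 iterates indefinitely.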

get_trajectory_names()

Gets all the trajectory names

Returns

A list of experiment names.

Return type

list

load_data(stream_name: str, skip_interval=0, include_metadata=False)

Iterates over an individual trajectory named stream_name.

Parameters
  • stream_name (str) – The stream name desired to be iterated through.

  • skip_interval (int, optional) – How many slices should be skipped. Defaults to 0.

  • include_metadata (bool, optional) – Whether or not metadata about the loaded trajectory should be included. Defaults to False.

Yields

A tuple of (state, player_action, reward_from_action, next_state, is_next_state_terminal). These tuples are yielded in the order of the episode.
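As a rough sketch, get_trajectory_names and load_data can be combined to walk individual trajectories step by step. The helper name trajectory_returns is hypothetical, and the code assumes the per-step reward converts cleanly to a float.

```python
def trajectory_returns(data, max_trajectories=2):
    """Map each trajectory name to the sum of its rewards.

    Hypothetical helper; `data` is assumed to come from minerl.data.make().
    """
    returns = {}
    for name in data.get_trajectory_names()[:max_trajectories]:
        total = 0.0
        # Steps arrive in episode order as
        # (state, action, reward, next_state, is_next_state_terminal).
        for state, action, reward, next_state, done in data.load_data(name):
            total += float(reward)  # assumption: reward is a scalar per step
        returns[name] = total
    return returns
```

Limiting the number of trajectories keeps a first experiment quick; drop the slice to process the whole dataset.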

property observation_space

The observation space of the current MineRL environment.

static read_frame(cap)

sarsd_iter(num_epochs=-1, max_sequence_len=32, queue_size=None, seed=None, include_metadata=False)

Returns a generator for iterating through (state, action, reward, next_state, is_terminal) tuples in the dataset. Loads num_workers files at once, as defined in minerl.data.make(), and returns up to max_sequence_len consecutive samples wrapped in a dict observation space.

Parameters
  • num_epochs (int, optional) – Number of epochs to iterate over, or -1 to loop forever. Defaults to -1.

  • max_sequence_len (int, optional) – Maximum number of consecutive samples; fewer may be returned. Defaults to 32.

  • seed (int, optional) – Seed for the random directory walk. Note that specifying a seed together with a finite num_epochs causes the ordering of examples to be the same after every call to seq_iter.

  • queue_size (int, optional) – Maximum number of elements to buffer at a time; each worker may hold an additional item while waiting to enqueue. Defaults to 16 * self.number_of_workers, or 2 * self.number_of_workers if max_sequence_len == -1.

  • include_metadata (bool, optional) – Adds an additional member to the tuple containing metadata about the stream the data was loaded from. Defaults to False.

Yields

A tuple of (state, player_action, reward_from_action, next_state, is_next_state_terminal, (metadata)). Each element is in the format of the environment action/state/reward space and contains as many samples as are requested.
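The sketch below shows one way to consume sarsd_iter with include_metadata=True, under the assumption that the metadata arrives as a sixth tuple member and that the reward element has one entry per sample. The helper name count_samples is hypothetical.

```python
def count_samples(data, num_epochs=1, max_sequence_len=32):
    """Count the sequences and individual samples yielded by sarsd_iter.

    Hypothetical helper; `data` is assumed to come from minerl.data.make().
    """
    n_sequences = 0
    n_samples = 0
    for item in data.sarsd_iter(num_epochs=num_epochs,
                                max_sequence_len=max_sequence_len,
                                include_metadata=True):
        # With include_metadata=True the tuple carries a sixth member.
        state, action, reward, next_state, done, metadata = item
        n_sequences += 1
        # A sequence may hold fewer than max_sequence_len samples.
        n_samples += len(reward)
    return n_sequences, n_samples
```

Counting via len(reward) rather than assuming max_sequence_len accounts for the shorter sequences the parameter description allows.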

seq_iter(num_epochs=-1, max_sequence_len=32, queue_size=None, seed=None, include_metadata=False)

DEPRECATED METHOD FOR SAMPLING DATA FROM THE MINERL DATASET.

This function is now DataPipeline.batch_iter()

property spec

minerl.data.download(directory=None, resolution='low', texture_pack=0, update_environment_variables=True, disable_cache=False, experiment=None, minimal=False)

Downloads MineRLv0 to specified directory. If directory is None, attempts to download to $MINERL_DATA_ROOT. Raises ValueError if both are undefined.

Parameters
  • directory (os.path) – destination root for downloading MineRLv0 datasets

  • resolution (str, optional) – one of [ ‘low’, ‘high’ ] corresponding to video resolutions of [ 64x64, 256,128 ] respectively (note: high resolution is not currently supported). Defaults to ‘low’.

  • texture_pack (int, optional) – 0: default Minecraft texture pack, 1: flat semi-realistic texture pack. Defaults to 0.

  • update_environment_variables (bool, optional) – enables / disables exporting of the MINERL_DATA_ROOT environment variable (note: on some operating systems this only applies to the current shell). Defaults to True.

  • disable_cache (bool, optional) – downloads temporary files to local directory. Defaults to False.

  • experiment (str, optional) – specify the desired experiment to download. Will only download data for this experiment. Note that there is no hash verification for individual experiments.

  • minimal (bool, optional) – download a minimal version of the dataset
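The directory fallback described above can be restated as a small sketch. The helpers resolve_data_root and download_dataset are hypothetical; resolve_data_root only mirrors the documented behavior and is not the library's own implementation.

```python
import os


def resolve_data_root(directory=None):
    """Pick the download target as the description above states.

    Illustrative restatement of the documented behavior only.
    """
    if directory is not None:
        return directory
    root = os.environ.get("MINERL_DATA_ROOT")
    if root is None:
        raise ValueError("pass a directory or set MINERL_DATA_ROOT")
    return root


def download_dataset(directory=None):
    """Download the dataset; needs the minerl package and network access."""
    import minerl.data  # imported lazily so resolve_data_root works without it
    target = resolve_data_root(directory)
    minerl.data.download(directory=target)
    return target
```

Resolving the target before calling download makes the failure mode explicit: with neither an argument nor the environment variable set, the ValueError surfaces before any network traffic starts.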

minerl.data.make(environment=None, data_dir=None, num_workers=4, worker_batch_size=32, minimum_size_to_dequeue=32, force_download=False)

Initializes the data loader with the chosen environment.

Parameters
  • environment (string) – desired MineRL environment

  • data_dir (string, optional) – specify alternative dataset location. Defaults to None.

  • num_workers (int, optional) – number of files to load at once. Defaults to 4.

  • force_download (bool, optional) – specifies whether or not the data should be downloaded if missing. Defaults to False.

Returns

Initialized data pipeline.

Return type

DataPipeline
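A hedged sketch of building a pipeline with make and reading its action_space and observation_space properties. The helper describe_pipeline and its make_fn hook are hypothetical, added so the sketch can be exercised without the dataset; by default the hook resolves to minerl.data.make.

```python
def describe_pipeline(environment, data_dir=None, make_fn=None):
    """Build a pipeline and return its action and observation spaces.

    `make_fn` is a hypothetical hook for testing; when omitted,
    minerl.data.make is used (requires the minerl package and the data).
    """
    if make_fn is None:
        import minerl.data
        make_fn = minerl.data.make
    data = make_fn(environment, data_dir=data_dir, num_workers=4)
    return {
        "action_space": data.action_space,
        "observation_space": data.observation_space,
    }


# Usage with the real library:
#   spaces = describe_pipeline("MineRLObtainDiamond-v0")  # example environment name
#   print(spaces["action_space"])
```

Leaving data_dir as None relies on the default dataset location, as described for the data_dir parameter.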