API

Opendata

ls([prefix])        List open datasets on SherlockML.
load(path[, copy])  Load an open dataset from SherlockML.

sherlockml.opendata.ls(prefix='')

List open datasets on SherlockML.

Parameters:
prefix : str, optional

List only open datasets matching this prefix

Returns:
list
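
As a rough illustration of prefix filtering, the sketch below works on a hypothetical in-memory list of dataset paths rather than on SherlockML itself. It assumes that matching is a plain string-prefix test and that the returned list holds path strings; neither detail is confirmed by this reference, and `ls_sketch` and `EXAMPLE_PATHS` are made-up names.

```python
def ls_sketch(paths, prefix=""):
    """Return the paths that start with `prefix` (assumed matching rule)."""
    return [p for p in paths if p.startswith(prefix)]


# Hypothetical dataset paths, for illustration only.
EXAMPLE_PATHS = ["census/2011.csv", "census/2001.csv", "weather/daily.json"]

census = ls_sketch(EXAMPLE_PATHS, prefix="census/")
everything = ls_sketch(EXAMPLE_PATHS)  # the default prefix '' matches every path
```

With the default empty prefix every path is kept, which mirrors the `prefix=''` default in the signature above.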
sherlockml.opendata.load(path, copy=True)

Load an open dataset from SherlockML.

The dataset is downloaded only if the local copy is not in sync with SherlockML. The loaded data is cached in memory for subsequent calls to load().

Parameters:
path : str

Path of file on SherlockML

copy : bool, optional

Return a copy of the data (default: True)

Dataset

sherlockml.opendata.dataset.dataset_factory(s3_key)

Generate a dataset, inferring the correct class from the file extension.

Parameters:
s3_key : str

The path of the dataset inside SherlockML opendata

Returns:
Dataset
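
One way to infer a class from a file extension is a lookup table keyed by extension. The sketch below assumes this approach; the class names, the set of supported extensions, and the fallback to a base class are all hypothetical and are not taken from the library.

```python
import os


class DatasetSketch:
    """Hypothetical base class holding the dataset key."""

    def __init__(self, s3_key):
        self.s3_key = s3_key


class CsvDatasetSketch(DatasetSketch):
    """Hypothetical subclass for .csv files."""


class JsonDatasetSketch(DatasetSketch):
    """Hypothetical subclass for .json files."""


# Assumed mapping from file extension to dataset class.
_CLASSES_BY_EXTENSION = {
    ".csv": CsvDatasetSketch,
    ".json": JsonDatasetSketch,
}


def dataset_factory_sketch(s3_key):
    """Pick a class from the extension of `s3_key` and instantiate it."""
    _, extension = os.path.splitext(s3_key)
    # Falling back to the base class for unknown extensions is an assumption.
    cls = _CLASSES_BY_EXTENSION.get(extension.lower(), DatasetSketch)
    return cls(s3_key)


csv_dataset = dataset_factory_sketch("example/data.CSV")
other_dataset = dataset_factory_sketch("example/notes.txt")
```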
class sherlockml.opendata.dataset.Dataset(s3_key)

An open dataset stored on SherlockML that is cached locally.

This class implements two types of caching to reduce the time spent waiting for datasets to load. First, it stores data downloaded from SherlockML on disk, avoiding repeat downloads. Second, it caches the loaded data at runtime, so a dataset that is used many times needs to be loaded only once.

Parameters:
s3_key : str

The path of the dataset on S3

Attributes:
local_path

Get the path for local storage of the dataset.

Methods

load([copy]) Load the dataset.
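
The two caching layers described for the Dataset class can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: `CachedDatasetSketch`, the `fetch` callable, and the file-naming scheme behind `local_path` are hypothetical, and the check that the local copy is in sync with SherlockML is omitted.

```python
import os
import tempfile


class CachedDatasetSketch:
    """Hypothetical dataset with a disk cache and a runtime cache."""

    def __init__(self, key, fetch, cache_dir):
        self.key = key
        self._fetch = fetch          # hypothetical callable: key -> bytes
        self._cache_dir = cache_dir
        self._data = None            # runtime cache

    @property
    def local_path(self):
        """Path used for local storage (naming scheme is an assumption)."""
        return os.path.join(self._cache_dir, self.key.replace("/", "_"))

    def load(self):
        if self._data is None:
            # Disk cache: download only if no local file exists yet.
            if not os.path.exists(self.local_path):
                with open(self.local_path, "wb") as f:
                    f.write(self._fetch(self.key))
            with open(self.local_path, "rb") as f:
                self._data = f.read()
        return self._data


downloads = []


def fake_fetch(key):
    """Stand-in for the remote download; records each call."""
    downloads.append(key)
    return b"a,b\n1,2\n"


cache_dir = tempfile.mkdtemp()
first = CachedDatasetSketch("example/data.csv", fake_fetch, cache_dir)
first.load()
first.load()  # served from the runtime cache
second = CachedDatasetSketch("example/data.csv", fake_fetch, cache_dir)
second_data = second.load()  # served from the disk cache, no new download
```

The sketch returns immutable bytes, so it does not need a `copy` option; a loader that returns mutable objects would need one to keep the cached value safe from callers.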