Getting started

Datasets can be accessed directly from Python using the SFS library, which lets you copy files between datasets and your workspace.

At the start of a notebook, import the SFS library:

>>> import sherlockml.filesystem as sfs

List files

You can list all the files in your project’s datasets with:

>>> sfs.ls()
['/input/',
 '/input/client-data.csv',
 '/input/extra/',
 '/input/extra/file1.txt',
 '/input/extra/file2.txt',
 '/output/']

To see a subset of files, just provide a prefix:

>>> sfs.ls('/input/extra')
['/input/extra/',
 '/input/extra/file1.txt',
 '/input/extra/file2.txt']
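
A listing like the ones above mixes directories and files. The following is a minimal sketch of separating them, assuming that directory entries always end with a trailing slash, as they do in the sample output (this convention is inferred from the examples, not from a documented guarantee). The helper name split_listing is hypothetical and not part of SFS; the sample list is copied from the output above so the snippet runs without a project.

```python
def split_listing(paths):
    """Split a listing into (directories, files).

    Assumption: directory entries end with '/', as in the sample output.
    """
    directories = [p for p in paths if p.endswith('/')]
    files = [p for p in paths if not p.endswith('/')]
    return directories, files


# Sample listing copied from the sfs.ls() output shown above.
listing = [
    '/input/',
    '/input/client-data.csv',
    '/input/extra/',
    '/input/extra/file1.txt',
    '/input/extra/file2.txt',
    '/output/',
]

directories, files = split_listing(listing)
print(files)
```

In a notebook you would pass the result of sfs.ls() in place of the sample list.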

Get files

Copy individual files from datasets into your workspace with the get function:

>>> sfs.get('/input/client-data.csv', 'client-data.csv')
>>> with open('client-data.csv') as f:
...     print(f.read())
name,email,age
"Jane Smith",jane.smith@example.com,32
"John White",john.white@example.com,28
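
Once a file is in the workspace it is an ordinary local file, so any Python tool can read it. As an illustration, here is a small sketch that parses a fetched CSV with the standard library csv module. The helper name load_rows is hypothetical, and it assumes the file has a header row like the one shown above.

```python
import csv


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts.

    Every value is returned as a string; convert numeric columns
    (such as 'age' in the sample file) yourself if you need numbers.
    """
    with open(path, newline='') as f:
        return list(csv.DictReader(f))
```

After the sfs.get call above, load_rows('client-data.csv') would return one dict per data row, keyed by the column names in the header.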

You can also get whole directories:

>>> sfs.get('/input/extra', 'extra')
>>> import os
>>> os.listdir('extra')
['file1.txt', 'file2.txt']

Put files

You can also go the other way and put a file from your workspace into datasets with the put function:

>>> sfs.put('results.csv', '/output/results.csv')
>>> sfs.ls()
['/input/',
 '/input/client-data.csv',
 '/input/extra/',
 '/input/extra/file1.txt',
 '/input/extra/file2.txt',
 '/output/',
 '/output/results.csv']

Again, this works with whole directories:

>>> sfs.put('figures', '/output/figures')
>>> sfs.ls()
['/input/',
 '/input/client-data.csv',
 '/input/extra/',
 '/input/extra/file1.txt',
 '/input/extra/file2.txt',
 '/output/',
 '/output/figures/',
 '/output/figures/plot.png',
 '/output/figures/regression.png',
 '/output/results.csv']
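
After uploading a directory, you may want to check that every local file shows up in the listing. The sketch below builds the dataset paths you would expect for a local directory, assuming the upload mirrors the local layout under the target path, as the example output above suggests (this is an inference from that output, not a documented guarantee). The helper name expected_dataset_paths is hypothetical and not part of SFS.

```python
import os
import posixpath


def expected_dataset_paths(local_dir, dataset_dir):
    """Return the sorted dataset paths expected after uploading local_dir.

    Assumption: files keep their relative location under dataset_dir.
    """
    expected = []
    for root, _dirs, filenames in os.walk(local_dir):
        relative = os.path.relpath(root, local_dir)
        # The top-level directory has relative path '.', which adds no parts.
        parts = [] if relative == os.curdir else relative.split(os.sep)
        for name in filenames:
            # Dataset paths use forward slashes regardless of the local OS.
            expected.append(posixpath.join(dataset_dir, *parts, name))
    return sorted(expected)
```

For example, set(expected_dataset_paths('figures', '/output/figures')) - set(sfs.ls()) would give the files that are missing from datasets, if any.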