Getting started

Datasets can be accessed directly from Python, which lets you copy files to and from the workspace.

At the start of a notebook, import the Faculty datasets library:

>>> import faculty.datasets as datasets

List files

You can list all the files in your project’s datasets with:

>>> datasets.ls()
['/',
 '/input/',
 '/input/client-data.csv',
 '/input/extra/',
 '/input/extra/file1.txt',
 '/input/extra/file2.txt',
 '/output/']

To see a subset of files, just provide a prefix:

>>> datasets.ls('/input/extra')
['/input/extra/',
 '/input/extra/file1.txt',
 '/input/extra/file2.txt']
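
Judging from the output above, directory entries in a listing end with a trailing slash. The following is a minimal sketch, assuming that convention holds, of how to keep only the files. The helper name only_files is hypothetical and not part of the library.

```python
def only_files(paths):
    """Return the entries that do not end with '/', i.e. the files."""
    return [path for path in paths if not path.endswith('/')]


# Sample listing copied from the example above; in a notebook you would
# pass the result of datasets.ls('/input/extra') instead.
listing = [
    '/input/extra/',
    '/input/extra/file1.txt',
    '/input/extra/file2.txt',
]
print(only_files(listing))
```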

Get files

Get particular files from datasets into your workspace with the get function:

>>> datasets.get('/input/client-data.csv', 'client-data.csv')
>>> with open('client-data.csv') as f:
...     print(f.read())
name,email,age
"Jane Smith",jane.smith@example.com,32
"John White",john.white@example.com,28
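
Once a file has been copied into the workspace it is an ordinary local file, so any standard tool can read it. As an illustration only, here is a sketch that parses the downloaded CSV with the standard library csv module. The helper name read_rows is hypothetical.

```python
import csv


def read_rows(path):
    """Read a local CSV file into a list of dicts keyed by the header row."""
    # newline='' is what the csv module recommends for files it reads.
    with open(path, newline='') as f:
        return list(csv.DictReader(f))


# In a notebook, after datasets.get('/input/client-data.csv', 'client-data.csv'):
#     rows = read_rows('client-data.csv')
#     rows[0]['name'] would then hold the first name in the file.
```

All values come back as strings, so a column such as age needs an explicit int() conversion before arithmetic.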

You can also get whole directories:

>>> datasets.get('/input/extra', 'extra')
>>> import os
>>> os.listdir('extra')
['file1.txt', 'file2.txt']

Put files

You can also go in the other direction and put a file from the workspace into datasets with the put function:

>>> datasets.put('results.csv', '/output/results.csv')
>>> datasets.ls()
['/',
 '/input/',
 '/input/client-data.csv',
 '/input/extra/',
 '/input/extra/file1.txt',
 '/input/extra/file2.txt',
 '/output/',
 '/output/results.csv']

Again, this works with whole directories:

>>> datasets.put('figures', '/output/figures')
>>> datasets.ls()
['/',
 '/input/',
 '/input/client-data.csv',
 '/input/extra/',
 '/input/extra/file1.txt',
 '/input/extra/file2.txt',
 '/output/',
 '/output/figures/',
 '/output/figures/plot.png',
 '/output/figures/regression.png',
 '/output/results.csv']

Note

Copying and moving large files (larger than 1 GB) is currently not well supported. Instead of using the cp and mv commands, consider downloading the file first and re-uploading it to a different location within datasets. Then remove the original file if needed.
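
The workaround in the note can be sketched with the get and put functions shown earlier. This is a rough illustration rather than an official recipe: the helper name copy_via_workspace is hypothetical, and the rm call used to delete the original is an assumption about the library that is not covered in this guide. The datasets module is passed in as an argument so that the helper stays independent of how it was imported.

```python
import os
import tempfile


def copy_via_workspace(datasets, source, destination, remove_source=False):
    """Copy one file within datasets by downloading and re-uploading it.

    `datasets` is the imported faculty.datasets module, or any object that
    offers the same get, put and rm functions.
    """
    # The temporary directory and its contents are deleted on exit.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, os.path.basename(source))
        datasets.get(source, local_path)       # download into the workspace
        datasets.put(local_path, destination)  # upload to the new location
    if remove_source:
        # Assumption: the library provides rm for deleting a dataset file.
        datasets.rm(source)
```

In a notebook this might be called as copy_via_workspace(datasets, '/input/client-data.csv', '/output/client-data.csv'). The temporary directory needs enough free space to hold the file while it is in transit.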