Getting started
Datasets can be accessed directly from Python. The Faculty datasets library lets you copy files between your project's datasets and the workspace.
At the start of a notebook, import the Faculty datasets library:
>>> import faculty.datasets as datasets
List files
You can list all the files in your project’s datasets with:
>>> datasets.ls()
['/',
'/input/',
'/input/client-data.csv',
'/input/extra/',
'/input/extra/file1.txt',
'/input/extra/file2.txt',
'/output/']
To see a subset of files, just provide a prefix:
>>> datasets.ls('/input/extra')
['/input/extra/',
'/input/extra/file1.txt',
'/input/extra/file2.txt']
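In the listings above, directory entries end with a trailing slash and files do not. The sketch below relies on that convention to filter a listing in plain Python. The helper names `only_files` and `with_extension` are hypothetical and not part of the Faculty library; the sample list is copied from the output above so the example runs without a Faculty project.

```python
def only_files(listing):
    """Keep file entries; directory entries end with '/' in the listings above."""
    return [path for path in listing if not path.endswith("/")]


def with_extension(listing, extension):
    """Keep file entries whose path ends with the given extension."""
    return [path for path in only_files(listing) if path.endswith(extension)]


# Sample listing copied from the datasets.ls() output shown earlier.
listing = [
    "/",
    "/input/",
    "/input/client-data.csv",
    "/input/extra/",
    "/input/extra/file1.txt",
    "/input/extra/file2.txt",
    "/output/",
]

print(only_files(listing))
print(with_extension(listing, ".txt"))
```

In a notebook you would pass `datasets.ls()` instead of the sample list.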
Get files
Copy a particular file from datasets into your workspace with the get function:
>>> datasets.get('/input/client-data.csv', 'client-data.csv')
>>> with open('client-data.csv') as f:
...     print(f.read())
name,email,age
"Jane Smith",jane.smith@example.com,32
"John White",john.white@example.com,28
You can also get whole directories:
>>> datasets.get('/input/extra', 'extra')
>>> import os
>>> os.listdir('extra')
['file1.txt', 'file2.txt']
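When a notebook is re-run, it can be convenient to skip a download if the file is already in the workspace. A minimal sketch follows, assuming a hypothetical helper `get_if_missing` that receives the datasets module as its `ds` argument and only calls `ds.get(project_path, local_path)` as shown above. It does not check whether the copy in datasets has changed since the last download.

```python
import os


def get_if_missing(ds, project_path, local_path):
    """Download project_path to local_path unless local_path already exists.

    `ds` is expected to be the faculty.datasets module, or any object
    with a compatible get(project_path, local_path) callable.
    Returns True if a download was performed, False if it was skipped.
    """
    if os.path.exists(local_path):
        return False
    ds.get(project_path, local_path)
    return True
```

For example, `get_if_missing(datasets, '/input/client-data.csv', 'client-data.csv')` would download the file on the first run and do nothing on later runs while the local copy exists.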
Put files
To go the other way, copy a file from the workspace into datasets with the put function:
>>> datasets.put('results.csv', '/output/results.csv')
>>> datasets.ls()
['/',
'/input/',
'/input/client-data.csv',
'/input/extra/',
'/input/extra/file1.txt',
'/input/extra/file2.txt',
'/output/',
'/output/results.csv']
Again, this works with whole directories:
>>> datasets.put('figures', '/output/figures')
>>> datasets.ls()
['/',
'/input/',
'/input/client-data.csv',
'/input/extra/',
'/input/extra/file1.txt',
'/input/extra/file2.txt',
'/output/',
'/output/figures/',
'/output/figures/plot.png',
'/output/figures/regression.png',
'/output/results.csv']
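Rather than reading a full listing by eye after each upload, the check can be automated. The sketch below uses a hypothetical helper, `put_and_verify`, that takes the datasets module as `ds`, uploads with `ds.put`, and then looks for the destination in `ds.ls(prefix)`. It assumes that listing with the destination path as prefix returns the uploaded entry, with a trailing slash for directories, as in the outputs above.

```python
def put_and_verify(ds, local_path, project_path):
    """Upload local_path and report whether project_path then appears in ls().

    `ds` is expected to be the faculty.datasets module, or any object
    with compatible put(local_path, project_path) and ls(prefix) callables.
    """
    ds.put(local_path, project_path)
    wanted = project_path.rstrip("/")
    # Directory entries carry a trailing slash in listings, so strip it
    # from both sides before comparing.
    return any(entry.rstrip("/") == wanted for entry in ds.ls(project_path))
```

A call such as `put_and_verify(datasets, 'results.csv', '/output/results.csv')` would return `True` once the file shows up in the listing.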
Note
Copying and moving large files (> 1 GB) is currently not well supported. Instead of using the cp and mv commands, consider downloading the file to the workspace first and then re-uploading it to the new location within datasets. Afterwards, remove the original file if needed.
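The workaround described in the note can be sketched as a small helper. The function name `relocate_via_workspace` is hypothetical, and the datasets module is passed in as `ds`. The sketch uses the `get` and `put` calls shown above and assumes the library also exposes an `rm` function for deleting a single file; check that against your installed version before relying on it.

```python
import os
import tempfile


def relocate_via_workspace(ds, source, destination, remove_original=False):
    """Copy a datasets file by downloading it and uploading it again.

    `ds` is expected to be the faculty.datasets module, or any object
    with compatible get, put and rm callables.
    """
    with tempfile.TemporaryDirectory() as scratch:
        local_path = os.path.join(scratch, os.path.basename(source))
        ds.get(source, local_path)       # download into the workspace
        ds.put(local_path, destination)  # upload to the new location
    if remove_original:
        # Assumption: rm() deletes a single file from datasets.
        ds.rm(source)
```

The temporary directory is removed once the upload finishes, so the scratch location needs enough free space to hold the whole file for the duration of the transfer.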