Servers are computing resources you can access on demand.
Running computations on your servers is just like running computations on your local computer, except you get to recreate your computer at any time with whatever memory and computing power you need. You can have many servers or none. You can create them instantly and terminate them just as fast.
Servers provide the power you need to work. The brains of the operation is the workspace, which remembers your files across all of a project's servers; the servers are just the muscle. Terminating a server has no effect on your files in the workspace, but anything stored on the server outside the workspace will be lost when the server is terminated.
Creating a server¶
You can create servers from the workspace: use the NEW button and click Server.
When you have multiple servers running, you can select in the workspace which server to use when running a notebook.
When creating a server, you can name it for convenience and select the number of CPUs and the amount of memory.
Choosing a server size¶
Five sizes of server are available. Sizes are specified in terms of processing units (CPUs) called ‘cores’ and memory (RAM) in gigabytes (GB).
For most data science projects the size of your data will determine how much memory you require. For tasks involving large amounts of computation which can be parallelised across cores you can create a server with more CPUs.
Small servers are useful for tasks that do not require a lot of compute power, such as editing files, exploring small datasets, and carrying out light computation.
Medium servers suit projects with slightly bigger data, for example where you are reading up to 2 GB of data into pandas DataFrames in notebooks. As data size or parallelisation requirements increase, you can move up the server sizes for more memory and CPUs.
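When sizing a server around your data, it helps to know how much memory a dataset actually occupies once loaded, which is often larger than its size on disk. A minimal sketch with pandas (the tiny in-memory CSV is purely illustrative, not a real project file):

```python
import io

import pandas as pd

# A small in-memory CSV standing in for a real data file.
csv_data = io.StringIO(
    "id,value\n"
    "1,10.5\n"
    "2,20.1\n"
    "3,30.7\n"
)

df = pd.read_csv(csv_data)

# memory_usage(deep=True) reports the DataFrame's in-memory footprint,
# which is what determines how much server RAM you need.
total_bytes = int(df.memory_usage(deep=True).sum())
print(f"Rows: {len(df)}, memory used: {total_bytes} bytes")
```

Running this on a sample of your real data gives a rough per-row cost you can scale up to estimate the memory a full load will need.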
Large servers have twice as many cores and twice as much memory as medium servers. To benefit from a speed-up, your computations will need to be parallelised: carried out on multiple CPUs simultaneously. NumPy does some calculations in parallel automatically, but for most data processing you may want to use a library like Dask, and for machine learning, the parallelisation functionality built into libraries like scikit-learn and TensorFlow.
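To illustrate what parallelising across cores means in practice, here is a minimal sketch using Python's standard-library concurrent.futures (the sum-of-squares task is an illustrative stand-in for a real CPU-bound computation):

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    # An illustrative CPU-bound task: the sum of squares below n.
    return sum(i * i for i in range(n))


def run_tasks():
    tasks = [100_000, 200_000, 300_000, 400_000]
    # Each task runs in its own process, so on a server with four or
    # more cores the four computations can execute simultaneously.
    with ProcessPoolExecutor(max_workers=4) as pool:
        return list(pool.map(sum_of_squares, tasks))


if __name__ == "__main__":
    print(run_tasks())
```

Libraries like Dask apply the same idea at a higher level, splitting DataFrame and array operations into chunks that are processed across all available cores.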
Extra large servers give you serious power for running highly parallelised workloads or tools like Apache Spark.
Remember it only takes a few seconds to terminate a server and spin up a new one. For bespoke server configurations use the command line interface ‘sml’.