Microsoft SQL Server

Connecting to Microsoft SQL Server databases from the command line

Microsoft provides the sqlcmd tool for connecting the SQL Server databases from the command line. You need to install it on your SherlockML server first:

$ curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
$ curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
$ sudo apt-get update
$ sudo apt-get install mssql-tools unixodbc-dev

The last step above will ask you to accept some license agreements. The command line tool gets installed to /opt/mssql-tools/bin/sqlcmd.

To connect to a SQL Server database, you need to know:

  • its server: this is usually a string like customers.mydomain.com;
  • its port: the default port for SQL Server databases is 1433;
  • the name of the database that you want to connect to. If this is a new server, the only database on it is likely to be named master;
  • your username and password for the database server (note that this is different to your SherlockML username). If you are unsure of these, you should ask your database administrator.

You can then connect to the database with:

$ /opt/mssql-tools/bin/sqlcmd -S SERVER,PORT -U USERNAME -d DATABASE_NAME

You should then be prompted for a password. If the port is the default 1433, you may omit the ,PORT part of the server argument.

For instance, I can connect to database demos running on host customer-database.example.com with username admin:

../../_images/mssqlserver_example.png

If you get stuck, run /opt/mssql-tools/bin/sqlcmd -? | less in a terminal to view a list of all available options.

Connecting to Microsoft SQL Server databases from Python

To connect to Microsoft SQL Server databases from Python, you can use the Python module pymssql. You need to install it on your SherlockML server first:

$ conda install pymssql --channel conda-forge

Note

We recommend installing pymssql from the ‘conda-forge’ conda channel rather from pip or the standard conda repositories to ensure you get all the required dependencies.

import pymssql

connection = pymssql.connect(
    server='customers.mydomain.com',  # host on which the database is running
    database='database_name',  # name of the database to connect to
    user='username',  # username to connect with
    password='password'  # password associated with your username
)

cursor = connection.cursor()
cursor.execute('SELECT * FROM customers')
customers = list(cursor.fetchall())
print('We have {} customers'.format(len(customers)))  # This is data science!

connection.close()

Note

We close the connection to allow the database server to reclaim resources. This can be critical in a Jupyter notebook, since the kernel remains alive for a long time.

The pymssql connection object can also be used in a with statement to ensure it gets closed automatically:

import pymssql

with pymssql.connect(
    server='customers.mydomain.com', database='database_name',
    user='username', password='password'
) as connection:
    cursor = connection.cursor()
    cursor.execute('SELECT * FROM customers')
    customers = list(cursor.fetchall())
    print('We have {} customers'.format(len(customers)))

You can also use the read_sql() function in pandas to read the result of a query directly into a DataFrame:

import pymssql
import pandas

with pymssql.connect(
    server='customers.mydomain.com', database='database_name',
    user='username', password='password'
) as connection:
    df = pandas.read_sql('SELECT * FROM customers', connection)

print(df)

Note

We recommend avoiding pasting database passwords and other connection details in many notebooks in a project. Have a look at Factoring connection details into a package for a recommended strategy for managing database connection details.