Microsoft SQL Server
====================

Connecting to Microsoft SQL Server databases from the command line
------------------------------------------------------------------

Microsoft provides the `sqlcmd
<https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-tools#ubuntu>`_
tool for connecting the SQL Server databases from the command line. You need to
install it on your Faculty server first:

.. code-block:: bash

   $ curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
   $ curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
   $ sudo apt-get update
   $ sudo apt-get install mssql-tools unixodbc-dev

The last step above will ask you to accept some license agreements. The command
line tool gets installed to ``/opt/mssql-tools/bin/sqlcmd``.

To connect to a SQL Server database, you need to know:

 - its `server`: this is usually a string like ``customers.mydomain.com``;
 - its `port`: the default port for SQL Server databases is 1433;
 - the name of the database that you want to connect to. If this is a new
   server, the only database on it is likely to be named ``master``;
 - your username and password for the database server (note that this is
   different to your Faculty username). If you are unsure of these, you
   should ask your database administrator.

You can then connect to the database with:

.. code-block:: bash

   $ /opt/mssql-tools/bin/sqlcmd -S SERVER,PORT -U USERNAME -d DATABASE_NAME

You should then be prompted for a password. If the port is the default 1433,
you may omit the ``,PORT`` part of the server argument.

For instance, I can connect to database ``demos`` running on host
``customer-database.example.com`` with username ``admin``:

.. thumbnail:: images/mssqlserver_example.png

If you get stuck, run ``/opt/mssql-tools/bin/sqlcmd -? | less`` in a terminal
to view a list of all available options.

Connecting to Microsoft SQL Server databases from Python
--------------------------------------------------------

To connect to Microsoft SQL Server databases from Python, you can use the
Python module `pymssql <http://pymssql.org/>`_. You need to install it on your
Faculty server first:

.. code-block:: bash

   $ conda install pymssql --channel conda-forge

.. note:: We recommend installing pymssql from the 'conda-forge' conda channel
          rather from pip or the standard conda repositories to ensure you get
          all the required dependencies.

.. code-block:: python

    import pymssql

    connection = pymssql.connect(
        server='customers.mydomain.com',  # host on which the database is running
        database='database_name',  # name of the database to connect to
        user='username',  # username to connect with
        password='password'  # password associated with your username
    )

    cursor = connection.cursor()
    cursor.execute('SELECT * FROM customers')
    customers = list(cursor.fetchall())
    print('We have {} customers'.format(len(customers)))  # This is data science!

    connection.close()

.. note:: We close the connection to allow the database server to reclaim
          resources. This can be critical in a Jupyter notebook, since the
          kernel remains alive for a long time.

The pymssql connection object can also be used in a ``with`` statement to
ensure it gets closed automatically:

.. code-block:: python

    import pymssql

    with pymssql.connect(
        server='customers.mydomain.com', database='database_name',
        user='username', password='password'
    ) as connection:
        cursor = connection.cursor()
        cursor.execute('SELECT * FROM customers')
        customers = list(cursor.fetchall())
        print('We have {} customers'.format(len(customers)))

You can also use the ``read_sql()`` function in pandas to read the result of a
query directly into a DataFrame:

.. code-block:: python

    import pymssql
    import pandas

    with pymssql.connect(
        server='customers.mydomain.com', database='database_name',
        user='username', password='password'
    ) as connection:
        df = pandas.read_sql('SELECT * FROM customers', connection)

    print(df)

.. note:: We recommend avoiding pasting database passwords and other connection
          details in many notebooks in a project. Have a look at
          :ref:`package_connection_details` for a recommended strategy for
          managing database connection details.