# PostgreSQL¶

## Connecting to PostgreSQL databases from the command line¶

The psql client is the standard tool for connecting to PostgreSQL databases from the command line. It is pre-installed on all SherlockML servers.

To connect to a PostgreSQL database, you need to know:

• its hostname: this is usually a string like customers.mydomain.com;
• its port: the default port for PostgreSQL databases is 5432;
• the name of the database that you want to connect to. If this is a new server, the only database on it is likely to be named postgres;

You can then connect to the database with:

\$ psql --host HOSTNAME --port PORT --user USERNAME --dbname DATABASE_NAME


You should then be prompted for a password. If the port is the default 5432, you may omit the --port PORT argument.

For instance, I can connect to database github running on host github-scraper.cpllpj7texvd.eu-west-1.rds.amazonaws.com with username awsuser:

If you get stuck, run psql --help | less in a terminal to view a list of all available options.

## Connecting to PostgreSQL databases from Python¶

To connect to PostgreSQL databases from Python, you can use the Python module psycopg2, which comes pre-installed on SherlockML Jupyter and JupyterLab servers:

import psycopg2

connection = psycopg2.connect(
host='customers.mydomain.com',  # host on which the database is running
database='database_name',  # name of the database to connect to
)

cursor = connection.cursor()
cursor.execute('SELECT * FROM customers')
customers = list(cursor.fetchall())
print('We have {} customers'.format(len(customers)))  # This is data science!

connection.close()


Note

We close the connection to allow the database server to reclaim resources. This can be critical in a Jupyter notebook, since the kernel remains alive for a long time.

The psycopg2 object can also be used in a with statement to ensure it gets closed automatically:

import psycopg2

with psycopg2.connect(
host='customers.mydomain.com', database='database_name',
) as connection:
cursor = connection.cursor()
cursor.execute('SELECT * FROM customers')
customers = list(cursor.fetchall())
print('We have {} customers'.format(len(customers)))


You can also use the read_sql() function in pandas to read the result of a query directly into a DataFrame:

import psycopg2
import pandas

with psycopg2.connect(
host='customers.mydomain.com', database='database_name',