Getting Started With Amazon Redshift
I recently worked on a project in which I was asked to migrate someones analytics database into a Redshift cluster. The backend of this website was using Django, so all of my development was done in Python.
A had a couple of goals before starting the project:
- Keep everything in Python
- Utilize Django’s ORM when possible
- Make migrations easy
The first step is to launch a Red Shift cluster on EC2 and your local ip address has to be added CIDR/IP security group of in the redshift dashboard. For testing I like to set it to 0.0.0.0\0
.
Once you have your cluster up and running, you can try to connect to it using the following command
psql "host=XXXXXXXX.cilmdlzscpk4.us-east-1.redshift.amazonaws.com user=XXXXXXX dbname=XXXXX port=5439"
If you can successfully connected, you’re all set, you need to setup your tables. I used the psycopg2
to setup these tables, but you could of course manually enter them into the REPL. If you want to take my approach first, assuming you are running on ubuntu, install the postgres
requirements
sudo apt-get install -y postgresql-client-9.3 libpq-dev
Now install psycopg2
using pip:
pip install psycopg2
Since we are going to utilize django's
orm we need to update settings.py
:
We also need to use a database router, to make sure that reads and writes go to redshift on a per app basis (this is something I did per the design of my application, but you might not need to do this):
And finally, add the router to settings.py
OK. everything is setup and ready to go! Adding tables is as simple as calling SQLs create table
command:
Now using psycopg2
: