4 lessons learned deploying Python Django on EC2

Recently I spent 4 solid days working on deploying a Python Django application on an EC2 instance. The process was way longer than I expected it to be, but this helped me learn a lot about Python Django deployment that I think will be valuable to share.

Deployment is important but annoying

When you're busy designing and building a functional web application, it's easy to forget that you have to deploy it onto a server at some point. Using a local development environment is great. You have a local SQLite or MySQL database where the data doesn't matter, so you can keep tweaking the models as you see fit, run migrations like a maniac, and hardcode arbitrary values in settings.py. All of this changes when you try to get the application ready for deployment.

There are two predominant ways to deploy code

There are two popular ways to deploy a web application (note that I am broadly simplifying a very complicated process):

Git-based deployment (or broadly, source control management based deployment)
SSH-based deployment

Git-based deployment

For Git-based deployment, the development team has a dedicated server that monitors a branch (usually master) for changes and, upon a successful change, deploys the code to a staging server. The process is supported by a set of scripts (Bash, Python, Ruby, etc.) that get the app deployed across a set of servers.

Platform-as-a-Service vendors like Heroku and OpenShift allow for a slightly modified version of Git-based deployment: they provide a remote Git repo you can push to, which then triggers a deployment. The main drawback is that the app goes down for a short period of time as the servers check out the latest codebase and run bootstrap scripts.

Git-based deployment is great if you have a relatively mature web application with a sizable team of engineers where a select few are given the rights to push to master. You should also have robust test coverage, so that you can be quite confident that when you push, things will work for the most part. It is also a great step toward setting up a continuous integration and deployment environment.

However, Git-based deployment is overkill for a web app that is relatively new and only spans a few servers. The iteration cycle for fixing issues is slow: you have to keep making commits for even small changes, which gets annoying.

There are several interesting tools that support this style of deployment; the most notable ones in my experience are Travis CI and Jenkins. You can find a more comprehensive list here.

SSH-based deployment

SSH-based deployment is where the source code is packaged into an archive and then copied over to a server using SSH. Shell commands are then run on the server to launch bootstrap scripts that update configuration, perform any other setup, and restart the app and web servers if needed. If you have multiple servers, this is done for each one.
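The copy-then-run sequence can be sketched in plain Python. This is only an illustration of the steps described above, not how Capistrano or Fabric work internally; the host names, paths, and restart command are hypothetical, and the `runner` argument is an assumption that lets you pass `subprocess.check_call` for real use or a stub for a dry run.

```python
import shlex


def build_deploy_commands(archive, host, remote_dir, restart_cmd):
    """Return the local commands for one SSH-based deploy to one host."""
    # Hypothetical convention: upload the archive to /tmp on the server.
    remote_archive = "/tmp/" + archive.split("/")[-1]
    # Everything after the copy runs on the server in a single ssh call.
    remote_steps = " && ".join([
        "mkdir -p " + shlex.quote(remote_dir),
        "tar -xzf {} -C {}".format(
            shlex.quote(remote_archive), shlex.quote(remote_dir)
        ),
        restart_cmd,
    ])
    return [
        ["scp", archive, "{}:{}".format(host, remote_archive)],
        ["ssh", host, remote_steps],
    ]


def deploy_all(archive, hosts, remote_dir, restart_cmd, runner):
    """Repeat the same copy-and-bootstrap steps for every server."""
    for host in hosts:
        for command in build_deploy_commands(
            archive, host, remote_dir, restart_cmd
        ):
            runner(command)
```

Passing `print` as the runner gives a dry run that shows exactly what would be executed on each host before anything touches a server.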

Two notable frameworks that come to mind are Capistrano and Fabric. Capistrano is Ruby based while Fabric is Python based.

SSH-based deployment is quick and simple. It's a great way to get started for a web app that has limited testing and is less mature. Since you can SSH into a server, you can easily tweak configuration, install packages, edit code, and mess around with processes. This allows you to iterate quickly to get up and running and makes debugging deployment a lot easier. However, as you scale, you probably don't want everyone on the team to have SSH access to your servers, and you will want to centralize everything.

What I learned after wasting 4 days

I spent 4 full days reading dozens of tutorials and how-to guides to get this app up and running. I didn't care whether Gunicorn was faster than uwsgi or whether supervisor provided more features than Upstart; I just needed something working and somewhat scalable. To deploy my Django application, I ultimately used:

Fabric for deployment
Ubuntu-based machine (Ubuntu 14.04.4)
EC2 small instance (the micro instances are generally terribly slow)
nginx for the web server
uwsgi for the app server (also tried gunicorn)
MySQL for the database
init.d and start-stop-daemon to manage services (tried supervisor but it has drawbacks)

1. Centralize your server setup scripts using Fabric

Fabric is a powerful tool. We centralize all our setup scripts into a set of Fabric tasks so that provisioning a new Ubuntu server is super easy. Our scripts install the appropriate packages, create the necessary folders, and set up the server to receive deployments.
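As a rough idea of what such a provisioning task boils down to, here is a sketch that only builds the list of shell commands. The package names, directory, and owner in the usage note are illustrative assumptions, not our actual setup. In a Fabric 1.x fabfile, each returned string could be passed to `fabric.api.sudo` inside a task; the sketch keeps Fabric out so it runs anywhere.

```python
def provision_commands(packages, directories, owner):
    """Build the shell commands that prepare a fresh Ubuntu server.

    Each command is meant to be run with root privileges, for example
    via fabric.api.sudo in a Fabric 1.x task (an assumption about how
    you wire it up, not something this function does itself).
    """
    commands = ["apt-get update"]
    if packages:
        # One install call keeps provisioning fast and easy to read in logs.
        commands.append("apt-get install -y " + " ".join(packages))
    for directory in directories:
        # mkdir -p is safe to repeat, so re-running the task does no harm.
        commands.append("mkdir -p " + directory)
        commands.append("chown {0}:{0} {1}".format(owner, directory))
    return commands
```

For example, `provision_commands(["nginx", "mysql-server"], ["/srv/app"], "deploy")` returns four commands: the update, the install, and a mkdir/chown pair for the one directory.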

2. Don't use the default app server

A lot of tutorials I found by Googling suggested that I simply run the same command as I do during development:

python manage.py runserver

However, this is not a good idea. The default app server that Django provides is lightweight but development-oriented. In production, it's a good idea to use a WSGI-based app server like uwsgi or gunicorn, both of which are production-ready and battle-tested.
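What uwsgi and gunicorn actually load is a WSGI callable; Django's generated wsgi.py exposes one named `application`. The sketch below is a stand-in for that callable without Django, plus a small helper (`call_app`, a name made up for this example) that invokes it roughly the way an app server would.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI callable, standing in for Django's wsgi.application."""
    body = b"Hello from a WSGI app\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Call a WSGI app once, as an app server would, and return the result."""
    environ = {"PATH_INFO": path}
    # Fills in the remaining required WSGI keys with harmless defaults.
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

With a real project you would point the app server at the module and callable, for instance `gunicorn mysite.wsgi:application`, where `mysite` is a placeholder project name.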

3. Use a virtualenv for deployment

Once our code is packaged and deployed onto the server, we install it by running python setup.py install inside a virtualenv that we set up specifically for running the application. Then we configure our app server (uwsgi) to use that virtualenv and point it to the wsgi.py file in the latest release. A lot of tutorials were quite unclear about this.
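Since this was the part the tutorials left vague, here is a sketch of the kind of uwsgi ini file that ties the two together, generated from Python. The paths, socket location, and `mysite` module name are hypothetical; `virtualenv`, `chdir`, `module`, `socket`, `master`, and `processes` are standard uwsgi options, but the values you need will differ.

```python
import configparser
import io


def render_uwsgi_ini(venv_dir, release_dir, module, socket_path, processes=2):
    """Render a [uwsgi] config pointing at a virtualenv and a release."""
    # interpolation=None so a literal "%" in a path is written as-is.
    config = configparser.ConfigParser(interpolation=None)
    config["uwsgi"] = {
        "virtualenv": venv_dir,    # interpreter and packages come from here
        "chdir": release_dir,      # directory of the latest release
        "module": module,          # dotted path to the WSGI callable
        "socket": socket_path,     # where nginx forwards requests
        "master": "true",
        "processes": str(processes),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

Calling it with something like `render_uwsgi_ini("/srv/app/venv", "/srv/app/releases/current", "mysite.wsgi:application", "/tmp/app.sock")` yields text you can write to a file and pass to `uwsgi --ini`. Only `chdir` has to change per release, which keeps the virtualenv setting stable.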

4. Supervisor is overkill initially

Supervisor is a great process monitoring system which I thought might be a good idea to use. It has a configuration file you can set up, and you then use the supervisorctl command to monitor and manage the services you want.

However, supervisor only lets you specify one command to run and monitor. It doesn't provide a way to specify pre-run and post-run scripts or chain multiple shell commands into one line. This is especially key for MySQL on EC2, since the PID and log files are stored in the /var/run directory, which is erased upon an EC2 restart. You need to run a shell script to create the appropriate folders and files before MySQL starts.
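The pre-run step itself is small. We used a shell script, but the same idea in Python looks roughly like the sketch below; the function name and the example directory and file names are assumptions for illustration, and it leaves out setting ownership, which a real script would also need so the database user can write there.

```python
import os


def ensure_runtime_paths(run_dir, files=()):
    """Recreate a runtime directory and empty files if they are missing.

    Returns the list of files that had to be created, so a caller can
    log what was restored after a reboot.
    """
    # exist_ok makes this safe to run on every start, not just after a reboot.
    os.makedirs(run_dir, exist_ok=True)
    created = []
    for name in files:
        path = os.path.join(run_dir, name)
        if not os.path.exists(path):
            # Opening in append mode creates the file without truncating it.
            open(path, "a").close()
            created.append(path)
    return created
```

A start script would call something like `ensure_runtime_paths("/var/run/mysqld", ["mysqld.log"])` and only then launch the service.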

For a simple web app it's perfectly alright to get started using init.d and start-stop-daemon to monitor processes. We simply store our init.d scripts for each service in our repo and there are plenty of open source starter scripts available online.
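To give a feel for what those init.d scripts wrap, here is a sketch of the start-stop-daemon invocations expressed as Python argument lists. The flags used (`--start`, `--stop`, `--quiet`, `--background`, `--make-pidfile`, `--pidfile`, `--exec`, `--retry`) are standard start-stop-daemon options, but the particular combination, the pidfile path, and the uwsgi arguments in the usage note are assumptions, not a copy of our scripts.

```python
def start_command(pidfile, executable, args=()):
    """Argument list that starts a service in the background."""
    return [
        "start-stop-daemon", "--start", "--quiet",
        "--background",        # detach, for programs that don't daemonize
        "--make-pidfile",      # have start-stop-daemon write the pidfile
        "--pidfile", pidfile,
        "--exec", executable,
        "--",                  # everything after this goes to the program
    ] + list(args)


def stop_command(pidfile, timeout=10):
    """Argument list that stops the service recorded in the pidfile."""
    return [
        "start-stop-daemon", "--stop", "--quiet",
        "--retry", str(timeout),  # wait this many seconds before escalating
        "--pidfile", pidfile,
    ]
```

For instance, `start_command("/var/run/app/uwsgi.pid", "/srv/app/venv/bin/uwsgi", ["--ini", "/srv/app/uwsgi.ini"])` produces the line an init.d `start` action would run, and `stop_command` with the same pidfile covers `stop`.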

So there you go. If you're dealing with problems while deploying a Django app on EC2, hopefully my experience will save you hours of time and let you focus on getting your app live as quickly as possible.