The (near) Perfect Dockerfile for Django Applications

Prerequisites- This article assumes an intermediate-level understanding of Docker and Django-based application development.

Docker has revolutionized software development and has proven to be the nucleus of new-age development practices like CI/CD, distributed development, and collaboration.

Still, there is no popular consensus on what good Docker development principles are. Dockerfiles written for Java or Scala don’t directly translate to Python (we will explore this below).

This article discusses an opinionated, production-ready Docker setup for Django applications that can be used with a docker-compose file (also provided) or with Kubernetes. A further requirement is that containers can be scaled up and down without any side effects.

If you need the code without going into the reasoning, a sample Django repo with this Docker setup is available for download on GitHub, here.

So without any further ado, below is our Dockerfile.

# Dockerfile for Django Applications

# Section 1- Base Image
FROM python:3.8-slim

# Section 2- Python Interpreter Flags
ENV PYTHONUNBUFFERED 1  
ENV PYTHONDONTWRITEBYTECODE 1

# Section 3- Compiler and OS libraries
RUN apt-get update \  
  && apt-get install -y --no-install-recommends build-essential libpq-dev \  
  && rm -rf /var/lib/apt/lists/*

# Section 4- Project libraries and User Creation
COPY requirements.txt /tmp/requirements.txt

RUN pip install --no-cache-dir -r /tmp/requirements.txt \  
    && rm -rf /tmp/requirements.txt \  
    && useradd -U app_user \  
    && install -d -m 0755 -o app_user -g app_user /app/static

# Section 5- Code and User Setup
WORKDIR /app

USER app_user:app_user

COPY --chown=app_user:app_user . .

RUN chmod +x docker/*.sh

# Section 6- Docker Run Checks and Configurations 
ENTRYPOINT [ "docker/entrypoint.sh" ]

CMD [ "docker/start.sh", "server" ]
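
If you want to try the Dockerfile right away, a typical build-and-run cycle looks like the following sketch (the image tag django-app and the .env file are placeholders for your own values):

# Build the image from the project root, where the Dockerfile lives
docker build -t django-app .

# Run the Django server container; --env-file supplies the variables the
# entrypoint script expects (database credentials, CELERY_BROKER_URL, etc.)
docker run --rm -p 8000:8000 --env-file .env django-app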

Before going into each section of the above Dockerfile, and the entrypoint.sh and start.sh scripts mentioned in it, let’s discuss the specifications of the test Django application for which we are writing this Docker setup:

  1. Celery is used for background tasks, with Redis as the Celery broker.
  2. Celery beat is used for cron jobs, i.e. to schedule periodic tasks.
  3. Flower is used for monitoring background tasks.
  4. We are using PostgreSQL as our database.

Let’s explore each section of our Dockerfile:

Section 1- Base Image

FROM python:3.8-slim

We have selected python:3.8-slim as the base image. While choosing a base image, a key consideration is its size, as a bigger base image results in a bigger Docker image. Developers often prefer the alpine flavor due to its small size, and for languages such as Java or Scala it is, in most cases, the right way to go. Alpine is a minimal Docker image based on Alpine Linux.

But for Python applications, many requisite libraries are not supported by the alpine flavor out of the box: alpine uses musl instead of glibc, so pre-built wheels don’t work and dependencies must be compiled from source. This results in a bigger image, longer build times, and application incompatibilities. The slim flavor sits between alpine and the full version and hits the sweet spot in terms of size and compatibility.

The Python community has started recognizing this issue, and you will find many articles like this one which discuss it in detail.

Section 2- Python Interpreter Flags

ENV PYTHONUNBUFFERED 1  
ENV PYTHONDONTWRITEBYTECODE 1

We have set two flags, PYTHONUNBUFFERED and PYTHONDONTWRITEBYTECODE, to non-empty values to modify the behavior of the Python interpreter.

When set to a non-empty value, PYTHONUNBUFFERED sends Python output straight to the terminal (standard output) without buffering it. This helps in two ways. Firstly, it lets us see logs in real time. Secondly, in the case of a container crash, it ensures we receive all output up to the crash, and hence the reason for the failure.

We are also setting PYTHONDONTWRITEBYTECODE to a non-empty value. This ensures that the Python interpreter doesn’t generate .pyc files, which, apart from being useless inside a container, can go stale and lead to a few hard-to-find bugs.
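
For reference, these two variables behave like the interpreter’s -u and -B command-line flags, so you can reproduce the same behavior locally without Docker:

# -u: unbuffered stdout/stderr, same effect as PYTHONUNBUFFERED=1
# -B: don't write .pyc files, same effect as PYTHONDONTWRITEBYTECODE=1
python -u -B manage.py runserver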

Section 3- Compiler and OS libraries

apt-get update

Commands in this section install compilers, tools, and OS-level libraries. For example, apt-get update, as you may already know, updates the list of available packages. It doesn’t upgrade the packages themselves; it just fetches the latest versions of the package lists.

apt-get install -y --no-install-recommends build-essential libpq-dev

The build-essential is a meta-package that pulls in everything necessary to compile software: the GNU compiler collection (gcc/g++), make, the libc development headers, and a few other tools and libraries. The complete list of build-essential packages can be found here. As per the official documentation, libpq-dev contains,

Header files and static library for compiling C programs to link with the libpq library in order to communicate with a PostgreSQL database backend.

Since libpq-dev only provides libraries for the PostgreSQL database, feel free to drop it if you are using some other database, and install the requisite packages for that database instead.
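
For example, if you were using MySQL through the mysqlclient library, the equivalent install line would swap libpq-dev for MySQL’s client headers (a sketch, assuming the Debian default-libmysqlclient-dev package):

apt-get install -y --no-install-recommends build-essential default-libmysqlclient-dev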

The --no-install-recommends flag skips the installation of recommended-but-optional packages, which reduces the Docker image size. Note that hard dependencies of our packages are still installed.

rm -rf /var/lib/apt/lists/*

Cleaning /var/lib/apt/lists/* can easily reduce your Docker image size by ~5%-25%. The package lists downloaded by apt-get update serve no purpose once build-essential and libpq-dev are installed, and since they would otherwise be baked into the image layer, we remove them in the same RUN instruction.
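
You can verify the effect of this cleanup yourself once the image is built; docker history shows the size each layer contributes, so the RUN layer of this section should stay small (django-app is a placeholder tag):

# Per-layer sizes of the built image
docker history django-app

# Overall image size
docker image ls django-app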

Section 4- Project libraries and User Creation

In this section, we install the project libraries listed in requirements.txt and create a non-root user to run the application, for security purposes.

COPY requirements.txt /tmp/requirements.txt

If you notice, instead of copying the whole project, which we eventually do in Section 5, we copy only requirements.txt and then install the libraries listed in it. This is because Docker builds images in layers: if a layer changes, all subsequent layers are re-processed. Copying only requirements.txt ensures that the installation layer is reused across Docker builds and is invalidated only when requirements.txt itself changes. Had we copied the entire project here, every commit or code change would invalidate these layers and force a re-installation of all libraries.
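
A quick way to see this caching in action (a sketch; app/views.py stands in for any source file of your project):

# First build: every layer is executed, including pip install
docker build -t django-app .

# Change application code only, leaving requirements.txt untouched...
echo "# trivial change" >> app/views.py

# ...and rebuild: Sections 1-4 are reported as cached in the build output,
# and only the COPY of Section 5 and later layers are re-executed
docker build -t django-app .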

pip install --no-cache-dir -r /tmp/requirements.txt

In this step, we install all the project dependencies listed in requirements.txt. The --no-cache-dir flag disables caching during pip installation. By default, pip caches downloaded installation files (.whl etc.) and source files (.tar.gz etc.). Inside a Docker build we never reinstall from this cache, so disabling it reduces the image size.

useradd -U app_user

Here, we create a non-root user app_user using the useradd command. By default, Docker runs container processes as root inside the container. This is a bad practice, since attackers can gain root access to the Docker host if they manage to break out of the container (source). The -U flag creates a group with the same name as the user.

install -d -m 0755 -o app_user -g app_user /app/static

At the end of the section, we create the folder /app/static and give app_user ownership of it. Django will use this folder to collect all static resources of our project when we run python manage.py collectstatic. The install command creates the directory (-d) with permissions 0755 (-m) and sets its owner (-o) and group (-g) in a single step.
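
As a quick sanity check that the image really runs as the non-root user, you can clear the entrypoint (so entrypoint.sh doesn’t block waiting for PostgreSQL and Redis) and print the user identity, again using the placeholder django-app tag; the output should mention app_user, not root:

docker run --rm --entrypoint "" django-app id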

Section 5- Code and User Setup

WORKDIR /app

We start this section by setting the working directory. The WORKDIR instruction sets the working directory for all subsequent instructions. Since we don’t want to copy our code into the image’s root folder, we place it in the /app folder.

USER app_user:app_user

Then we switch to the non-root user created at the end of Section 4, so that subsequent instructions, and the container’s processes, run as app_user. As mentioned earlier, this improves security.

COPY --chown=app_user:app_user . .

With everything set up, we copy the project into the Docker image. Any code change invalidates only this and subsequent layers, which keeps Docker image build times low. While copying, we give ownership of the contents to app_user, the user created in Section 4.

RUN chmod +x docker/*.sh

At the end of this section, we give executable permission to our two script files, entrypoint.sh and start.sh. We will go into detail about these two files after finishing Section 6.

Section 6- Docker Run Checks and Configurations

ENTRYPOINT [ "docker/entrypoint.sh" ]

The ENTRYPOINT of a Dockerfile is always executed, hence we use it for validations and for Django commands such as migrate. The CMD, in contrast, is overridden by the command section of a docker-compose file (or by arguments to docker run), so the value given here serves only as a default.

CMD [ "docker/start.sh", "server" ]
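
Because only CMD is overridden, every container type shares the same ENTRYPOINT checks. For example, the same image can be started as a Celery worker like this (django-app and .env are placeholders), which mirrors what the command section of a compose file does:

# Arguments after the image name replace CMD, while entrypoint.sh still
# runs first and performs its health checks and Django commands
docker run --rm --env-file .env django-app docker/start.sh worker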

For a better understanding of what we are trying to do with ENTRYPOINT and CMD, let’s look at the corresponding files, entrypoint.sh and start.sh, which they invoke.

entrypoint.sh

#!/bin/bash

# entrypoint.sh file of Dockerfile

# Section 1- Bash options
set -o errexit  
set -o pipefail  
set -o nounset

# Section 2: Health of dependent services  
postgres_ready() {  
    python << END  
import sys

from psycopg2 import connect  
from psycopg2 import OperationalError

try:  
    connect(  
        dbname="${DJANGO_POSTGRES_DATABASE}",  
        user="${DJANGO_POSTGRES_USER}",  
        password="${DJANGO_POSTGRES_PASSWORD}",  
        host="${DJANGO_POSTGRES_HOST}",  
        port="${DJANGO_POSTGRES_PORT}",  
    )  
except OperationalError:  
    sys.exit(-1)  
END  
}

redis_ready() {  
    python << END  
import sys

from redis import Redis  
from redis import RedisError

try:  
    redis = Redis.from_url("${CELERY_BROKER_URL}", db=0)  
    redis.ping()  
except RedisError:  
    sys.exit(-1)  
END  
}

until postgres_ready; do  
  >&2 echo "Waiting for PostgreSQL to become available..."  
  sleep 5  
done  
>&2 echo "PostgreSQL is available"

until redis_ready; do  
  >&2 echo "Waiting for Redis to become available..."  
  sleep 5  
done  
>&2 echo "Redis is available"

# Section 3- Idempotent Django commands  
python manage.py collectstatic --noinput  
python manage.py makemigrations  
python manage.py migrate

exec "$@"

Let’s look at the above entrypoint.sh, though in less detail than we did the Dockerfile.

By default, scripts are run with /bin/sh. On most systems, /bin/sh is a symbolic link, but assuming it points to /bin/bash can be wrong: on Debian and Ubuntu, and hence in our python:3.8-slim image, it points to dash, so bash-specific features may not be available (source). Hence we explicitly request /bin/bash through the shebang line of our scripts.
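
You can confirm this on the Debian-based python:3.8-slim image itself:

ls -l /bin/sh    # lrwxrwxrwx ... /bin/sh -> dash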

Section 1- Bash options

set -o errexit  
set -o pipefail  
set -o nounset

Here, we set a few bash options. The errexit option makes the script fail at the first error and not proceed further (by default, bash keeps executing subsequent commands after an error). The pipefail option means that if any element of a pipeline fails, the pipeline as a whole fails. The nounset option raises an error whenever an unset variable is expanded.

Section 2: Health of dependent services

Earlier, we assumed that our application uses a PostgreSQL database and Redis as the Celery broker. In this section, we check whether both services are up; if not, we wait for them to come up.

Similarly, you may add checks for any other critical service that is necessary for the normal functioning of your application, as sketched below.
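
For a dependency without a handy Python client, a plain TCP probe in the same style works too (a sketch; MY_SERVICE_HOST and MY_SERVICE_PORT are hypothetical variables for your service):

service_ready() {
    # Succeeds only if a TCP connection opens within one second;
    # /dev/tcp is a bash built-in pseudo-device
    timeout 1 bash -c "< /dev/tcp/${MY_SERVICE_HOST}/${MY_SERVICE_PORT}"
}

until service_ready; do
  >&2 echo "Waiting for service to become available..."
  sleep 5
done
>&2 echo "Service is available"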

Section 3- Idempotent Django commands

python manage.py collectstatic --noinput  
python manage.py makemigrations  
python manage.py migrate

There are several Django management commands that we need to run before starting the Django server: collectstatic to collect all static resources, makemigrations to generate migration files, and migrate to apply those migrations to the database. In this section, we run all such commands.

The only thing to keep in mind is that all these commands must be idempotent, i.e. running them multiple times must have no side effect on the state of our application. Idempotency is required because if, say, Kubernetes scales these containers, multiple instances will run the same entrypoint and would otherwise interfere with each other.

In fact, any idempotent operation can be executed here, not just Django commands.
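
As an illustration, a data-seeding step would belong here only in an idempotent form (a sketch; the myapp.models.Plan model is hypothetical):

# Idempotent: get_or_create inserts the row at most once, so repeated
# container starts leave the database unchanged
python manage.py shell -c "
from myapp.models import Plan
Plan.objects.get_or_create(name='free')
"

# NOT idempotent -- this would insert a duplicate row on every start:
# python manage.py shell -c "from myapp.models import Plan; Plan.objects.create(name='free')"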

start.sh

We use the start.sh file to leverage the same Dockerfile, and hence the same image, to run containers for the Django server, Celery workers, Celery beat, and Flower, by passing a different argument for each.

#!/bin/bash  

cd /app  

if [ $# -eq 0 ]; then  
    echo "Usage: start.sh [PROCESS_TYPE](server/beat/worker/flower)"  
    exit 1  
fi  

PROCESS_TYPE=$1  

if [ "$PROCESS_TYPE" = "server" ]; then  
    if [ "$DJANGO_DEBUG" = "true" ]; then  
        gunicorn \  
            --reload \  
            --bind 0.0.0.0:8000 \  
            --workers 2 \  
            --worker-class eventlet \  
            --log-level DEBUG \  
            --access-logfile "-" \  
            --error-logfile "-" \  
            dockerapp.wsgi  
    else  
        gunicorn \  
            --bind 0.0.0.0:8000 \  
            --workers 2 \  
            --worker-class eventlet \  
            --log-level INFO \  
            --access-logfile "-" \  
            --error-logfile "-" \  
            dockerapp.wsgi  
    fi  
elif [ "$PROCESS_TYPE" = "beat" ]; then  
    celery \  
        --app dockerapp.celery_app \  
        beat \  
        --loglevel INFO \  
        --scheduler django_celery_beat.schedulers:DatabaseScheduler  
elif [ "$PROCESS_TYPE" = "flower" ]; then  
    celery \  
        --app dockerapp.celery_app \  
        flower \  
        --basic_auth="${CELERY_FLOWER_USER}:${CELERY_FLOWER_PASSWORD}" \  
        --loglevel INFO  
elif [ "$PROCESS_TYPE" = "worker" ]; then  
    celery \  
        --app dockerapp.celery_app \  
        worker \  
        --loglevel INFO  
fi

In the above script, we use gunicorn to run our application server, which is the recommended approach for production. The python manage.py runserver command should be used only in a development setup.

The command in the docker-compose file for each container type will then be:

  • Django server: start.sh server
  • Celery beat: start.sh beat
  • Flower: start.sh flower
  • Celery worker: start.sh worker

A Django repo with the above Docker setup, along with the docker-compose file, is available for download on GitHub, here.

I will be writing a follow-up article soon, where I will discuss in detail the docker-compose and Kubernetes files corresponding to the above Docker setup.

The “(near)” in this article’s heading is deliberate: it doesn’t claim perfection, but describes the intent to arrive at a production-ready Docker setup. Hence, please comment on any gaps or improvements in the above setup.

That’s all for this blog, please follow for upcoming articles, thank you!
