Source: https://www.pexels.com/photo/steel-container-on-container-dock-122164/
In our previous post, we saw how to execute, build documentation, and prepare testing for a server application. Today, we will understand how to deploy our server application via docker containers, and we will see some ideas on how to write a simple command-line interface (CLI).
Introduction to Containers
Distributing and running applications across multiple platforms and operating systems can be rather cumbersome. Indeed, packages built for one Linux distribution may require re-packaging to be used on another, or on macOS (let alone on Windows, which has a rather different set of build tools).
This can in principle be solved by shipping the source code and letting users build and run the program on their own systems; however, this places the deployment onus on the user. Containerization offers an alternative, allowing straightforward building and set-up of software across a wide variety of platforms.
The basic idea behind containerization is to create a layer of emulation for an environment of choice, where the program will be run. This is similar in principle to the use of a virtual machine, which emulates one operating system while running inside another, with a few key differences.
Firstly, containerization offers in general much better performance because the emulation layer may be thinner, i.e., emulate less software. Almost-native levels of performance are therefore possible, making containerization an attractive option.
Furthermore, since every container is usually meant to run only one process, the container setup process is usually simpler and avoids cross-compatibility issues. For instance, as we will see, Python virtual environments are mostly not required, and one can simply install Python packages at the system level.
Finally, containerized services can be networked: multiple containers, each executing a separate process, can be set up to communicate and interact via HTTP requests. This makes it straightforward to deploy microservice architectures, in which a program is broken up into smaller, independent units communicating over a network (at the cost of more complexity, but with a possible gain in flexibility). These networks (mostly) isolate the containers from the host system, but still allow them to communicate with each other.
Server Container
To simplify the deployment of sem, I set up a container-based build process, complementing its poetry-based local deployment. I implemented it using the well-known docker system, as well as its docker-compose utility to manage the multiple containers required (see below). These programs build containers from configuration files and run them in a container engine.
As is common for database applications, I prepared a container for the server, which will communicate with a separate container specifically dedicated to the database service. This section will explore the way in which the former is set up.
Containers are the combination of a process to run and an image, i.e., an environment where the software and setup required by the process are installed and implemented. Images are usually built by adding layers (configuration/installation steps) on top of already existing images. The latter usually offer the functionalities of a particular program, programming language interpreter, or operating system.
Docker creates custom images and containers from the contents of Dockerfiles, configuration files which contain the build instructions, layer by layer. In sem, the database container only requires minimal setup, and does not warrant a dedicated Dockerfile; the server container instead is built from the instructions placed in docker/Dockerfile, which we will explore step by step in the following.
# docker/Dockerfile
1 # syntax=docker/dockerfile:1
2
3 # selecting python image
4 FROM python:3.11-slim
5
6 # creating a workdir
7 WORKDIR /app
8
9 # installing system packages, required for psycopg
10 RUN apt-get update
11 RUN apt-get -y install gcc postgresql postgresql-contrib libpq-dev
The first step is choosing a base image upon which the server image will be built, using the FROM directive (by default, images are downloaded from the Docker Hub online repository). Since the server program is written in Python, the natural choice is a Python image, which comes with a pre-installed Python interpreter and the pip package manager. In many cases, multiple versions of each image are available, so that the desired one can be chosen.
The system emulated by the Python image installed here is based on Debian Linux (with minimal additional software, since I chose the slim image), with the corresponding filesystem. The WORKDIR directive creates a directory which will work as base for the rest of the instructions to be executed (building, copying external files, etc.).
Due to our choice of the psycopg driver to access the database, we need to install some system (not Python) packages, which the driver will employ. This is done by executing system-specific commands in the internal image shell, using the RUN directive. Since the container is based on a Debian system, apt-get will handle package installation (note the -y option, which automatically answers yes at confirmation prompts, and is necessary due to the non-interactive nature of the installation).
Each of the steps above is one of the image setup layers: docker automatically caches each layer during construction, so that subsequent builds can reuse the cached layers up to the first point of change and rebuild only from there. The order of the layers can therefore speed up builds during development, provided that frequently changed layers are placed as late in the build as possible.
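The caching behavior described above can be illustrated with a toy model. The sketch below is not docker's actual cache algorithm (which, among other things, also checks the contents of copied files): it simply assumes that each layer is identified by a hash of its instruction chained with all previous ones, so that a change in one layer invalidates every layer after it. The function names and the "(v1)"/"(v2)" markers standing in for changed project files are hypothetical.

```python
import hashlib


def layer_keys(instructions):
    """Chain a hash through the instructions: each key depends on all earlier layers."""
    keys, previous = [], ""
    for instruction in instructions:
        previous = hashlib.sha256((previous + instruction).encode()).hexdigest()
        keys.append(previous)
    return keys


def first_rebuilt_layer(old_instructions, new_instructions):
    """Return the index of the first layer that cannot be reused from the cache."""
    old_keys = layer_keys(old_instructions)
    new_keys = layer_keys(new_instructions)
    for index, key in enumerate(new_keys):
        if index >= len(old_keys) or old_keys[index] != key:
            return index
    return len(new_keys)  # nothing changed: every layer is cached


# Toy build: only the content of the last layer differs between the two builds
old = ["FROM python:3.11-slim", "WORKDIR /app", "RUN pip install -r requirements.txt", "COPY . . (v1)"]
new = ["FROM python:3.11-slim", "WORKDIR /app", "RUN pip install -r requirements.txt", "COPY . . (v2)"]
print(first_rebuilt_layer(old, new))  # → 3
```

In this model, a change in the last layer leaves the three previous ones cached, while a change in the second layer would force rebuilding everything from there on.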
# docker/Dockerfile
13 # preparing separately requirement file,
14 # installation will be performed only if requirements change
15 COPY docker/requirements.txt .
16
17 # installing required packages
18 RUN pip install -r requirements.txt
The COPY directive copies files from the host system to the container filesystem (here, to the working directory). The copied file (docker/requirements.txt) contains the dependency information of the project, in a format which the pip installer can understand, and can be generated by poetry with the command
$ poetry run pip freeze | grep -v '^-e' > docker/requirements.txt
(here grep removes a line dedicated to installing the project package itself). Calling pip install -r with the requirements file as argument will install the project dependencies at a system-wide level on the container.
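To make the role of the grep filter concrete, here is a minimal Python sketch of the same operation: it drops every line starting with -e (an editable install) and keeps the rest. The function name and the package lines (including versions and path) are hypothetical, chosen only for illustration.

```python
def drop_editable(lines):
    """Keep every requirement line except those starting with '-e' (editable installs)."""
    return [line for line in lines if not line.startswith("-e")]


# Hypothetical output of `pip freeze`, with an editable entry for the project itself
frozen = [
    "fastapi==0.103.0",
    "-e /home/user/sem",
    "requests==2.31.0",
]
print("\n".join(drop_editable(frozen)))
```

The remaining lines are plain pinned dependencies, which pip can install without needing the project source.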
Note how pip, rather than poetry, is used to manage project dependencies here. poetry is a tool built on top of pip, which manages dependencies and builds virtual environments to run software without dependency issues with other programs. When containerizing, the latter aspect usually becomes unnecessary, since mostly only one program will be run in the container. Therefore, containers usually use pip to install Python packages at the system (container) level, where programs are also run.
# docker/Dockerfile
20 # copy all other local content host -> container
21 COPY . .
22
23 # launch command
24 CMD "docker/run.sh"
At line 21, I copy the whole project directory to the working directory: this will allow the container to use the project files. Note how I copied docker/requirements.txt separately: if I had copied the whole project at line 15, changes in any project file would result in docker performing the installation process at line 18 again (and all the following layers). This highlights how layer planning optimizes image building during development.
At the final line, CMD is used to specify the command which will be executed when running the container based on the image. In the case of sem, this is the executable docker/run.sh bash script:
# docker/run.sh
#!/bin/bash

if [[ $SEM_LAUNCH == "docs" ]]
then
    mkdocs build
    mkdocs serve
else
    python -m modules.main
fi
Based on the value of the SEM_LAUNCH variable, the container will serve the documentation, or run a server instance which will remain active and listen for HTTP requests.
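The selection logic of the script can also be written as a small Python function, which makes it easy to see that any value of SEM_LAUNCH other than "docs" (including an unset variable) falls through to the server. This is only an illustrative sketch: the function name is hypothetical, and it returns the commands rather than executing them.

```python
import os


def launch_commands(environ=os.environ):
    """Return the commands the container would run for the given environment."""
    if environ.get("SEM_LAUNCH") == "docs":
        # Build the documentation, then serve it
        return [["mkdocs", "build"], ["mkdocs", "serve"]]
    # Default: run the server module
    return [["python", "-m", "modules.main"]]


print(launch_commands({"SEM_LAUNCH": "docs"}))
print(launch_commands({}))
```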
Container Composition
As we mentioned before, the server container will interact with a dedicated database container. The containers themselves, and the network they will interact over, are set up in sem using the docker-compose utility. Its configuration is contained in the docker-compose.yml file in the project root directory, which we will explore step by step.
# docker-compose.yml
1 version: "3"
2
3 services:
4   db:
5     container_name: sem-db
6     image: postgres:15.4-alpine
7     restart: always
The services section contains a list of the services to be set up. We start with the db service, whose container will be named sem-db and will be based on a PostgreSQL image. With the restart: always policy, docker restarts the container automatically if it stops.
When run as a container, the db service will launch an instance of the database service, which will remain listening for connections. This instance will not overlap with system ones, even though it will listen on the same port, since it will only be accessible from within the container network in our setup.
# docker-compose.yml
 8     volumes:
 9       - ./sem-db-data:/var/lib/postgresql/data
10     environment:
11       POSTGRES_USER: sem
12       POSTGRES_PASSWORD: sem
13       # required for the check below - otherwise postgresql
14       # will use undefined 'root' user and raise errors
15       PGUSER: sem
At line 8, I create a volume, i.e., a shared host-container directory used to persistently store data between container executions. Here, I am linking the sem-db-data/ folder in the project root directory with the /var/lib/postgresql/data/ directory (the location where PostgreSQL stores database data) in the container filesystem. In this way, sem-db-data/ will persistently store database data, so that the container can be stopped and resumed without data loss.
At line 10, I introduce some environment variables, which will be declared in the database container environment. These are default PostgreSQL variables containing the username and password used to access the database, as well as the user information for the healthcheck (see below).
# docker-compose.yml
16     healthcheck:
17       # postgresql starts up, stops, and then restarts
18       # => errors if the server connects before the stop
19       # this checks if the system is ready
20       test: [ "CMD-SHELL", "pg_isready" ]
21       interval: 5s
22       timeout: 5s
23       retries: 5
The healthcheck directive periodically runs a test command inside a running container, to verify that its service is working: here, I used it to solve an issue with the PostgreSQL container.
Specifically, the PostgreSQL service starts and then stops immediately before restarting. If the server service (described below) attempts to connect to the database while the latter is inactive, the launch will fail. The healthcheck set up here runs the pg_isready command every 5 seconds, with a 5-second timeout and up to 5 retries; the command checks whether the database service is accepting connections, leaving PostgreSQL enough time to restart.
Once the check succeeds, the container reports its state as healthy. Later, I will instruct the server service to wait for this state before launching.
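The polling idea behind the healthcheck can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not docker's implementation: the function name is hypothetical, the check is any callable returning True on success (standing in for pg_isready), and the sleep function is injectable so that the example runs instantly.

```python
import time


def wait_until_healthy(check, interval=5.0, retries=5, sleep=time.sleep):
    """Poll `check` until it succeeds; give up after `retries` failed attempts."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            sleep(interval)  # wait before probing again
    return False


# Simulated database: not ready for the first two probes, ready at the third
answers = iter([False, False, True])
print(wait_until_healthy(lambda: next(answers), sleep=lambda seconds: None))  # → True
```

A dependent service would only be started once this function returns True, which mirrors what the server service does in the configuration discussed next.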
# docker-compose.yml
25   server:
26     container_name: sem-server
27     depends_on:
28       db:
29         condition: service_healthy
30     image: sem-server
This section of the docker-compose file describes the server service, which will be run in a container named sem-server based on an image of the same name. I specified that this service should wait until the db service reports a healthy state before starting, to avoid connecting to the database service before it is ready.
# docker-compose.yml
31     build:
32       context: .
33       dockerfile: docker/Dockerfile
34     ports:
35       - 8000:8000
36       - 8001:8001
37     environment:
38       - SEM_DOCKER=1
39       - SEM_LAUNCH
The build section contains information on how to build the image associated with the service. Here, I specified the Dockerfile docker/Dockerfile, with the current directory as build context.
The container will expose the ports 8000 and 8001, allowing the content offered on these ports by the service (namely, the server and documentation pages) to be reached by the host system as well, at localhost:8000 and localhost:8001.
Finally, the environment section specifies environment variables for the container environment. Values can be assigned in one of two ways:
- The variables can be created anew and given values, such as for SEM_DOCKER above, which I set to 1. The goal of this variable is to inform the server that docker-specific configurations should be sourced (see below).
- Alternatively, environment variables can be forwarded from the host environment, like SEM_LAUNCH here. This variable, as seen in docker/run.sh, specifies whether the server program or the documentation server should be launched, and is selected when launching the container ensemble at the host level (see below).
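The two assignment modes can be modeled with a short Python sketch, which resolves a list of entries in the same form used in the configuration file against a dictionary representing the host environment. The function name is hypothetical, and the sketch only mimics the behavior described above (a forwarded variable which is not defined on the host is simply left unset).

```python
def resolve_environment(entries, host_env):
    """Resolve compose-style `environment` entries against the host environment."""
    resolved = {}
    for entry in entries:
        if "=" in entry:
            # "KEY=VALUE": the value is set explicitly in the configuration file
            key, value = entry.split("=", 1)
            resolved[key] = value
        elif entry in host_env:
            # "KEY": the value is forwarded from the host, if defined there
            resolved[entry] = host_env[entry]
    return resolved


print(resolve_environment(["SEM_DOCKER=1", "SEM_LAUNCH"], {"SEM_LAUNCH": "docs"}))
# → {'SEM_DOCKER': '1', 'SEM_LAUNCH': 'docs'}
```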
Networking and Execution
Docker-compose automatically sets up a network containing the containers described in its configuration file. It also offers a Domain Name System (DNS), which makes containers reachable by using their names as hostnames, rather than their container network IP addresses. This may require some changes in networking setups (e.g., replacing localhost with the container name).
Performing these changes was the final part of my containerization effort, and is done in the connection module (modules/session.py). I modified the earlier version of the file as follows:
# modules/session.py
import os
...
def init_session(database: str) -> Session:
    ...
    DRIVER = "postgresql+psycopg"

    if os.environ.get("SEM_DOCKER") == "1":
        USER = "sem"
        PASSWORD = "sem"
        HOST = "sem-db"
    else:
        USER = "postgres"
        PASSWORD = ""
        HOST = "localhost"

    PORT = "5432"

    DB = f"{DRIVER}://{USER}:{PASSWORD}@{HOST}:{PORT}/{database}"
    ...
The value of the SEM_DOCKER environment variable is extracted using the os.environ dictionary, which stores environment variable values as strings. The docker-specific or the generic database specifications are selected based on the value of SEM_DOCKER, which can be set at launch. Note how the container name (sem-db) is used as the host in the URL when in the containerized environment.
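To see the effect of the switch in isolation, the URL construction can be extracted into a standalone function which takes the environment as an argument. This is a sketch for illustration, not the structure of the actual module: the function name is hypothetical, as is the database name "sem" used in the example calls.

```python
def build_db_url(database, environ):
    """Build the database URL, selecting docker-specific or local settings."""
    driver = "postgresql+psycopg"
    if environ.get("SEM_DOCKER") == "1":
        # Containerized setup: the database is reached via its container name
        user, password, host = "sem", "sem", "sem-db"
    else:
        # Local setup: default PostgreSQL user on the local machine
        user, password, host = "postgres", "", "localhost"
    return f"{driver}://{user}:{password}@{host}:5432/{database}"


print(build_db_url("sem", {"SEM_DOCKER": "1"}))
# → postgresql+psycopg://sem:sem@sem-db:5432/sem
print(build_db_url("sem", {}))
# → postgresql+psycopg://postgres:@localhost:5432/sem
```

Passing the environment explicitly, rather than reading os.environ inside the function, makes both branches easy to exercise in tests.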
The container setup discussed here can now be launched. This is done using the docker compose command, specialized below for normal server execution and documentation serving, respectively:
$ docker compose up --build
$ SEM_LAUNCH="docs" docker compose up --build
This command will rebuild the images at every call (thanks to the --build option) and launch the container ensemble with the passed command-line options.
The process can be simplified by writing a makefile:
# makefile
.PHONY: docker docs
...
docker-run:
	docker compose up --build

docker-docs:
	SEM_LAUNCH="docs" docker compose up --build
...
run:
	poetry run sem

test:
	poetry run python3 -m pytest --ignore=sem-db-data/ -x -s -v .

requirements:
	poetry run pip freeze | grep -v '^-e' > docker/requirements.txt

docs:
	poetry run mkdocs build
	poetry run mkdocs serve
The makefile automates most of the steps required to launch the server, both locally and using a container ensemble. For instance, the command
$ make docker-run
(possibly requiring administrative rights) will launch the container ensemble set up as above.
Command-Line Interface
Good client design involves careful consideration to pick the best options and frameworks based on the needs of the user.
A very popular option is the use of web-based graphical interfaces, i.e., web pages which translate user input (collected using graphical controls) into HTTP requests to a server. Here we will explore a much simpler but still relatively versatile option, in the form of command-line interfaces.
The standard framework to build this type of interface in Python is argparse, a library which parses command-line arguments, options, and sub-commands. Commands can then be composed on a UNIX shell, with default values, some degree of validation, and more features. This is especially useful when shell features (e.g., shell wildcard expansion, interaction with other shell tools) may have fruitful interactions with the program (e.g., if the program accepts multiple filename arguments).
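For comparison, this is roughly what an argparse-based interface with a sub-command could look like. It is a minimal sketch and not part of sem: the program name, the sub-command, and all the options are hypothetical, loosely modeled on the data of an expense.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical `add` sub-command."""
    parser = argparse.ArgumentParser(prog="sem-cli", description="Hypothetical expense CLI")
    subparsers = parser.add_subparsers(dest="command", required=True)

    add = subparsers.add_parser("add", help="add an expense")
    add.add_argument("--date", required=True, help="YYYY-MM-DD")
    add.add_argument("--type", required=True)
    add.add_argument("--category", default="")  # optional, with a default value
    add.add_argument("--amount", required=True, type=float)  # validated as a number
    add.add_argument("--description", default="")
    return parser


# Parsing an explicit argument list instead of sys.argv, for demonstration
args = build_parser().parse_args(
    ["add", "--date", "2023-12-01", "--type", "food", "--amount", "12.5"]
)
print(args.command, args.date, args.amount)  # → add 2023-12-01 12.5
```

Here argparse takes care of default values and of basic validation (e.g., a non-numeric amount would be rejected with an error message).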
In sem, I implemented a much simpler (but in some respects more versatile) command-line interface, which works in an interactive Python shell. Specifically, the module modules/cli.py defines a few Python functions which receive user input and use it to perform HTTP requests to the server. The module begins with some general setup:
# modules/cli.py
import os

from fastapi.encoders import jsonable_encoder

import requests

# `rich` works for tables and general printing
from rich.console import Console
...
# `colorama` works for input() in docker
from colorama import Fore
from colorama import Style

from modules.schemas import ExpenseAdd
...
console = Console()
...
# Emphasis formatting - colorama
EM = Fore.GREEN + Style.BRIGHT
NEM = Style.RESET_ALL

if os.environ.get("SEM_DOCKER") == "1":
    server = "http://sem-server:8000"
else:
    server = "http://127.0.0.1:8000"
colorama and rich are two libraries which allow complex output formatting (colored text, tables…). rich is a more complete framework, which usually would be sufficient on its own. In this instance, however, I used both since rich displayed incorrect behavior when launching the command-line interface in the containerized setup.
After defining an object of type rich.console.Console, which will perform the printing, and setting emphasis and reset characters, I defined the address of the server which should receive requests in the server variable (note the distinction between the containerized and non-containerized case).
We will examine here the add() function, which allows the user to specify the data of an expense to add to the database.
# modules/cli.py
def add():
    """Add an Expense, querying the user for data."""
    date = input(f"{EM}Date{NEM} (YYYY-MM-DD) :: ")
    typ = input(f"{EM}Type{NEM} :: ")
    category = input(f"{EM}Category{NEM} (optional) :: ")
    amount = input(f"{EM}Amount{NEM} :: ")
    description = input(f"{EM}Description{NEM} :: ")

    response = requests.post(
        server + "/add",
        json=jsonable_encoder(
            ExpenseAdd(
                date=date,
                type=typ,
                category=category,
                amount=amount,
                description=description,
            )
        ),
    )

    console.print(response.status_code)
    console.print(response.json())
After receiving the data from the user via the input() Python function, the function sends a request to the server at the /add URL. This is done via the post() function of the requests module, which offers methods to send HTTP requests to URLs of choice (effectively implementing client-like functionality).
The json argument of the function passes a request body (here obtained by serializing an ExpenseAdd object, built with user-specified data). The status code and body of the response to the request, returned by the post() function, are then printed to the screen.
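To make explicit what passing a JSON body involves, the sketch below builds an equivalent POST request by hand with the Python standard library, without sending it: the payload is serialized to JSON and the content type is declared in the headers, which is what the json argument of requests takes care of automatically. This is not how the module works (it uses requests): the function name is hypothetical, and so are the field values.

```python
import json
import urllib.request


def build_post_request(url, payload):
    """Build (without sending) a POST request carrying `payload` as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical expense data, addressed to the local server address used above
request = build_post_request(
    "http://127.0.0.1:8000/add",
    {"date": "2023-12-01", "type": "food", "category": "", "amount": 12.5, "description": "lunch"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```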
The rest of the module contains similar functions, which allow the user to send requests to all the endpoints of the API. The module can be launched interactively with
$ poetry run python3 -im modules.cli
when running locally (note the -i option, which loads the module in an interactive Python shell). When running in a containerized setup, I use
$ docker compose run server python -im modules.cli
which executes a duplicated instance of the server container, where the final command (normally, launching the server) is overridden to load the CLI module in an interactive shell. This simplifies setup, since a CLI container is not needed.
Both the local- and container-based CLI launch commands are included in the project makefile (as the cli and docker-cli targets, respectively). In a typical session, I call the add() function to add an expense, and then the query() function to list the expenses contained in the database in a formatted table (via the utilities in rich). When no values are passed to the filters in the call to query(), the system prints all the stored expenses.
Author: Adriano Angelone
After obtaining his master's degree in Physics at the University of Pisa in 2013, he received his Ph.D. in Physics at Strasbourg University in 2017. He worked as a post-doctoral researcher at Strasbourg University, SISSA (Trieste) and Sorbonne University (Paris), before joining eXact-lab as Scientific Software Developer in 2023.
At eXact-lab, he works on the optimization of computational codes, and on the development of data engineering software.