In this post, we will explore our open-source project to understand how to establish a database connection in SQLAlchemy, and how to leverage strong types to design safe and maintainable methods.

In SQLAlchemy, database connection is performed by specifying a connection string, which holds the necessary information. The string is used to initialize an engine, which in turn is used to build a session, which will perform database operations. These can be encoded in maintainable and safe methods using the strong typing idiom.

In our previous post, we discussed how to encode data models in SQLAlchemy and Python. Today, we will see how create a database connection in SQLAlchemy, and how to use strong types to design more maintainable and safer methods.

In sem, database operations are performed by the CRUDHandler class, a wrapper around SQLAlchemy functions defined in the modules/crud_handler.py module. CRUD stands for “create, record, update, delete“, and is the typical set of operations to be performed on a database.

crud_handler.py also contains the definition of the CRUDHandlerError exception, used by CRUDHandler to signal errors encountered in database and data access.

# modules/crud_handler.py

 1	"""Class mediating CRUD operations on the DB.
 2	
 3	Classes
 4	-----------------------
 5	CRUDHandlerError
 6	    Exception raised for errors in DB operations.
 7	CRUDHandler
 8	    Class mediating CRUD operations.
 9	
10	Functions
11	-----------------------
12      CRUDHandlerContext()
13          Manage context for CRUDHandler.
14	"""

The CRUDHandlerError Exception

CRUDHandlerError is a custom exception, which will be raised (together with an explicative message) when the CRUDHandler encounters an error. Creating such custom exceptions is good to distinguish them from exceptions which other parts of the program may raise.

# modules/crud_handler.py

40	from contextlib import contextmanager
41
42	from pydantic import ValidationError
43	
44	from sqlalchemy import select
45	from sqlalchemy import func
46	from sqlalchemy import and_
47	from sqlalchemy.sql import text
48	
49	from modules.models import Expense
50	from modules.session import init_session
51	from modules.schemas import ExpenseAdd
52	from modules.schemas import ExpenseUpdate
53	from modules.schemas import QueryParameters

<---------------------------snip--------------------------->

56	class CRUDHandlerError(Exception):
57	    """Exception raised for errors in DB operations."""

The standard procedure is to define a derived type of the builtin Exception class. Even without adding method and attributes, the exception will operate like the default ones (e.g., accepting an error message as a string on construction, as we will see below).

Database Connection

Initialization details

We can now examine the CRUDHandler class itself, beginning from its initialization, which needs to set up three elements to have an operational database connection.

At first, a connection string needs to be set up. This will be a string containing the necessary information for SQLAlchemy to connect to or create the database.

Its first component is usually a driver section, carrying information on the underlying database service and on components used to connect to it. The string will then carry database details, in a pre-defined format. These will be user credentials, database service location (i.e., HTTP host and port to connect to for data exchanges) and database name.

The second step is creating an SQLAlchemy engine, a low-level object connected to the database and used to perform low-level, non-CRUD operations. These include, for instance, creating the database and the associated tables.

Finally, the initialization method will create an SQLAlchemy Session, a high-level construct used to execute CRUD operations. This object is based on an engine, and an instance of this type will be stored as member in CRUDHandler.

Initialization function

I placed these operations in the init_session() function of the modules/session.py module. This will allow more customizability of the connection parameters (as we will see when we will operate with containers).1

# modules/session.py

32	from sqlalchemy import create_engine
33	from sqlalchemy.orm import Session
34	from sqlalchemy_utils import database_exists
35	from sqlalchemy_utils import create_database
36	
37	from modules.models import Base
38	
39	
40	def init_session(database: str) -> Session:
41	    """Init connection to specified DB and write schema.
42	
43	    Parameters
44	    -----------------------
45	    database : str
46	        Database name to connect to.
47	
48	    Returns
49	    -----------------------
50	    sqlalchemy.orm.Session
51	        The initialized Session.
52	    """
53	    DRIVER = "postgresql+psycopg"
54	    USER = "postgres"
55	    PASSWORD = ""
56	    HOST = "localhost"
57	    PORT = "5432"
58	
59	    DB = f"{DRIVER}://{USER}:{PASSWORD}@{HOST}:{PORT}/{database}"
60	    if not database_exists(DB):
61	        create_database(DB)
62	
63	    engine = create_engine(DB)
64	    # Building schema
65	    Base.metadata.create_all(engine)
66	
67	    return Session(bind=engine)

psycopg is one of the possible interfaces to connect to a PostgreSQL database, and uses under the hood the psycopg3 driver. The rest of the connection string sets up a database for the user postgres (the default username in most PostgreSQL installations) without password, and instructs SQLAlchemy to send data to the HTTP address localhost on port 5432 (the default location where the PostgreSQL service can be contacted).

The database_exists() and create_database() SQLAlchemy functions allow to test the existence of a database and to create it following the connection string. The initialization function then creates an engine object based on the connection string.

This object is immediately used to create the tables required to host data following our specified model. This is done by calling the create_all() method of the SQLAlchemy MetaData object contained in Base as the metadata attribute. All models inherit access to this attribute (through Base), allowing it to create tables to host all the created models.

Finally, the engine is passed as argument to create a Session object, which will be used to actually perform CRUD operations.

The CRUDHandler will use init_session() to initialize its internal session attribute. It will also need a function to close the session (which can be done calling the Session.close() method), to be invoked when database operations are completed.

# modules/crud_handler.py

 60	class CRUDHandler:
 61	    """Class mediating CRUD operations.
 62	
 63	    Attributes
 64	    -----------------------
 65	    session : sqlalchemy.Orm.Session
 66	        Session connected to the specified DB.
 67	
 68	    Public methods
 69	    -----------------------
 70	    __init__()
 71	        Initialize class instance.
 72	    close()
 73	        Close DB connection.
 74	    add()
 75	        Add expense to the DB.
 76	    query()
 77	        Return expenses matching specified filters.
 78	    summarize()
 79	        Summarize expenses matching specified filters.
 80	    update()
 81	        Update expense selected by ID.
 82	    remove()
 83	        Remove selected expenses from the DB.
 84	    erase()
 85	        Remove all expenses from the DB.
 86	    load()
 87	        Append the contents of a CSV file to the database.
 88	    save()
 89	        Save the current contents of the DB to a CSV file.
 90	    """
 91
 92         def __init__(self, database: str):
 93             """Initialize class instance.
 94
 95             Parameters
 96	        -----------------------
 97	        database : str
 98	            Database name to connect to.
 99	        """
100	        self.session = init_session(database)
101
102	    def close(self):
103	        """Close DB connection."""
104	        self.session.close()

Context management

In classes such as CRUDHandler, which manage resources, it is useful to define a context manager, which will create an access to the resource and perform cleanup when execution finishes, either normally or if an exception is raised. I implemented this functionality in the CRUDHandlerContext() function (which is not a method of CRUDHandler), defined at the end of modules/crud_handler.py.

# modules/crud_handler.py

334	@contextmanager
335	def CRUDHandlerContext(database: str) -> CRUDHandler:
336	    """Manage context for CRUDHandler.
337	
338	    Parameters
339	    -----------------------
340	    database : str
341	        Database name.
342	
343	    Yields
344	    -----------------------
345	    CRUDHandler
346	        The context-managed CRUDHandler.
347	    """
348	    ch = CRUDHandler(database)
349	    try:
350	        yield ch
351	    finally:
352	        ch.close()

The contextlib.contextmanager decorator allows to create a context manager from a generator-like function, which yields the created object, without having to write a generator class with nontrivial __enter__() and __exit__() methods. The yield instruction should be included in a try block, with the finally block being responsible for the cleanup.

In practice, this context manager will be used as

with CRUDHandlerContext(<database name>) as <handle>:
    <operations using the CRUDHandler object <handle>>

with cleanup ensured when exiting the context. We will see concrete application examples in the next installments.

Strong Types

We are almost ready to begin delving into the CRUD operations. Before proceeding, however, we need to understand the concept of strong type, which is a powerful programming paradigm for the design of functions, and is heavily used in sem.

The key idea is to create types to aggregate the parameters accepted by a function in a single object (similarly to what is done in C, where structs are used to return multiple values). This idiom allows increased security, by decreasing the possibility of parameter mismatch and making parameter validation easier.

In sem, I defined the types used by the CRUD functions in modules/schemas.py.

# modules/schemas.py

 1	"""Types for CRUD operations.
 2	
 3	Classes
 4	-----------------------
 5	ExpenseBase
 6	    Base expense class.
 7	ExpenseAdd
 8	    Derived expense class for insertion operations.
 9	QueryParameters
10	    Strong type for query parameters.
11	ExpenseRead
12	    Derived expense class for query operations.
13	ExpenseUpdate
14	    Container for data to update existing expenses with.
15	"""

We will at first examine ExpenseAdd, the type used in “create” operations in CRUDHandler. I designed it so that the “create” method in CRUDHandler could have the signature

add(self, data: ExpenseAdd)

with no need for additional arguments.

ExpenseAdd must then hold all the information necessary to create an Expense object, which means all the fields except the ID (which should be automatically assigned by the database). Since these fields will also be used by other strong types, I encapsulated them in a separate ExpenseBase base class.

# modules/schemas.py

39	import datetime
40	from typing import Optional
41	
42	from pydantic import BaseModel
43	from pydantic import Field
44	
45	
46	class ExpenseBase(BaseModel):
47	    """Base expense class.
48	
49	    Attributes
50	    -----------------------
51	    date : datetime.date
52	        Date of the expense.
53	    type : str
54	        Low-level group of the expense.
55	    category : Optional[str]
56	        High-level group of the expense. Default is `None` -> "".
57	    amount : float
58	        Amount of the expense.
59	    description : str
60	        Description of the expense.
61	    """
62	
63	    date: datetime.date = Field(description="Date of the expense.")
64	    type: str = Field(description="Low-level group of the expense.")
65	    category: str = Field(
66	        default="",
67	        description="High-level group of the expense. Default is 
                            ''.",
68	    )
69	    amount: float = Field(description="Amount of the expense.")
70	    description: str = Field(description="Description of the
                                                 expense.")
71	
72	
73	class ExpenseAdd(ExpenseBase):
74	    """Type for insertion operations."""

ExpenseBase inherits from BaseModel, a type defined in the pydantic module. This is essentially a dataclass with builtin validation for field values, supporting all Python builtin types: this gives strong security at initialization at zero cost for the developer.

BaseModel also come with a pre-defined keyword argument constructor

instance = ExpenseAdd(date=..., type=..., ...)

and serialization functions, and is strongly integrated with FastAPI, the API framework of choice for sem.

In pydantic BaseModels, the Field construct is used to provide advanced type annotations, which provide information beyond the simple data type. Here, we are specifying descriptions for the fields, which will appear in the documentation, as well as default values, where applicable.

Why Two Sets of Types?

As we have seen, ExpenseAdd shares many similarities with the original Expense type that we saw in our previous post, and which defines the data model. If you wonder why I re-defined an almost identical copy of an existing type, there are a few reasons.

Firstly, in SQLAlchemy instances of the model type remain connected to their associated data in the database, and changes applied to the former can be exported to the latter. While we have not seen this pattern yet, it will play a crucial role in the “update” CRUD method.

Due to this feature, working with model instances outside of CRUD operations may lead to unexpected behavior and changes in the database, making a good case for independent types which insulate the contained data.

Furthermore, data models (inheriting from SQLAlchemy.orm.DeclarativeBase) do not share the same properties as our strong types (which inherit from pydantic.BaseModel). This makes them miss out on, e.g., validation at construction, serialization functions

Finally, both our model and our strong types use type annotations to provide additional information on the fields. However, the annotations of the former are based on what SQLAlchemy requires to create the database, and do not mirror the (arguably more helpful for documentation and the developer) annotations of the strong types.

As such, even though a little duplication is required, having an independent set of validated strong types is by and large a better idea.

What’s Next?

We have learned how to establish an SQLAlchemy database connection and how to use strong types for the design of safe and maintainable methods.

In the next installment in this series, we will explore the CRUDHandler class, to understand how to add, query, update, and remove data.


Author: Adriano Angelone

After obtaining his master in Physics at University of Pisa in 2013, he received his Ph. D. in Physics at Strasbourg University in 2017. He worked as a post-doctoral researcher at Strasbourg University, SISSA (Trieste) and Sorbonne University (Paris), before joining eXact-lab as Scientific Software Developer in 2023.

In eXact-lab, he works on the optimization of computational codes, and on the development of data engineering software.

  1. The file session.py has since been modified for the purpose of containerization. The version displayed here can be accessed by checking out to the legacy branch of the sem repository. ↩︎