In this post, we will continue our exploration of our open-source project, to understand how to perform CRUD (Create, Read, Update, Delete) operations on database records.

CRUD (Create, Read, Update, Delete) operations are the fundamental actions which can be performed on a database.

In our previous post, we started exploring the structure of our CRUDHandler class, showing how to connect to a database and how strong types help us design safe and reliable methods. Today, we will understand how CRUD (Create, Read, Update, Delete) operations are implemented in practice on database records.

Creating Records

The first of the CRUD operations is the creation of new records, handled by the CRUDHandler.add() method. This function which will receive as argument an instance of ExpenseAdd (check out our previous post for its definition) containing the new record data.

# modules/crud_handler.py

106	    def add(self, data: ExpenseAdd):
107	        """Add expense to the DB.
108	
109	        Parameters
110	        -----------------------
111	        data : ExpenseAdd
112	            Expense data.
113	        """
114	        # Primary key added automatically
115	        self.session.add(Expense(**data.model_dump()))
116	        self.session.commit()

In line 115, I transform the passed ExpenseAdd in a dictionary having the fields as keys, using the BaseModel.model_dump() method. The dictionary is then unpacked, and its key-value pairs are passed as constructor arguments to an Expense instance. Here I do not need to specify the missing id field, since it has been set as a primary key (and therefore will be auto-inserted by the database).

The created Expense is then marked for addition to the database using the Session.add() method (which accepts a data record as argument). The Session.commit() method is finally called to permanently commit the change to the database.

Read (Query) Records

Designing a query function requires in the first place a decision on what criteria will be used to query the database: based on my experience with programs similar to sem in the past, I decided for the following.

  • The user should be able to select a date window which the returned expenses should be within. One of or both the dates should be optional: specifying only a start or end date will return all expenses after or before it, respectively, and no window at all will make all expenses match.
  • The user should be able to filter by type and category. These arguments should be passed as lists, in case the user wants more than one allowed value. They should also be optional: if no values are passed, no filtering should be done based on these criteria.

Strong Types for Query Functions

Once the query criteria have been decided, the first step is to encode them in a strong type QueryParameters, which will be passed as argument to the query functions.

# modules/schemas.py

 77	class QueryParameters(BaseModel):
 78	    """Strong type for query parameters.
 79	
 80	    Attributes
 81	    -----------------------
 82	    start : Optional[datetime.date]
 83	        Start date (included).
                Not filtered on if `None`. Default is `None`.
 84	    end : Optional[datetime.date]
 85	        End date (included).
                Not filtered on if `None`. Default is `None`.
 86	    types : Optional[list[str]]
 87	        Types to filter the query.
                If `None`, all types. Default is `None`.
 88	    categories : Optional[list[str]]
 89	        Categories to filter the query. If `None`, all types. Default
                is
 90             `None`.
 91	    """
 92	
 93	    start: Optional[datetime.date] = Field(
 94	        default=None,
 95	        description="""Start date (included).
                            Not filtered on if `None`.
 96	                    Default is `None`.""",
 97	    )
 98	    end: Optional[datetime.date] = Field(
 99	        default=None,
100	        description="""End date (included).
                            Not filtered on if `None`. Default
101	                    is `None`.""",
102	    )
103	    types: Optional[list[str]] = Field(
104	        default=None,
105	        description="""Types to filter the query.
                            If `None`, all types. Default
106	                    is `None`.""",
107	    )
108	    categories: Optional[list[str]] = Field(
109	        default=None,
110	        description="""Categories to filter the query.
                            If `None`, all types.
111	                    Default is `None`.""",
112	    )

I implemented QueryParameters as a BaseModel, to inherit its field validation and serialization functionality. Validation on nullable fields is ensured using the Optional type (which can receive an instance of its contained type or None) defined in the typing module. When one of the fields is nulled, I do not apply the relative query criteria.

I also added an ExpenseRead type, which contains the data associated to an expense when queried from the database (i.e., all fields, including id, which was excluded in ExpenseAdd). As we will see, this type will not be directly used in querying operations, but it is good to have around for testing and documentation purposes.

# modules/schemas.py

115	class ExpenseRead(ExpenseBase):
116	    """Type for selection operations.
117	
118	    Attributes
119	    -----------------------
120	    id : int
121	        ID of the expense, primary key field.
122	    """
123	
124	    id: int = Field(description="ID of the expense, primary key
                                        field.")

Query

The stage is now set to explore the query functions in CRUDHandler. When creating the class, I decided to have two separate querying functions: one should return a list of expenses to browse through, while the other was supposed to summarize expenses (summing the amounts grouped by category and type, for end-of-the-month accounting and such).

Both of these function share the same query criteria, which turned out to be slightly involved to construct. To avoid repetitions, I wrote the private method CRUDHandler._build_query_conditions(), which accepts a QueryParameters object and returns an SQLAlchemy query condition.

# modules/crud_handler.py

118	    def _build_query_conditions(self, params: QueryParameters):
119	        if params.start is None:
120	            start_condition = True
121	        else:
122	            start_condition = Expense.date >= params.start
123	
124	        if params.end is None:
125	            end_condition = True
126	        else:
127	            end_condition = Expense.date <= params.end
128	
129	        if params.types is None:
130	            type_condition = True
131	        else:
132	            type_condition = Expense.type.in_(params.types)
133	
134	        if params.categories is None:
135	            cat_condition = True
136	        else:
137	            cat_condition = Expense.category.in_(params.categories)
138	
139	        return and_(
140	            start_condition, end_condition, type_condition,
                    cat_condition
141	        )

SQLAlchemy will accept conditions written as expressions in terms of the fields of the model (using the syntax <model>.<field>). These expressions can use SQL operators, some of which are defined in Python as well (e.g., <= and >=). SQLAlchemy also offers methods operating on model fields which can reproduce more complex SQL operators (such as IN, via in_()).

In the code above, we create for each field of QueryParameters a condition, which does not actually give a constraint unless the associated field is non-null. The obtained conditions are then joined with the and_() SQLAlchemy function, which reproduces the SQL AND operator.

The first query function is CRUDHandler.query(), which returns a list of Expense objects matching the passed filters, and orders them by their date and id fields. This second ordering is imposed to make sure expenses performed in the same day are always ordered in the same way.

# modules/crud_handler.py

143	    def query(self, params: QueryParameters) -> list[Expense]:
144	        """Return expenses matching specified filters.
145	
146	        Parameters
147	        -----------------------
148	        params : QueryParameters
149	            Parameters of the query.
150	
151	        Returns
152	        -----------------------
153	        list[Expense]
154	            List of expenses matching the criteria.
155	        """
156	        return self.session.scalars(
157	            select(Expense)
158	            .where(self._build_query_conditions(params))
159	            .order_by(Expense.date, Expense.id)
160	        ).all()

Here, we are using the Session.scalars().all() method chain, which performs a query and returns the results as a list of instances of the model datatype (other return types are also possible).

The argument of this method is an SQLAlchemy Select object, created using as constructor arguments the fields to return (or the whole model, to request all fields). The equivalent of SQL clauses such as WHERE, GROUP BY (see below) or ORDER BY can be invoked as methods of the Select object, receiving boolean-valued conditions and fields (as <model>.<field>) like in SQL.

Note that the method returns a list of Expense, rather than a list of ExpenseRead (which would require an explicit conversion). This will not be a problem, since this function is meant to only be called from the API, where the result will be immediately converted in JSON format. The created Expense instances are immediately discarded, and there is no possibility for unintended changes to the database.

Summarize

The second query method of CRUDHandler is summarize(), which receives the same criteria as query() but returns sums of the amount field, grouped by category and type. The returned result is a dictionary of dictionaries, where the outer and inner keys are category and type, respectively:

{
  "category 1": {"type 1": <amount 1>, "type 2": <amount 2>, ...},
  "category 2": {...},
  ...
}

The function implementation is

# modules/crud_handler.py
	
162	    def summarize(
163	        self, params: QueryParameters
164	    ) -> dict[str, dict[str, float]]:
165	        """Summarize expenses matching specified filters.
166	
167	        Parameters
168	        -----------------------
169	        params : QueryParameters
170	            Parameters of the query.
171	
172	        Returns
173	        -----------------------
174	        dict[dict[str, float]]
175	            Dictionary containing inner {type: amount} dictionaries,
176	            grouped by category.
177	        """
178	        sums = self.session.execute(
179	            select(
180	                Expense.category,
181	                Expense.type,
182	                (func.sum(Expense.amount)).label("sum"),
183	            )
184	            .where(self._build_query_conditions(params))
185	            .group_by(Expense.category, Expense.type)
186	            .order_by(Expense.category, Expense.type)
187	        ).all()
188	
189	        res = {}
190	
191	        # Creating outer dictionaries
192	        for s in sums:
193	            res[s.category] = {}
194	
195	        # Creating inner dictionaries
196	        for s in sums:
197	            res[s.category][s.type] = s.sum
198	
199	        return res

Here, I am using the sum() function, which reproduces the SQL aggregation function SUM(), and is contained in the SQLAlchemy func submodule, together with other functions applicable to model fields. I am also assigning an alias to the sum field via the label() method, which fields and expressions involving fields admit. The final step is setting up the result dictionary (with some repetition in key creation, which I deemed negligible).

Update Records

Included in the CRUD operations is the updating of existing records, which I chose to select based on their ID. The relative function should accept the new values for the fields to modify (without the need to give the ones which do not change). I encoded these requirements in the ExpenseUpdate strong type, which the update function will receive as argument (together with the ID of the expense to modify).

# modules/schemas.py

127	class ExpenseUpdate(BaseModel):
128	    """Container for data to update existing expenses with.
129	
130	    Attributes set to `None` will not be changed.
131	
132	    Attributes
133	    -----------------------
134	    date : Optional[datetime.date]
135	        Date of the expense. Default is `None`.
136	    type : Optional[str]
137	        Low-level group of the expense. Default is `None`.
138	    category : Optional[str]
139	        High-level group of the expense. Default is `None`.
140	    amount : Optional[float]
141	        Amount of the expense. Default is `None`.
142	    description : Optional[str]
143	        Description of the expense. Default is `None`.
144	    """
145	
146	    date: Optional[datetime.date] = Field(
147	        default=None,
148	        description="Date of the expense.",
149	    )
150	    type: Optional[str] = Field(
151	        default=None,
152	        description="Low-level group of the expense.",
153	    )
154	    category: Optional[str] = Field(
155	        default=None,
156	        description="High-level group of the expense.",
157	    )
158	    amount: Optional[float] = Field(
159	        default=None,
160	        description="Amount of the expense.",
161	    )
162	    description: Optional[str] = Field(
163	        default=None,
164	        description="Description of the expense.",
165	    )

While this type is in appearance similar to ExpenseAdd, the fields here are all Optional, allowing to only specify the ones to change, as promised.

Instances of this type will be passed to CRUDHandler.update(), which will update expenses selected based on their ID. Here, I modified the record with the selected ID through its mapped model instance. The function will also raise an error if the requested ID is not found in the database.

# modules/crud_handler.py

201	    def update(self, ID: int, data: ExpenseUpdate):
202	        """Update expense selected by ID.
203	
204	        Parameters
205	        -----------------------
206	        ID : int
207	            ID of the expense to update.
208	        data : ExpenseUpdate
209	            New data to patch the expense with. Unset fields will not 
                    change.
210	
211	        Raises
212	        -----------------------
213	        CRUDHandlerError
214	            If specified ID is not found.
215	        """
216	        exp = self.session.get(Expense, ID)
217	        if exp is None:
218	            raise CRUDHandlerError(f"ID {ID} not found")
219	
220	        for k, v in data.model_dump().items():
221	            if v is not None:
222	                setattr(exp, k, v)
223	        self.session.commit()

The Session.get() method allows to perform queries for fields (or the entire model, such as here) based on a primary key identifier (here, the ID).

The result will be None if the query produced no result, in which case the function raises a CRUDHandlerError. Otherwise, the returned Expense object is modified with the new values, unless they are set to None (in which case the old values are maintained). The changes at this point are not yet applied to the database, requiring a final call to Expense.commit().

Delete Records

The last among the CRUD operations is the removal of existing expenses, performed in CRUDHandler.remove() based on a list of IDs. “Missing ID” errors will have to be considered here as well.

I implemented the error checks with the additional requirement to rollback any changes performed in a request where an ID was not found (even if others were successfully found and removed). This allows the entire operation to be requested again once the errors are fixed.

# modules/crud_handler.py

225	    def remove(self, ids: list[int]):
226	        """Remove selected expenses from the DB.
227	
228	        Parameters
229	        -----------------------
230	        ids : list[int]
231	            IDs of the removed expenses.
232	
233	        Raises
234	        -----------------------
235	        CRUDHandlerError
236	            If a specified ID is not found.
237	        """
238	        for ID in ids:
239	            exp = self.session.get(Expense, ID)
240	            if exp is None:
241	                self.session.rollback()
242	                raise CRUDHandlerError(f"ID {ID} not found")
243	
244	            self.session.delete(exp)
245	
246	        self.session.commit()

Error management is a bit more involved here. Expense instances matching the passed IDs are extracted and marked for deletion (the last step is performed calling the Session.delete() method). If one of the IDs is not found, all changes are rolled back and an exception is raised. If no errors have been raised, changes are committed.

I also implemented an erase() method, which removes all expenses. This is especially useful for testing, and I thought it best to separate it from remove(), to avoid accidental erasures. As previously, Expense instances (in this case, all of them) will be queried and marked for deletion. The changes are then committed.

# modules/crud_handler.py

248	    def erase(self):
249	        """Remove all expenses from the DB and reset ID field."""
250	        for exp in self.query(QueryParameters()):
251	            self.session.delete(exp)
252	
253	        # Resetting primary key
254	        self.session.execute(text("ALTER SEQUENCE expenses_id_seq 
                                          RESTART"))
255	        self.session.commit()

The Session.execute() command is used here to execute an SQL command (wrapped in the SQLAlchemy text() converter): this SQL command resets the primary key, which would otherwise continue from the last assigned value before erasure.

What’s Next?

In this post, we understood how to implement CRUD (Create, Read, Update, Delete) operations on a database using SQLAlchemy methods and constructs.

In the next installment in this series, we will learn how to write an API which will access these functions and communicate with the user.


Author: Adriano Angelone

After obtaining his master in Physics at University of Pisa in 2013, he received his Ph. D. in Physics at Strasbourg University in 2017. He worked as a post-doctoral researcher at Strasbourg University, SISSA (Trieste) and Sorbonne University (Paris), before joining eXact-lab as Scientific Software Developer in 2023.

In eXact-lab, he works on the optimization of computational codes, and on the development of data engineering software.