Fractal is a platform for managing and analyzing the large volumes of data produced by high-resolution optical microscopy experiments in biomedical research. eXact lab is in charge of developing the platform, coordinated by the Liberali laboratory (Friedrich Miescher Institute for Biomedical Research, Basel) and the Pelkmans laboratory (University of Zurich). Fractal lets research groups define image-analysis workflows and run them automatically at supercomputing centers.
Technologies used
In Fractal we adopt modern web technologies to ensure the platform’s resilience and stability. The architecture is completely written in the Python programming language, and uses the FastAPI web framework. The main components are:
- An Application Programming Interface (API), which provides a uniform way for client and server to interact.
- A relational database, which manages users and the metadata of data analysis projects; it also improves data safety through regular and incremental backups.
Fractal manages and analyzes big data, using large computational resources to complete analyses in the shortest possible time. For this reason, Fractal integrates natively with SLURM, one of the most widely used job schedulers in supercomputing. The jobs themselves are optimized to perform many operations in parallel, so that they scale to large data sets. The core components of these jobs are based on the scientific Python ecosystem and modern computer vision algorithms.
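To make the idea of parallel SLURM jobs concrete, here is a minimal sketch of how a batch script for a SLURM array job could be generated from Python. It is an illustration only, not Fractal's actual implementation: the function `build_sbatch_script`, the script `process_well.py`, and its `--index` option are hypothetical names chosen for this example. The `#SBATCH` options and the `SLURM_ARRAY_TASK_ID` variable are standard SLURM features.

```python
def build_sbatch_script(job_name: str, n_tasks: int,
                        cpus_per_task: int, command: str) -> str:
    """Return the text of a SLURM batch script for an array job.

    Each array task runs `command` with its own index, so independent
    pieces of a data set can be processed in parallel.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # Array indices run from 0 to n_tasks - 1, one per unit of work.
        f"#SBATCH --array=0-{n_tasks - 1}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        # %x = job name, %A = array job ID, %a = array task index.
        "#SBATCH --output=%x_%A_%a.out",
        "",
        # Hypothetical worker script; SLURM sets SLURM_ARRAY_TASK_ID per task.
        f'{command} --index "$SLURM_ARRAY_TASK_ID"',
    ]
    return "\n".join(lines) + "\n"


# Example: one task per well of a 96-well plate (assumed layout).
script = build_sbatch_script(
    job_name="illumination_correction",
    n_tasks=96,
    cpus_per_task=4,
    command="python process_well.py",
)
print(script)
```

In practice the generated text would be written to a file and submitted with `sbatch`; the scheduler then decides where and when each array task runs.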
The components used in Fractal are relevant to a variety of other applications, from streaming platforms to research in geoscience or genomics. On the web side, FastAPI is among the de facto standards for web services in Python, used by a large community that actively contributes to its support and development. On the scientific side, Fractal is based on technologies designed to optimize computational operations (e.g., the asynchronous computation tools offered by the dask library) and to be used within cloud infrastructures (e.g., the zarr format for efficient management of large multi-dimensional arrays).
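As a rough illustration of the chunked, lazy computation that dask offers, the sketch below computes a maximum-intensity projection over a small synthetic image stack. The array shape, chunk sizes, and variable names are assumptions made for this example and do not reflect Fractal's own tasks. It requires the `numpy` and `dask` packages.

```python
import numpy as np
import dask.array as da

# Synthetic stand-in for an image stack with axes (z, y, x).
z_planes, height, width = 4, 64, 64
stack_np = np.arange(z_planes * height * width, dtype=np.uint32)
stack_np = stack_np.reshape(z_planes, height, width)

# Wrap it in a dask array: one chunk per z-plane, split into 32x32 tiles.
stack = da.from_array(stack_np, chunks=(1, 32, 32))

# Building the expression is lazy: no pixel has been processed yet.
projection = stack.max(axis=0)

# compute() executes the chunked task graph, in parallel where possible.
result = projection.compute()
print(result.shape)
```

In a real pipeline the stack would typically be read from a zarr store rather than built in memory (for example with `dask.array.from_zarr`), so that only the chunks needed by each task are loaded.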