EMBL Grenoble's Marquez Team has developed an AI-based training method that automates crystal screening and identification for macromolecular crystallography studies

The way structural biologists investigate the mysteries of protein folding has changed greatly over the past few decades, thanks to technological developments. Several innovations in the field of macromolecular crystallography developed at EMBL Grenoble have enabled tedious manual processes to be almost fully automated, thereby improving sample quality and accelerating data collection. However, some steps of the crystallisation process still require manual intervention, including crystal screening and identification.
The Marquez Team, who operate the High-throughput Crystallisation and Fragment Screening Facility (HTX lab) at EMBL Grenoble, addressed this issue by developing an AI-based training method called AXIS (AI-based Crystal Identification System). They also designed its local implementation, CRIMS-AXIS, which they then integrated into the Crystallographic Information Management System (CRIMS), a bespoke platform for managing all the information and data associated with crystallisation experiments.
Recently described in the International Union of Crystallography Journal, this method is based on a lab-in-the-loop approach combining large computer vision models with expert input captured via the CRIMS software. It provides a simple and cost-effective AI-assisted automated screening system that speeds up structure-based drug design.
What is macromolecular crystallography?
Structural biology aims to gain insight into biological processes by determining the three-dimensional structure of macromolecules at the atomic level. One method of achieving this is macromolecular crystallography, which involves shooting high-intensity X-rays at regularly structured arrays of molecules, packed together in a crystal.
In order to carry out these experiments, large numbers of crystals have to be produced and prepared for data collection. This typically happens at dedicated facilities like the HTX lab at EMBL Grenoble. Technicians test many different chemical solutions on protein samples to find the conditions in which the protein forms crystals.
Lab-in-the-loop
Over the past few decades, the HTX lab has collaborated with technology-focused teams at EMBL Grenoble to automate manual processes. Thanks to robotics and software development, most of the pipelines have already been automated, and users can perform the experiments remotely through the CRIMS software. However, users still had to carry out crystal screening and identification manually in CRIMS.
"On average, 13,000 images are generated for each screening, of which about 5% show crystals," explained Aurélien Personnaz, the ARISE postdoctoral fellow in the Marquez Team who developed AXIS. "The idea was to save researchers from the tedious task of checking for crystals in the images, while also ensuring better quality control."
Personnaz, who has a background in computer science and experience in web development and applied AI, developed the new training method to predict the probability of crystal formation from visible and ultraviolet images. He started from a large Vision Transformer model (a type of model adapted for computer vision tasks that shares architectural principles with Large Language Models), pre-trained on millions of images gathered from the internet. He then trained it first on a relatively large generic crystallography dataset, before specialising it on local data from the CRIMS system.
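The staged training described above (start from a pre-trained model, train on a large generic crystallography dataset, then specialise on local CRIMS data) is a form of sequential fine-tuning: each stage starts from the weights left by the previous one. The sketch below illustrates only that idea and is not the AXIS code. It is a hypothetical toy: a tiny logistic-regression classifier on made-up two-number "image features" stands in for the Vision Transformer, a zero initialisation stands in for the pre-trained weights, and the datasets, learning rates and epoch counts are invented for illustration.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Probability that an image shows crystals (toy linear model, not a ViT)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def train(weights, bias, dataset, lr=0.5, epochs=200, seed=0):
    """Continue training from the given weights with plain SGD on log-loss.

    Returns new weights and bias; the inputs are not modified, so each
    stage can start from the previous stage's result.
    """
    rng = random.Random(seed)
    weights = list(weights)
    data = list(dataset)
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            # For log-loss, the gradient with respect to z is (p - y).
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                weights[i] -= lr * error * x
            bias -= lr * error
    return weights, bias

# Hypothetical data: (features, label), label 1 = crystal, 0 = no crystal.
# Real features would be derived from images; these are made up.
generic_data = [([1.0, 0.2], 1), ([0.9, 0.1], 1), ([0.1, 0.9], 0), ([0.2, 1.0], 0)]
local_data = [([0.6, 0.5], 1), ([0.7, 0.4], 1), ([0.4, 0.6], 0), ([0.3, 0.7], 0)]

# Stage 0: stand-in for a pre-trained starting point (here simply zeros).
w, b = [0.0, 0.0], 0.0
# Stage 1: train on the larger, generic crystallography dataset.
w, b = train(w, b, generic_data)
# Stage 2: specialise on local data, starting from the stage-1 weights,
# with a smaller learning rate so the earlier training is refined, not discarded.
w, b = train(w, b, local_data, lr=0.1, epochs=100)

for features, label in local_data:
    print(features, label, round(predict(w, b, features), 3))
```

The key design point is that `train` takes the previous stage's weights as its starting point instead of re-initialising them; with a real Vision Transformer the same pattern applies, only with a deep-learning framework handling the model and optimiser.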
CRIMS-AXIS was thus born, but it still needed polishing. Personnaz adopted a 'lab-in-the-loop' approach for the training method, comparing machine learning predictions with manual inputs from hundreds of expert scientists carrying out experiments at the HTX lab. "This is like getting the best of both worlds, AI-based predictions corrected by expert scientists that in turn help re-train and improve the system," commented team leader and senior scientist Jose Marquez, who leads the HTX lab. "With two rounds of lab-in-the-loop training, the accuracy of AXIS predictions improved enormously."
What is 'lab-in-the-loop'?
Lab-in-the-loop aims to improve experimental processes with machine learning and generative AI methods, using iterative loops and user feedback to continually correct the model.
By focusing on discrepancies between machine learning predictions and human scores, this method accelerates the learning process. However, humans can also make mistakes. For each conflicting score, the HTX experts had to determine whether the machine or the scientist was correct. In each iteration, Personnaz retrained the model on an extended training dataset combined with the curated data. "The first iteration showed obvious errors, but at the second iteration, it made fewer 'silly' mistakes," commented Personnaz. "This shows that CRIMS-AXIS made good progress, because increasingly, what remains are cases that are impossible to solve." These are images in which experts could not tell whether they showed very small crystalline-like material or amorphous precipitate.
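The discrepancy-driven curation step described above can be summarised in a few lines. The Python sketch below is hypothetical: the record layout, the 0.5 decision threshold and the `expert_review` callback are assumptions for illustration, not the CRIMS-AXIS implementation. It flags images where the model's prediction and the user's score disagree, asks an expert to adjudicate each conflict, and keeps only the resolved cases as curated labels for the next round of retraining.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record: image id -> (model probability of crystals, user score).
# User score: True = "crystal", False = "no crystal".
Records = Dict[str, Tuple[float, bool]]

def find_discrepancies(records: Records, threshold: float = 0.5) -> List[str]:
    """Return ids (sorted) where the model's call disagrees with the user's score."""
    return [
        image_id
        for image_id, (prob, user_says_crystal) in sorted(records.items())
        if (prob >= threshold) != user_says_crystal
    ]

def curate(
    records: Records,
    expert_review: Callable[[str], Optional[bool]],
    threshold: float = 0.5,
) -> Dict[str, bool]:
    """Ask an expert to settle each conflict; keep only resolved cases.

    expert_review returns the correct label (True/False), or None when the
    image cannot be classified (e.g. tiny crystalline-like material vs
    amorphous precipitate). Unresolvable cases are left out of the new labels.
    """
    curated = {}
    for image_id in find_discrepancies(records, threshold):
        verdict = expert_review(image_id)
        if verdict is not None:
            curated[image_id] = verdict
    return curated

# Toy example with made-up values.
records = {
    "img_001": (0.92, True),   # model and user agree: crystal
    "img_002": (0.81, False),  # conflict: model says crystal, user says none
    "img_003": (0.10, True),   # conflict: user says crystal, model says none
    "img_004": (0.55, False),  # conflict, but the expert cannot decide
    "img_005": (0.05, False),  # agree: no crystal
}
# Hypothetical expert verdicts: the model was right on img_002 and img_003.
expert_labels = {"img_002": True, "img_003": False, "img_004": None}

conflicts = find_discrepancies(records)
new_labels = curate(records, expert_labels.get)
print(conflicts)   # ['img_002', 'img_003', 'img_004']
print(new_labels)  # {'img_002': True, 'img_003': False}
```

In a full loop, `new_labels` would be merged with the extended training set before retraining, and the next round's predictions would be compared with fresh user scores in the same way. Note that img_003 shows the case where the human score, not the model, was wrong.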
Fully integrated into the CRIMS software, CRIMS-AXIS identifies crystals, as well as needles or other crystallisation forms. The model has received positive feedback from the users. "AXIS removes critical bottlenecks, particularly in the context of extensive crystallisation screens, unlocking the potential for higher levels of automation that are key in both fundamental and translational research," explained Sihyun Sung, Staff Scientist in the Marquez Team and user of CRIMS-AXIS.
This work benefited from support from the European Commission via the Fragment-Screen project coordinated by Instruct-ERIC. Other labs can easily adopt the method, as the machine learning models have been deposited in a central repository and are available for the scientific community to use.
Next steps
Personnaz is now working with EMBL Grenoble colleagues on improving CRIMS-AXIS and upgrading their automated pipelines.
On the machine learning front, he is working with Alana de Sousa, an astrophysicist specialising in AI studies, who is currently doing a traineeship in the Marquez Team. They are attempting to apply 'self-supervised learning' to CRIMS-AXIS, leveraging the large number of unlabelled crystallography images produced over many years on the HTX platform. The aim is to pre-train the model on unlabelled crystallography images only, thereby restricting the training images to the crystallography domain. This would let the model 'learn to understand' crystallography images and potentially achieve better results for crystal identification. The researchers also plan to test whether it can be used for other tasks such as multi-class classification, crystal detection or segmentation.
To move towards fully automated crystallisation pipelines, Personnaz is collaborating with software engineer Jeremy Sinoir from the Papp Team to integrate automated crystal harvesting into CRIMS. Currently, HTX operators have to use the software to select which crystals should be harvested and prepared for diffraction data collection experiments, and how. The 'automated harvesting' that Personnaz and Sinoir are developing would be integrated into CrystalDirect Harvester 4, the latest version of the harvesting machine, soon to be used on the HTX platform. The Marquez Team is also extending the lab-in-the-loop approach to other steps of the crystallography process.
This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 945405