To advance modern medicine, EPFL researchers are developing AI-based diagnostic tools. Their goal is to predict the best treatment a patient should receive.
Charlotte Bunne, head of EPFL's Artificial Intelligence in Molecular Medicine Group, is developing AI algorithms to better understand the incredibly complex and high-dimensional data that represent the hundreds of tissue layers and protein markers in an individual cell. EPFL magazine Dimensions spoke to Charlotte Bunne about her work at the cutting edge of AI in medicine and biology.
Could you describe the focus of your research?
We are developing AI-driven diagnostic tools for clinics. This includes forecasting the best treatment a patient should receive, understanding the disease state a patient is in, and deciphering important biomarkers or potential drug targets that we should investigate further. Importantly, every patient's molecular profile, and thus ultimately the associated disease phenotype, is unique. Tailoring therapies to an individual's molecular profile requires both measurements that capture the cellular and molecular factors influencing treatment response and powerful AI technologies that robustly predict that response from the correspondingly large and high-dimensional biomedical datasets originating from various experiments.
And while we see these incredible achievements of AI in vision and language, biological data is very different: measurements are indirect, obscured, multimodal, and represent only snapshots of an inherently dynamic system that governs the underlying biological processes. We can't simply apply AI technologies developed for language to biology; we need to tailor architectures and learning algorithms to the intricacies of biological data and systems.
While the large neural network models we develop are often black boxes, we need to design them so that we at least understand which biological factors contributed to a prediction. This understanding is crucial for biomarker and drug target discovery, as it highlights specific biological mechanisms and pathways linked to disease, revealing new therapeutic opportunities.
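To make the idea concrete, here is a minimal, hypothetical sketch in Python of the two steps described above: a small neural network scores a cell's protein-marker profile for treatment response, and a simple gradient-based attribution asks which markers drove that score. The marker panel, model, and attribution method are illustrative assumptions, not the group's actual pipeline.

```python
# Hypothetical sketch: predict a treatment-response score from a cell's
# protein-marker profile, then use gradient-based attribution to ask which
# markers drove the prediction. Illustrative only; not the actual models
# or data used by the group.
import torch
import torch.nn as nn

MARKERS = ["CD3", "CD8", "PD1", "Ki67", "HER2"]  # hypothetical marker panel

model = nn.Sequential(                # small stand-in for a larger network
    nn.Linear(len(MARKERS), 16),
    nn.ReLU(),
    nn.Linear(16, 1),                 # scalar treatment-response score
)

# One cell's (synthetic) marker-abundance measurements.
x = torch.tensor([[0.8, 1.2, 0.1, 2.3, 0.4]], requires_grad=True)

score = model(x)
score.backward()                      # gradient of the score w.r.t. inputs

# Input-times-gradient: a simple saliency-style attribution per marker.
attribution = (x * x.grad).detach().squeeze()
for marker, a in zip(MARKERS, attribution):
    print(f"{marker}: {a.item():+.3f}")
```

In practice, richer attribution methods and biological priors would be used, but the principle is the same: tracing a prediction back to the measured factors that produced it.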
Tell me a bit about your background - how did you begin working in this cutting-edge area, and what piqued your interest?
I started early! At 14 I was part of a high-school scholarship program at the German Cancer Research Center, and I was fascinated by synthetic biology, a field that combines engineering, computer science and biotechnology. Since then, I have been convinced that only truly interdisciplinary approaches will allow us to reach our goals. Now, in my professorship, I am jointly affiliated with EPFL's School of Life Sciences and School of Computer and Communication Sciences.
As high-school students, we modified simple bacterial cells to give them a new function, which allowed us to use them as little machines in a product. Now, I'm interested in how we can engineer human cells so that they have diagnostic properties, how we can forecast their response to therapies, and how we can reprogram them from a diseased into a healthy state. So, even though the goals, tools and in particular the level of complexity could not be more different from the work I did when I was 14, the essence remains the same.
Clearly this research field is an important driver towards personalized medicine. How quickly is it evolving, and has it really come into its own in the last few years with the advances in AI?
I'm a young researcher, so I've joined a revolution that has been under way for some time. The field has transformed incredibly quickly in recent years because of the way we can now generate high-throughput biological data at unprecedented resolution. Organizing massive collections of biomedical datasets is the foundation for training large neural networks. For example, a lot of the success behind the latest Nobel Prize in Chemistry, awarded in part to the scientists who developed the protein-structure prediction tool AlphaFold, is thanks to the Protein Data Bank, a large collection of protein structures freely available to anyone.
Our research happens one level up, where we are trying to simulate biological function and the behavior of cells and tissues. We base our AI models on data that measure hundreds of features in individual cells and provide insights into the subcellular location, presence and abundance of individual proteins and molecules within a cell. We are increasingly collecting this very rich data into databases, so progress is due to a combination of having more samples and obtaining very rich, highly resolved data of human cells.
Often, however, we still work in low-data regimes and lack comprehensive datasets that, for example, capture dynamic cellular processes over time and across physical scales: in particular, paired data linking molecular changes to tissue-level behaviors is scarce, which means we need to be creative when developing AI systems to overcome these limitations.
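For readers curious what such data looks like in practice, the sketch below shows one common way multiplexed single-cell measurements are organized for model training: a cells-by-markers matrix with per-cell metadata, here built with the community AnnData container. All numbers and labels are synthetic, and the marker panel is a hypothetical example.

```python
# Hypothetical sketch of how multiplexed single-cell measurements are often
# organized for model training: a cells-by-markers matrix plus per-cell and
# per-marker metadata, held in the community AnnData container. The values
# and labels below are synthetic.
import numpy as np
import pandas as pd
import anndata as ad

n_cells = 1000
markers = ["CD3", "CD8", "PD1", "Ki67", "HER2"]  # hypothetical panel

X = np.random.rand(n_cells, len(markers)).astype(np.float32)  # abundances

# Per-cell metadata: which patient and tissue region each cell came from.
obs = pd.DataFrame(
    {
        "patient": np.random.choice(["P1", "P2", "P3"], n_cells),
        "tissue_region": np.random.choice(["tumor", "stroma"], n_cells),
    },
    index=[f"cell_{i}" for i in range(n_cells)],
)
var = pd.DataFrame(index=markers)  # per-marker metadata (names only here)

adata = ad.AnnData(X=X, obs=obs, var=var)
print(adata)  # summary: 1000 cells x 5 markers, with metadata attached
```

Structuring data this way keeps measurements and their biological context together, which matters when models must generalize across patients and tissue regions.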
You mentioned data collection, and the databases that are fundamental to the work you are now doing. Clearly there are issues around privacy and how patient data can be used to train machine learning algorithms. How does this work, and how is Switzerland placed in the global context?
Of course, patient data requires the highest sensitivity. It is kept in secure computing environments, and data protection regulations set stringent requirements for how it is handled and processed. What is somewhat unique in Switzerland is the coordination of efforts to develop interoperable data infrastructures that enable the nationwide accessibility and exchange of health-related data. This sets the foundation for developing AI algorithms that use growing databases of diverse and representative patient data. Our work profits from the tremendous efforts and ecosystems that have been established in Switzerland over the past years.
Another cornerstone of our research is the close exchange with clinicians and biologists. For us, this means that we are developing our AI solutions in close collaboration and can adapt them so that the diagnostic tools we build integrate seamlessly into clinical routines and processes. At the same time, these close collaborations allow us to influence and steer future data generation in areas where data is undersampled, or to prioritize measurements of data modalities that offer deeper insights into the molecular makeup of cells and tissues. We expect that such AI-guided data collection will significantly improve the capabilities of the AI models we build.
You are also involved in a global community that aims to develop AI-powered Virtual Cells. What are these and how will they take current research further?
There are countless ways of measuring biology across many different physical scales, from molecular interactions to tissue architecture. The question we aim to answer is: How can we integrate all those measurements to get the full picture and a comprehensive understanding of cell behavior and function? Specifically, can we predict how a cell's molecular state will change upon an external perturbation such as a drug, an environmental influence, a disease, or a treatment? Essentially, we want to understand why a cell adopts a particular state rather than another.
With advances in measurement techniques and increasingly powerful AI architectures, we are now beginning to have the tools to tackle such challenges. Some of these AI models are built on single-cell measurement data, while others focus on decoding the language of DNA or predicting protein folding. The vision is to create a multimodal, multi-scale foundation model - an AI-powered Virtual Cell - that integrates all those efforts and measurements and represents and simulates the behavior of molecules, cells, and tissues across a range of states and conditions. An AI Virtual Cell serves as a learned, universal simulator capable of modeling cellular systems under diverse scenarios, including differentiation, disease states, stochastic fluctuations, and environmental influences.
This is a massive, collaborative effort involving a global research community. Many groups are working on different components of this puzzle, and our challenge and opportunity lie in integrating these contributions into a cohesive vision that will push the boundaries of what's possible in biomedical research.
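As a toy illustration of the perturbation-prediction task at the heart of a virtual cell, the sketch below conditions a small neural network on a cell's baseline molecular state and a learned drug embedding to predict its post-perturbation state. The architecture, dimensions and drug encoding are illustrative assumptions, not an actual virtual-cell model.

```python
# Toy sketch of perturbation prediction: given a cell's baseline molecular
# state and an encoding of a perturbation (here, a drug index), predict the
# cell's post-perturbation state. Sizes and architecture are hypothetical.
import torch
import torch.nn as nn

STATE_DIM, PERT_DIM, N_DRUGS = 50, 16, 100  # illustrative dimensions

class PerturbationPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.drug_embedding = nn.Embedding(N_DRUGS, PERT_DIM)
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + PERT_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, STATE_DIM),       # predicted state change
        )

    def forward(self, state, drug_id):
        z = self.drug_embedding(drug_id)             # perturbation encoding
        delta = self.net(torch.cat([state, z], -1))  # predicted shift
        return state + delta                         # post-perturbation state

model = PerturbationPredictor()
baseline = torch.randn(8, STATE_DIM)       # batch of 8 unperturbed cells
drug = torch.randint(0, N_DRUGS, (8,))     # one drug index per cell
predicted = model(baseline, drug)          # shape: (8, STATE_DIM)
print(predicted.shape)
```

A real virtual cell would integrate many modalities and scales, but this captures the core question: given a state and a perturbation, what state comes next?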
If you had a crystal ball, where would you see AI in biomedicine in a decade? What will you be doing in ten years' time?
There are some easier tasks in biology that we may have solved by then and for which we will be able to make accurate predictions. Success stories such as AlphaFold demonstrate that we can solve specific, isolated problems, and I expect more breakthroughs of that kind in the next decade. However, fully grasping the complexity of biological systems - which involve countless molecular interactions that organize into systems-level dynamics, on time scales ranging from picoseconds to years - is a monumental task. I believe we will have countless problems to solve and questions to answer for many, many years to come.