New miniature laboratories are ensuring that artificial intelligence (AI) doesn't make mistakes. They provide a controlled test environment where algorithms and AI models can be checked before being put to work under real-life conditions. The aim is for AI to work reliably.
Developing an AI solution can be a journey into the unknown. At least at the beginning, researchers and designers do not always know whether their algorithms and AI models will work as expected or whether the AI will ultimately make mistakes. Sometimes, AI applications that work well in theory perform poorly under real-life conditions. To gain the trust of users, however, an AI should work reliably and correctly (see ETH Globe Magazine, 18.03.2025). This applies just as much to popular chatbots as it does to AI tools in research.
Any new AI tool has to be tested thoroughly before it is deployed in the real world. However, testing in the real world can be an expensive or even risky endeavour. For this reason, researchers often test their algorithms in computer simulations of reality. Since simulations are only approximations of reality, however, testing AI solutions in this way can lead researchers to overestimate an AI's performance. Writing in Nature Machine Intelligence, ETH mathematician Juan Gamella now presents a new approach that researchers can use to check how reliably and correctly their algorithms and AI models work. An AI model is based on certain assumptions and is trained to learn from data and perform given tasks intelligently; an algorithm comprises the mathematical rules that the model follows to process a task.
Testing AI instead of overestimating it
Juan Gamella has built special miniature laboratories ("mini-labs") that can be used as test beds for new AI algorithms. "The mini-labs provide a flexible test environment that delivers real measurement data. They're a bit like a playground for algorithms, where researchers can test their AI beyond simulated data in a controlled and safe environment," says Gamella. The mini-labs are built around well-understood physics, so that researchers can use this knowledge to check whether their algorithms arrive at the correct solution for a variety of problems. If an AI fails the test, researchers can make targeted improvements to the underlying mathematical assumptions and algorithms early on in the development process.
Gamella's first mini-labs are based on two physical systems that exhibit essential properties that many AI tools have to deal with under real-world conditions. How the mini-labs are used depends on the issue being examined and what the algorithm is intended to do. For example, his first mini-lab contains a dynamic system, wind, that is constantly changing and reacting to external influences; it can be used to test AI tools for control problems. His second mini-lab, which obeys well-understood laws of physics for light, can be used to test an AI that aims to learn such laws automatically from data and thus assists scientists in making new discoveries.

"I want to develop tools that help scientists solve research questions."
Juan Gamella
The mini-labs are tangible devices, about the size of a desktop computer, that can be operated by remote control. They are reminiscent of the historical demonstration experiments conducted by researchers from the 16th century onwards to present, discuss and improve their theories and findings in scientific societies. Gamella compares the role of the mini-labs in the design of AI algorithms to that of a wind tunnel in aircraft construction: when a new aircraft is being developed, most of the design work is initially carried out using computer simulations because it is more efficient and cost-effective. Once the engineers have agreed on their designs, they build miniature models and test them in a wind tunnel. Only then do they build a full-sized aircraft and test it on real flights.
An intermediate step between simulation and reality
"Like the wind tunnel for aircraft, the mini-labs serve as a sanity check to make sure everything works early on as we move from simulation to reality," says Gamella. He views testing AI algorithms in a controlled environment as a crucial intermediate step towards ensuring that an AI works in complex real-world scenarios. The mini-labs provide this for certain types of AI, particularly those designed to interact directly with the physical world.
The mini-labs help researchers study the problem of the transition from simulation to reality by providing a test bed where they can carry out as many experiments as they need. This transitional problem is also relevant at the intersection of robotics and AI, where AI algorithms are often trained to solve tasks in a simulated environment first and only then in the real world. This two-step approach increases their reliability.
Gamella himself started out with a Bachelor's Degree in Mathematics before pursuing a Master's Degree in Robotics at ETH. As a doctoral student, he returned to mathematics and AI research. He has kept his flair for physics and technology: "I want to develop tools that help scientists solve research questions." The application for his mini-labs is not limited to engineering. Together with a colleague from the Charité University Hospital in Berlin, he attempted to design a mini-lab to test AI algorithms in cell biology and synthetic biology. However, the costs were too high. By contrast, his second mini-lab, a light tunnel, is already being used as a test environment in industrial production - for an optical problem. The mini-labs have also helped to test various new methods for how large language models (LLMs) can make more accurate predictions in the real world.
Causal AI - the silver bullet for correct AI
Writing in Nature Machine Intelligence, Gamella puts his mini-labs to a particularly demanding test to prove their suitability - and ultimately demonstrates that they are useful even for questions of causal AI. Causality research and causal AI form a key area of statistics and theoretical computer science that is fundamental to AI models: to function reliably and correctly, AI models should understand causal relationships.

"The causal chambers are a valuable addition to causality research. New algorithms can be validated in an unprecedented way."
Peter Bühlmann
However, AI models often do not reflect the causal relationships of the world, but instead make predictions based on statistical correlations (see interview with ETH computer science professor Thomas Hofmann). Scientifically speaking, causality is a fundamental concept that describes the relationship between cause and effect. Causal AI refers to AI models that recognise cause-and-effect relationships, which makes their results more precise and transparent. That is why causal AI is important for fields such as medicine, economics and climate research.
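The difference between correlation and causation can be made concrete with a small simulation. The setup below is an illustrative assumption, not the chambers' actual configuration: a hidden confounder `z` drives two variables `x` and `y`, so they are strongly correlated in observational data, yet intervening on `x` (setting it externally, as a controlled test bed allows) shows that `x` has no causal effect on `y`.

```python
import random

random.seed(0)

def observe(n=10_000):
    """Observational regime: a hidden confounder z drives both x and y."""
    pairs = []
    for _ in range(n):
        z = random.gauss(0, 1)
        pairs.append((z + random.gauss(0, 0.1),   # x follows z
                      z + random.gauss(0, 0.1)))  # y follows z
    return pairs

def intervene_on_x(n=10_000):
    """Interventional regime: x is set externally, independent of z."""
    pairs = []
    for _ in range(n):
        z = random.gauss(0, 1)
        pairs.append((random.gauss(0, 1),         # do(x): set from outside
                      z + random.gauss(0, 0.1)))  # y still follows z
    return pairs

def correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x, _ in pairs) ** 0.5
    sy = sum((y - my) ** 2 for _, y in pairs) ** 0.5
    return cov / (sx * sy)

print(correlation(observe()))         # close to 1: strong correlation
print(correlation(intervene_on_x()))  # close to 0: x does not cause y
```

A purely correlation-based model would happily predict `y` from `x` in both regimes; a causal model notices that the dependence vanishes under intervention. The mini-labs let researchers generate exactly this kind of interventional ground truth from real, physical measurements.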
New statistical methods are needed to develop causal AI, since causal relationships are sometimes influenced by special circumstances and coincidences, and cannot easily be disentangled in complex settings. Gamella carried out this research in partnership with ETH mathematics professors Peter Bühlmann and Jonas Peters, both of whom have developed important approaches for identifying causal relationships under changing conditions and distinguishing them from confounding influences or random noise.
"However, these methods are generally difficult to test in the real world," says Gamella. "To do so, we need data from systems where the cause-effect relationships are already known to check whether our algorithms can accurately learn them. This data is difficult to find." For the publication, the three ETH researchers therefore tested causal AI algorithms in the mini-labs built by Gamella, which he also refers to as "causal chambers". First, they tested whether the algorithms learned the correct causal model for each mini-lab, i.e. for wind and light. They also observed how well the algorithms identified which factors influence each other and how they perform under unusual conditions or when sudden changes occur. Peter Bühlmann, Gamella's doctoral supervisor, is full of praise: "The causal chambers are a valuable addition to causality research. New algorithms can be validated in an unprecedented way."
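The validation step described above, comparing a learned causal model against the ground truth wired into a chamber, can be sketched in a few lines. The toy "wind chamber" graph and the variable names below are hypothetical illustrations, not the actual chamber variables; the score used, structural Hamming distance (SHD), is a standard measure in causality research for how far a learned graph deviates from the true one.

```python
# Ground-truth causal graph of a hypothetical wind chamber:
# the fan speed drives the air flow, which in turn drives the pressure.
TRUE_GRAPH = {("fan_speed", "air_flow"), ("air_flow", "pressure")}

def shd(true_edges, learned_edges):
    """Structural Hamming distance: the number of missing, spurious,
    or wrongly oriented edges in the learned directed graph."""
    true, learned = set(true_edges), set(learned_edges)
    reversed_ = {e for e in learned
                 if e not in true and (e[1], e[0]) in true}
    extra = {e for e in learned
             if e not in true and (e[1], e[0]) not in true}
    missing = {e for e in true
               if e not in learned and (e[1], e[0]) not in learned}
    return len(missing) + len(extra) + len(reversed_)

# Hypothetical output of a causal-discovery algorithm: one edge is
# reversed and one spurious edge has been added.
learned = {("fan_speed", "air_flow"), ("pressure", "air_flow"),
           ("fan_speed", "pressure")}

print(shd(TRUE_GRAPH, TRUE_GRAPH))  # 0: perfect recovery
print(shd(TRUE_GRAPH, learned))     # 2: one reversed + one spurious edge
```

Because the chambers' physics is well understood, the true graph is known in advance, so a score like this can be computed from real measurement data rather than from a simulation.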
A safe and playful way to learn
Gamella is pleased by the unexpected benefits the causal chambers provide for teaching. "Since the mini-labs provide a safe playground for algorithms, they are also a great playground for students," he says. Lecturers in AI, statistics and other engineering fields can use them to allow their students to directly apply what they have learned in a practical environment. Lecturers from around the world have already expressed their interest, and Gamella is now setting up pilot studies at ETH Zurich and the University of Liège.
References
Gamella, J.L., Peters, J. & Bühlmann, P. Causal chambers as a real-world physical testbed for AI methodology. Nature Machine Intelligence 7, 107–118 (2025). DOI: 10.1038/s42256-024-00964-x