Noting that recent advances in artificial intelligence (AI) and the availability of large-scale experimental data on human biology have reached a critical mass, a team of researchers from Stanford University, Genentech, and the Chan Zuckerberg Initiative says that science now has an "unprecedented opportunity" to use AI to create the world's first virtual human cell. Such a model would be able to represent and simulate the precise behavior of human biomolecules and cells and, eventually, of tissues and organs.
"Modeling human cells can be considered the holy grail of biology," said Emma Lundberg, associate professor of bioengineering and of pathology in the schools of Engineering and Medicine at Stanford and a senior author of a new article in the journal Cell proposing a concerted, global effort to create the world's first AI virtual cell. "AI offers the ability to learn directly from data and to move beyond assumptions and hunches to discover the emergent properties of complex biological systems."
Lundberg's fellow senior authors include two Stanford colleagues: Stephen Quake, a professor of bioengineering and science director at the Chan Zuckerberg Initiative, and Jure Leskovec, a professor of computer science in the School of Engineering. They are joined by Theofanis Karaletsos, head of artificial intelligence for science at the Chan Zuckerberg Initiative, and Aviv Regev, executive vice president of research at Genentech.
Remarkable promise
Such a synthetic cell model would allow a deeper understanding of the complex interplay of chemical, electrical, mechanical, and other forces and processes that keep healthy human cells working. It would also reveal the root causes of disease that drive cells to dysfunction or death.
Perhaps more intriguingly, an AI virtual cell would let scientists experiment in silico rather than in vivo: on a computer rather than on living cells and organisms. This ability would deepen our understanding of human biology and speed the search for new therapies, pharmaceuticals, and perhaps cures for disease.
Cancer biologists might model how certain mutations turn healthy cells malignant.
Microbiologists might someday predict the effects of viruses on infected cells and perhaps even host organisms. Physicians might one day test treatments on "digital twins" of their patients, accelerating a long-promised age of faster, more cost-effective, and safer personalized medicine.
To be considered a success, however, the authors say an AI virtual cell would need to achieve three goals. First, it would need to enable researchers to create universal representations across species and cell types. Second, it would have to accurately predict cellular function, behavior, and dynamics, and help explain the underlying cellular mechanisms. Finally, it would need to support computer-based experiments that test hypotheses and guide data collection, expanding the virtual cell's abilities faster and at far lower cost than is possible today.
Global collaboration
In what the authors call a "trifecta" for science, AI has ushered in an era of tools that are predictive, generative, and queryable. Meanwhile, the scale of raw biological data needed to create the virtual cell is immense. For perspective, the authors point to the National Institutes of Health's Sequence Read Archive, a repository of DNA sequencing data that now contains more than 14 petabytes, a thousand times more than the dataset used to train ChatGPT.
Achieving the AI virtual cell will not be easy. It will require a concerted, global, open-science collaboration on an unprecedented scale, spanning fields from genetics and proteomics to medical imaging, and a close partnership among stakeholders in academia, industry, and non-profits worldwide. At the same time, the authors are careful to note that any work toward the AI virtual cell should proceed on the understanding that the resulting models will be made available to the entire scientific community without restriction.
"This is a mammoth project, comparable to the genome project, requiring collaboration across disciplines, industries, and nations, and we understand that fully functional models might not be available for a decade or more," Lundberg asserted. "But, with today's rapidly expanding AI capabilities and our massive and growing datasets, the time is ripe for science to unite and begin the work of revolutionizing the way we understand and model biology."