New data standards would speed battery development.
Back in the 1990s, researchers undertook the Human Genome Project, a 13-year journey designed to enable a new era of innovation in medicine. By assembling and sharing vast amounts of data to uncover the secrets behind the genetic determinants of disease, this project transformed the medical industry and led to innumerable breakthroughs. Now, researchers are extending that model of innovation to battery science with an eye on impact at a global scale.
Scientists from an international consortium led by researchers at the U.S. Department of Energy's Argonne and Idaho national laboratories have recently proposed a comprehensive new data science paradigm called the Battery Data Genome. It is an ambitious undertaking to develop uniform data acquisition and data sharing practices across the wide-ranging battery community. These innovative practices will create an extensive database network to enable energy storage breakthroughs using artificial intelligence (AI).
"The electrochemical science that is urgently needed for a zero-carbon economy requires state-of-the-art data science," said Argonne battery scientist Sue Babinec. "Tackling the extremely complex technical questions that battery scientists face requires huge amounts of data to generate AI and machine learning algorithms."
"Although there are some specialized data science projects for batteries, like the Electrolyte Genome Project, the undertaking to create a Battery Data Genome devoted to all aspects of the battery and that unifies work done across institutions and scales is unprecedented," said Argonne distinguished fellow and Joint Center for Energy Storage Research director George Crabtree.
According to Crabtree, the Battery Data Genome will collect and house data from every step of the battery lifecycle, from discovery to development to manufacturing and all manner of deployments. Universal data management standards for each segment of the battery community are needed so that the data created can unlock the power of AI algorithms designed to do everything from identifying new candidate electrode materials to improving battery pack construction to predicting cell lifetimes.
"This is a call to action," said Argonne battery scientist Noah Paulson. "We're trying to energize and organize the battery community to contribute their data, whenever possible to as many researchers as possible, to enable powerful data science methods to catalyze breakthroughs."
According to Paulson, scientists are interested in many different characteristics and qualities when measuring a battery's performance. Because of this, the datasets that are collected by different groups, even those looking at the same battery in the same setup, will not be identical. "We have to find the basic set of information that should be associated with each set of data, so that we no longer have to spend time cleaning the data to fit our models," he said.
"For batteries, there are many common types of data, but there's no uniform way of approaching them," added Argonne computational scientist Logan Ward. "When data come in many different formats, don't include how they're collected, and aren't frequently shared among different groups, it becomes very difficult to do the kind of large-scale AI analysis and predictions necessary to speed the development and deployment of new batteries."
Making data consistent and accessible means formatting them in a specific way, with uniform standards for metadata, which describe how the data were collected. "For instance, metadata might include the ambient temperature or even the resistance of contacts to your electrodes," Babinec said.
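As a rough illustration of what a shared minimum set of metadata could look like in practice, the Python sketch below attaches a small list of required fields to each dataset and converts one lab's capacity units to a common form. The field names, the `CyclingDataset` record, and the `harmonize` and `missing_metadata` helpers are hypothetical, chosen only for this example; they are not the Battery Data Genome standard and not the API of any existing toolkit.

```python
from dataclasses import dataclass, field

# Hypothetical required metadata fields, for illustration only.
REQUIRED_METADATA = (
    "cell_id",
    "cathode_chemistry",
    "ambient_temperature_c",
    "contact_resistance_ohm",
)


@dataclass
class CyclingDataset:
    """Per-cycle capacities plus metadata describing how they were collected."""
    metadata: dict
    capacity_ah: list = field(default_factory=list)


def harmonize(raw: dict) -> CyclingDataset:
    """Map one lab's export onto the common record, converting mAh to Ah if needed."""
    if "capacity_mah" in raw:
        capacity_ah = [c / 1000.0 for c in raw["capacity_mah"]]
    else:
        capacity_ah = list(raw["capacity_ah"])
    # Keep only the agreed metadata keys; absent ones are stored as None.
    metadata = {key: raw.get(key) for key in REQUIRED_METADATA}
    return CyclingDataset(metadata=metadata, capacity_ah=capacity_ah)


def missing_metadata(dataset: CyclingDataset) -> list:
    """Return the required metadata keys that are absent or None."""
    return [key for key in REQUIRED_METADATA if dataset.metadata.get(key) is None]


# Two made-up exports: lab A reports Ah with full metadata, lab B reports mAh
# and omits the temperature and contact resistance.
lab_a = {
    "cell_id": "A-001",
    "cathode_chemistry": "NMC",
    "ambient_temperature_c": 25.0,
    "contact_resistance_ohm": 0.002,
    "capacity_ah": [1.10, 1.09, 1.08],
}
lab_b = {
    "cell_id": "B-017",
    "cathode_chemistry": "LFP",
    "capacity_mah": [1100.0, 1095.0],
}

record_a = harmonize(lab_a)
record_b = harmonize(lab_b)
print(missing_metadata(record_a))  # []
print(missing_metadata(record_b))  # ['ambient_temperature_c', 'contact_resistance_ohm']
```

A check like `missing_metadata` flags incomplete records when they are contributed, rather than leaving each modeling group to discover the gaps while cleaning the data.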
To Paulson, enhanced collaboration among the entire spectrum of battery researchers, from those looking at individual molecules to those designing and testing battery packs, will be necessary to create the new standards.
Getting groups of researchers who study different stages of a battery's development to create a universal set of data that can be widely accessed, understood and used represents a significant challenge, Babinec said. "It's as if part of the community's data is written in Spanish and part is written in German; you need to have a common language of science."
To attract as many participants as possible, the Battery Data Genome offers many options for data sharing. "It's important to recognize that not all data needs to be shared openly for success; there are many different sharing scenarios that could provide individual benefits amongst the many groups in the complex ecosystem," Babinec said.
This flexibility could make participation in the Battery Data Genome more attractive to industry partners, who could take advantage of the data produced by academic or government partners without necessarily having to contribute their own. "It's not unlike blood types," Babinec said. "Some groups would be universal data donors, other groups would be universal data recipients, and overall the community would benefit."
Once scientists populate the Battery Data Genome with data, they will have to test it. To do so, they will pose "challenge problems" that use the data in the Battery Data Genome to validate the best AI algorithms for answering real-world questions. "We might want to find out what happens to a particular kind of battery run for a certain number of cycles at a certain temperature, but to do that predictively using AI," Babinec said. "Having a strong repository of standardized data is the first step."
A standardized, extensive and easily accessible data set may spur new questions for the battery community, Crabtree said. "There are a lot of 'unknown unknowns' in batteries," he said. "With access to data that all conform to a universal set of standards, guided by machine learning and artificial intelligence, we may find new pathways for innovation that to date we have not yet considered."
Argonne already provides open software for cleaning up existing data files. The "battery-data-toolkit" is available at https://github.com/materials-data-facility/battery-data-toolkit. A release of complete files for establishing the cycle life of 300 lithium-ion batteries with six different cathode chemistries will be coming later in October.
A paper announcing the Battery Data Genome will appear in the October 19 issue of Joule.
The Joint Center for Energy Storage Research (JCESR), a DOE Energy Innovation Hub, is a major partnership that integrates researchers from many disciplines to overcome critical scientific and technical barriers and create new breakthrough energy storage technology. Led by the U.S. Department of Energy's Argonne National Laboratory, partners include national leaders in science and engineering from academia, the private sector, and national laboratories. Their combined expertise spans the full range of the technology-development pipeline from basic research to prototype development to product engineering to market delivery.
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.