Researchers today (Nov. 8, 2024) are releasing the flagship dataset from an ambitious study of biomarkers and environmental factors that might influence the development of type 2 diabetes. Because the study participants include people with no diabetes and others with various stages of the condition, the early findings hint at a tapestry of information distinct from previous research.
For instance, data from a customized environmental sensor in participants' homes show a clear association between disease state and exposure to tiny particulates of pollution. The collected data also include survey responses, depression scales, eye-imaging scans and traditional measures of glucose and other biologic variables.
All of these data are intended to be mined by artificial intelligence for novel insights about risks, preventive measures, and pathways between disease and health.
"We see data supporting heterogeneity among type 2 diabetes patients: that people aren't all dealing with the same thing. And because we're getting such large, granular datasets, researchers will be able to explore this deeply," said Dr. Cecilia Lee, a professor of ophthalmology at the University of Washington School of Medicine.
She expressed excitement at the quality of the collected data, which represent 1,067 people, roughly a quarter of the study's total expected enrollees.
Lee is program director of AI-READI (Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights). The National Institutes of Health-supported initiative aims to collect and share AI-ready data for global scientists to analyze for new clues about health and disease.
The initial data release is highlighted in a paper published Nov. 8 in the journal Nature Metabolism. The authors restated their aim to gather health information from a more racially and ethnically diverse population than has been measured previously, and to make the resulting data ready, technically and ethically, for AI mining.
"This process of discovery has been invigorating," said Dr. Aaron Lee, also a UW Medicine professor of ophthalmology and the project's principal investigator. "We're a consortium of seven institutions and multidisciplinary teams that had not worked together before. But we have shared goals of drawing on unbiased data and protecting the security of that data as we make it accessible to colleagues everywhere."
At study sites in Seattle, San Diego, and Birmingham, Alabama, recruiters are collectively enrolling 4,000 participants, with inclusion criteria that promote balance across:
- race/ethnicity (1,000 each – white, Black, Hispanic and Asian)
- disease severity (1,000 each – no diabetes, prediabetes, medication/non-insulin-controlled and insulin-controlled type 2 diabetes)
- sex (equal male/female split)
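The enrollment targets above reduce to simple arithmetic, sketched here in Python. The variable and function names are illustrative only and do not come from the study. The per-cell figure assumes the three criteria are fully crossed, which the release does not state.

```python
# Enrollment targets as stated in the text.
TOTAL_TARGET = 4000
RACE_ETHNICITY = ["white", "Black", "Hispanic", "Asian"]
DISEASE_SEVERITY = [
    "no diabetes",
    "prediabetes",
    "medication/non-insulin-controlled",
    "insulin-controlled",
]
SEX = ["male", "female"]


def per_group_target(total, groups):
    """Evenly split the total target across the groups of one criterion."""
    return total // len(groups)


def released_share(released, total):
    """Fraction of the enrollment target covered by a data release."""
    return released / total


race_target = per_group_target(TOTAL_TARGET, RACE_ETHNICITY)        # 1,000 each
severity_target = per_group_target(TOTAL_TARGET, DISEASE_SEVERITY)  # 1,000 each
sex_target = per_group_target(TOTAL_TARGET, SEX)                    # 2,000 each

# ASSUMPTION (not stated in the text): the criteria are fully crossed,
# giving 4 * 4 * 2 = 32 equally sized cells.
cells = len(RACE_ETHNICITY) * len(DISEASE_SEVERITY) * len(SEX)
per_cell_target = TOTAL_TARGET // cells

# Share of the 4,000-person target represented by the 1,067-person release.
flagship_share = released_share(1067, TOTAL_TARGET)
print(f"Flagship release covers {flagship_share:.1%} of the target")
```

Under that fully crossed assumption, each combination of race/ethnicity, disease severity and sex would have a target of 125 participants.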
"Conventionally, scientists are examining pathogenesis (how people become diseased) and risk factors," Aaron Lee said. "We want our datasets to also be studied for salutogenesis, or factors that contribute to health. So if your diabetes gets better, what factors might be contributing to that? We expect that the flagship dataset will lead to novel discoveries about type 2 diabetes in both of these ways."
By collecting more deeply characterizing data from a large number of people, he added, the researchers hope to create pseudo health histories showing how a person might progress from disease to full health and from full health to disease.
Hosted on a custom online platform, the data are produced in two sets: a controlled-access set requiring a usage agreement, and a registered, publicly available version stripped of HIPAA-protected information.
The pilot data release (summer 2024), involving 204 participants, has been downloaded by more than 110 research organizations worldwide. Researchers must verify their identity and agree to ethical-usage terms. (Learn more about accessing the data at aireadi.org.)
The AI-READI Consortium comprises the University of Washington School of Medicine, University of Alabama at Birmingham, University of California San Diego, California Medical Innovations Institute, Johns Hopkins University, Native Biodata Consortium, Stanford University and Oregon Health & Science University.
The project is based at the Angie Karalis Johnson Retina Center at UW Medicine in Seattle. Cecilia Lee holds the Klorfine Family Endowed Chair. Aaron Lee holds the Dan and Irene Hunter Endowed Professorship.
This work was supported by the NIH (grants OT2OD032644 and P30 DK035816). The authors' conflict-of-interest statements are in the published paper, which will be provided upon request.