UK Biobank is the world's most comprehensive source of biomedical data available for health research in the public interest. In 2023 it released data on nearly 3,000 circulating proteins from 54,000 participants. In 2024, Professor Claudia Langenberg, Director of the Precision Healthcare University Research Institute (PHURI) at Queen Mary, and her colleagues published a landmark study using UK Biobank proteomic data to identify disease risk. This research was one of a small number of studies to use this unique data.
Building on this pilot data release, UK Biobank today announced a project to measure up to 5,400 proteins in each of 600,000 samples. These include samples from half a million UK Biobank participants and 100,000 second samples taken from these volunteers up to 15 years later. The new dataset is ten times larger than that used in the pilot. It is being funded by a consortium of 14 leading biopharmaceutical companies known as the UK Biobank Pharma Proteomics Project.
This new project will allow researchers to explore a first-of-its-kind database, detailing how changes to an individual's protein levels over mid-to-late life influence disease. The study will begin by analysing the first 300,000 samples, which will include initial samples from 250,000 UK Biobank volunteers and 50,000 second samples taken at follow-up assessments.
UK Biobank's proteomics dataset will allow researchers to:
- Examine proteomic and genetic data from half a million people simultaneously. UK Biobank released whole genome sequencing data for its half a million participants in November 2023. Adding proteomic data will allow researchers to combine these massive datasets, providing a more detailed picture of the biological processes involved in disease progression. This may in turn drive the development of personalised treatments.
- Examine how and why protein levels change over time. Half a million participants provided UK Biobank with a blood sample when they joined, and 100,000 of them provided a second sample up to 15 years later. Researchers will be able to see how protein levels have changed over mid-to-late life, enhancing understanding of age-related changes in healthy individuals and shedding light on how diseases develop. This will further accelerate research into diagnostic and prognostic markers.
- Uniquely use proteomic data in combination with imaging data. Nearly 100,000 UK Biobank participants have undergone magnetic resonance imaging (MRI) of their brain, heart and body, providing researchers with detailed scans. Layering these different data types could give an unprecedented, detailed understanding of disease mechanisms.
- Open avenues for developing AI models. Machine learning tools can already predict future disease many years before diagnosis, with the potential to shape early interventions. The depth and breadth of the proteomic data held within UK Biobank may enable machine learning to accurately subtype diseases, which could help determine which treatments should be given at the point of diagnosis.
Professor Langenberg said: "Adding proteomic data for the full UK Biobank cohort will be an absolute game-changer for prediction of disease onset and prognosis, particularly for the many neglected diseases for which good prospective data are lacking. These include debilitating and life-threatening diseases, such as polycystic ovary syndrome and motor neurone disease. Just imagine if we could detect these and many other conditions much earlier than is currently possible."
Professor Sir Rory Collins, Principal Investigator and Chief Executive of UK Biobank, said: "For the first time at this scale, researchers will be able to detect the exact causes of diseases by comparing how protein levels change over mid-to-late life in a large group of people. Proteomic data has already paved the way for better cancer, autoimmune and dementia diagnostics, and this truly exciting study of proteins will significantly speed up drug discovery, leading to major improvements in public health and care everywhere."
It will take about a year to measure the protein levels in the first 300,000 samples. The proteomic data will be made available to UK Biobank-approved researchers in staggered releases from 2026, with the full dataset expected to be added to the UK Biobank Research Analysis Platform by 2027. During this time, additional funding will be sought to analyse samples from all remaining UK Biobank volunteers: initial samples from an additional 250,000 participants, plus a further 50,000 second samples.