New Cryptography Framework For Secure Genomic Studies

Developed from EPFL research, in collaboration with MIT and Yale, the combination of secure computation and distributed algorithms opens a new era for data collaborations in medical research.

Advances in generative artificial intelligence and machine learning, trained on large-scale datasets across multiple institutions, have the potential to revolutionize medicine. However, data is hard to gather. It is siloed in individual hospitals, medical practices, and clinics around the world. Privacy risks stemming from disclosing medical data are also a serious concern, so existing data-sharing regulations have largely limited the scope of data collaborations for medical research.

Cryptographic tools for secure computation do exist, but they are either impractical or don't implement current state-of-the-art methods. Now, an approach developed by EPFL has been demonstrated successfully at scale and is being rolled out across Europe.

Secure federated genome-wide association studies, or SF-GWAS, is a combination of secure computation frameworks and distributed algorithms that enables efficient and accurate studies on private data held by multiple entities while ensuring data confidentiality. A study on five datasets, including a UK Biobank cohort of 410,000 individuals, showed an order-of-magnitude improvement in runtime compared with previous methods.

"In many cases it's not possible to centralize data for practical or legal reasons or just because people aren't willing to share it. So, the goal is to extract information without sharing the data," said Jean-Pierre Hubaux, the Academic Director at EPFL's Center for Digital Trust (C4DT), affiliated with the School of Computer and Communication Sciences.

"We developed a prototype several years ago, but what was missing was the demonstration that it works at scale with real-world-size datasets. This has now been done in collaboration with MIT and Yale, with our latest research showing that it is possible to extract information from datasets that remain geographically distributed, with no significant precision loss in terms of results. This opens a new era in terms of data collaborations," he continued.

SF-GWAS combines two key concepts. First, it takes a federated approach to secure computation, meaning that each dataset is kept at the respective source site. This minimizes computational costs by avoiding large data transfers between sites and allows the use of efficient cryptographic operations that protect the partial computational output generated at each site.

Second, it introduces an efficient algorithmic design to support the federated execution of various end-to-end GWAS pipelines.
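To make the federated idea concrete, here is a minimal Python sketch of how several sites could learn a combined statistic without any site revealing its own number. It is an illustration only, not the SF-GWAS protocol: it uses simple additive secret sharing as a stand-in for the cryptographic operations mentioned above, and the function names, the modulus, and the per-site counts are all hypothetical.

```python
import secrets

# Assumed modulus for this toy scheme; any value larger than the true total works.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties random shares that sum to it modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(local_values, modulus=MODULUS):
    """Simulate sites jointly computing a total without exposing local values."""
    n = len(local_values)
    # Each site splits its private value into one share per participating site.
    all_shares = [share(v, n, modulus) for v in local_values]
    # Site j adds up the j-th share it received from every site.
    # Each individual share is uniformly random and reveals nothing on its own.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Only the partial sums are combined; together they yield the global total.
    return sum(partial_sums) % modulus


# Hypothetical per-hospital counts of a genetic variant (illustrative numbers).
site_counts = [120, 75, 310]
total = secure_sum(site_counts)
print(total)
```

In this sketch each hospital's count stays on site, and only random-looking shares and partial sums ever leave it. A real deployment would also need secure channels, protection against colluding sites, and far richer computations than a single sum.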

"It sounds counterintuitive, but our approach shares data without sharing," explained Hubaux. "It leverages on the existence of the datasets without having to transfer it and is essentially an additional value to the data, an additional motivation to work together without losing control."

SF-GWAS has already been installed in Switzerland's five university hospitals. It is currently being rolled out in several Italian hospitals and for European cancer networks by Tune Insight, the EPFL spin-off leading this work. The company is also in talks with medical institutions in other countries.

In addition to unlocking medical research at scale to define and optimize public healthcare policy, which is just not possible in a world of silos, Hubaux believes that SF-GWAS will have a valuable side benefit. Currently, datasets are de facto distributed worldwide, sitting on hard discs and tapes here and there, because it has traditionally been so complicated to transfer data. The recording of medical data is also applied differently in different places. Hubaux calls this "prehistoric" and says that as a result, datasets are very underutilized.

"We are setting up a value system to make sure that future data is going to be interoperable, that it is recorded in the same way place to place, otherwise it will be junk in, junk out. It's costly and the transition will take time but we have developed the tools to facilitate it and there is an evolution underway," Hubaux said.

"The willingness to work at scale is a change of culture and hopefully this is a virtuous circle: people feel encouraged to be more rigorous in terms of the way they store and structure their data in order to guarantee interoperability because if they don't, their institution may be excluded from the rest of the community. This is really a side benefit - better overall quality of health and medical data."

Link to the full paper: https://rdcu.be/ea16o
