VEIL.AI is a system developed at the University of Helsinki to meet the needs of demanding projects that use health data. It enables individual-level, sensitive data to be used in, for example, machine learning applications, commercial use cases and research without compromising privacy. VEIL.AI processes data so that as much of the value of the collected data sets as possible is retained while ensuring that individuals can no longer be identified.
VEIL.AI solves several challenges related to the use of health data: it maximises data protection through anonymisation approaches or by creating synthetic data, it enables secure sharing and further processing of data, and it increases the quality of anonymised data. VEIL.AI's anonymisation and synthetic data production approaches support many demanding data types (imaging data, signal data, continuously updating real-time data and even genome data).
Maximising data protection and minimising information loss at the same time
The VEIL.AI developers set out to create a solution that would maximise both data protection and usability, while minimising the information loss as well as the time and computational capacity needed to process data. Traditional methods usually require data to be generalised to such an extent that the potential for the subsequent use of data decreases considerably. In addition, traditional methods are poorly suited to anonymising dynamic, continuously accumulating data types. An especially complicated challenge for the traditional methods is any use case where data comes from several sources, e.g. from several hospitals.
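The trade-off described above can be made concrete with a small Python sketch of a traditional, generalisation-based approach (k-anonymity style). This is not VEIL.AI's method, which the article does not describe in detail. The toy records, the function names and the choice of age and postcode as quasi-identifiers are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy records (age, postcode); not real data.
RECORDS = [
    (23, "00100"), (27, "00120"), (31, "00130"), (36, "00140"),
    (42, "00200"), (45, "00210"), (58, "00250"), (59, "00260"),
]

def generalise(record, age_bin, postcode_digits):
    """Coarsen one record: bucket the age and truncate the postcode."""
    age, postcode = record
    low = (age // age_bin) * age_bin
    return (f"{low}-{low + age_bin - 1}", postcode[:postcode_digits])

def smallest_group(records, age_bin, postcode_digits):
    """Return k, the size of the smallest group of identical generalised records."""
    groups = Counter(generalise(r, age_bin, postcode_digits) for r in records)
    return min(groups.values())

# Fine-grained data: every record is unique, so each person stands out (k = 1).
k_fine = smallest_group(RECORDS, age_bin=5, postcode_digits=5)

# Heavily generalised data: each record hides among others (k = 4),
# but ages are now 20-year bands and postcodes only 3 digits long.
k_coarse = smallest_group(RECORDS, age_bin=20, postcode_digits=3)

print(k_fine, k_coarse)
```

In this toy case, reaching groups of four requires collapsing ages into 20-year bands, which is exactly the kind of information loss that limits subsequent use of the data.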
VEIL.AI uses neural networks to speed up the computationally heavy processes required for de-identification. It is orders of magnitude faster than existing methods, preserves data quality better, and supports the anonymisation of data types that could not previously be anonymised.
VEIL.AI offers solutions to pharmaceutical companies and hospitals
A typical VEIL.AI client is a pharmaceutical company with data accessibility issues. Processing individual-level data can be challenging for several reasons, and it also means that the EU's General Data Protection Regulation (GDPR) applies. The VEIL.AI solution is based on the observation that, in many cases, the statistical characteristics of the original data can fulfil the needs of the company, and individual-level sensitive data is not actually needed. VEIL.AI provides solutions for retaining these important statistical characteristics of the data while removing the characteristics that might compromise data privacy. The two methods are data anonymisation and synthetic data production.
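The idea of keeping statistical characteristics while dropping individual records can be sketched in a few lines of Python. This is a deliberately naive illustration and not VEIL.AI's method: it fits a single Gaussian to one hypothetical measurement column and samples new values from it. The data values and function names are assumptions, and matching a mean and standard deviation does not by itself give any formal privacy guarantee.

```python
import random
import statistics

def fit_gaussian(values):
    """Summarise a column by its mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def sample_synthetic(mean, stdev, n, seed=0):
    """Draw n synthetic values from the fitted Gaussian (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(n)]

# Hypothetical individual-level measurements; not real patient data.
original = [118.0, 122.0, 125.0, 130.0, 131.0, 135.0, 140.0, 144.0, 150.0, 155.0]

mean, stdev = fit_gaussian(original)
synthetic = sample_synthetic(mean, stdev, n=1000)

# The synthetic column should have a similar mean and spread,
# even though none of its rows corresponds to a real person.
print(round(mean, 1), round(statistics.mean(synthetic), 1))
print(round(stdev, 1), round(statistics.stdev(synthetic), 1))
```

Real health data has many correlated columns and non-Gaussian distributions, so a practical generator needs a far richer model than this one.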
Many large companies collect a great deal of data but make only limited use of it. Analysing such extensive data sets with machine learning methods can provide companies with significant value.
In addition to pharmaceutical R&D, there are many other potential use cases for sensitive data, for example in management, in business intelligence, as open data or in so-called secondary use. Data privacy is the paramount requirement in all of these activities. VEIL.AI has been verified in several use cases of this nature.
Research projects must often use data from several sources. To do this, the relevant organisations must usually either share their data with all project partners or select a trusted third party to pool the data. With VEIL.AI, multi-partner projects become easier, as raw data no longer needs to be shared in order to be pooled. Instead, anonymisation can be done within each organisation so that only anonymised data is shared.
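The multi-partner workflow can be sketched as follows in Python. This is only an illustration of the data flow (anonymise inside each organisation, then pool) and does not represent VEIL.AI's actual anonymisation, which is far more sophisticated. The record fields, the ten-year age bands and the suppression threshold `k` are assumptions chosen for the example.

```python
from collections import Counter

def anonymise_locally(records, k=2):
    """Run inside one organisation: drop direct identifiers, coarsen the age,
    and suppress any (age band, diagnosis) group with fewer than k records."""
    coarse = [
        {"age_band": f"{(r['age'] // 10) * 10}s", "diagnosis": r["diagnosis"]}
        for r in records
    ]
    counts = Counter((c["age_band"], c["diagnosis"]) for c in coarse)
    return [c for c in coarse if counts[(c["age_band"], c["diagnosis"])] >= k]

# Hypothetical raw records held separately by two hospitals; not real data.
hospital_a = [
    {"patient_id": "A1", "age": 34, "diagnosis": "T2D"},
    {"patient_id": "A2", "age": 37, "diagnosis": "T2D"},
    {"patient_id": "A3", "age": 71, "diagnosis": "AF"},
]
hospital_b = [
    {"patient_id": "B1", "age": 52, "diagnosis": "T2D"},
    {"patient_id": "B2", "age": 58, "diagnosis": "T2D"},
]

# Only the locally anonymised output ever leaves each hospital.
pooled = anonymise_locally(hospital_a) + anonymise_locally(hospital_b)
print(pooled)
```

Because each site applies the threshold on its own data, the single 71-year-old patient at the first hospital is suppressed before pooling, and no patient identifier appears in the shared data set.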
The infographic above shows how personal data can be either pseudonymised or permanently anonymised with the anonymisation tools offered by VEIL.AI. Furthermore, the application can be used to produce synthetic data. These different data categories can be used in various applications, such as drug development, diagnostics and self-help, as well as in building predictive models.
VEIL.AI supports various data types
VEIL.AI has been developed at the University of Helsinki's Institute for Molecular Medicine Finland (FIMM) under the leadership of Janna Saarela and Timo Miettinen. The current team also includes three developers. Business development is the responsibility of serial tech entrepreneur Tuomo Pentikäinen.
The VEIL.AI developers have extensive experience in working with patient samples and biobank resources.
"Such research projects have required new tools because no suitable ones have been available. That's why organisations concentrating on medical research, such as FIMM, are several years ahead of the rest of the world in terms of data protection," says Tuomo Pentikäinen.
Despite the team's specialisation in medicine, VEIL.AI can process data in any field that uses individual-level data. Recent applications have included location information as well as picture and video data. One of the spearheads of VEIL.AI is the production of synthetic data, which is being developed with funding from the Novo Nordisk Foundation.
VEIL.AI is capable of producing synthetic data that behaves very similarly to the original data. In this graph real data (green) is compared to corresponding synthetic data generated by VEIL.AI (yellow).
The commercial potential of VEIL.AI was explored with New Business from Research Ideas funding from Business Finland during 2018-19. More recently, VEIL.AI has received funding from EIT Digital, where the team collaborates closely with SciLifeLab from Sweden and Philips from the Netherlands. A patent application for the core technology of VEIL.AI has been filed.
"FIMM and their VEIL.AI team are working on very novel concepts. We are in a joint research project, where we produce synthetic data and develop metrics and analysis for assessment of quality and usability of synthetic data. This is very important, as synthetic data is expected to help in some of the most burning problems prevalent in data-intensive HealthTech and drug development. The target of the research is to significantly reduce the lead time of R&D, to reduce or even remove the data breach risks and to improve the quality of data," says Professor Henning Langberg from the Copenhagen HealthTech cluster and the University of Copenhagen.
First published on 23.11.2018, updated on 25.10.2019 and 10.11.2020.