Scientists have developed a new mathematical model to help people understand the risks posed by AI and assist regulators in protecting privacy.
AI tools are increasingly being used to track and monitor people both online and in person, posing challenges for anonymity and privacy. For example, AI tools are being trialled to automatically identify individuals from their voices in online banking, their eyes in humanitarian aid delivery, or their faces in law enforcement.
Some AI identification techniques perform highly accurately when tested in small case studies but then misidentify people in real-world conditions.
Computer scientists at Imperial College London, the Oxford Internet Institute and UCLouvain have now created a method that provides a robust scientific framework for evaluating identification techniques, especially when dealing with large-scale data.
According to the researchers, this could help organisations to strike a better balance between the benefits of AI technologies and the need to protect people's personal information, making daily interactions with technology safer and more secure. Their testing method allows for the identification of potential weaknesses and areas for improvement in AI tools before they are implemented at scale.
The findings are published today in Nature Communications.
Associate Professor Yves-Alexandre de Montjoye, co-author of the study from the Data Science Institute at Imperial College London, said: "Our new scaling law provides, for the first time, a principled mathematical model to evaluate how identification techniques will perform at scale. Understanding the scalability of identification is essential to evaluate the risks posed by these re-identification techniques, including to ensure compliance with modern data protection legislations worldwide."
The method draws on Bayesian statistics to learn how identifiable individuals are at small scale, and extrapolates the accuracy of identification to larger populations up to 10x better than previous heuristics and rules of thumb. This gives the method unique power in assessing how different identification techniques will perform at scale, across applications and behavioural settings, and can help explain why some AI identification techniques perform highly accurately in small case studies but then misidentify people in real-world conditions.
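To illustrate the general idea, small-scale testing followed by model-based extrapolation to much larger populations, here is a rough Python sketch. It is not the authors' scaling law: the simulated matching task, the Beta model for the per-competitor false-match probability, and all function names are illustrative assumptions made for this example only.

```python
import numpy as np
from scipy.special import betaln
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data (assumption): each person has a true profile vector; a "trace"
# is a noisy copy, and identification picks the nearest enrolled profile.
def simulate_accuracy(n_people, n_trials=2000, dim=8, noise=1.2):
    """Empirical rank-1 identification accuracy for a gallery of n_people."""
    correct = 0
    for _ in range(n_trials):
        gallery = rng.normal(size=(n_people, dim))              # enrolled profiles
        target = rng.integers(n_people)
        trace = gallery[target] + noise * rng.normal(size=dim)  # noisy observation
        match = np.argmin(np.linalg.norm(gallery - trace, axis=1))
        correct += (match == target)
    return correct / n_trials

# Small-scale measurements (cheap to collect, like a pilot study).
small_sizes = [10, 20, 50, 100]
small_acc = [simulate_accuracy(n) for n in small_sizes]

# Extrapolation model (toy assumption, not the published method): each wrong
# profile independently "beats" the true one with probability eps drawn from
# a Beta(a, b) distribution, so accuracy(n) = E[(1 - eps)^(n-1)], which has
# the closed form B(a, b + n - 1) / B(a, b).
def model_acc(n, a, b):
    return np.exp(betaln(a, b + n - 1) - betaln(a, b))

def loss(params):
    a, b = np.exp(params)  # keep a, b positive
    pred = np.array([model_acc(n, a, b) for n in small_sizes])
    return np.sum((pred - np.array(small_acc)) ** 2)

a, b = np.exp(minimize(loss, x0=[0.0, 3.0]).x)

# Compare the cheap extrapolation with a direct (expensive) large-scale simulation.
for n in [1000, 5000]:
    print(n, round(model_acc(n, a, b), 3), round(simulate_accuracy(n, n_trials=300), 3))
```

In this toy setting, accuracy measured on galleries of 10 to 100 people is used to predict how the same matcher degrades at thousands of people, which is the kind of small-to-large gap the researchers' scaling law is designed to quantify rigorously.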
Lead author Dr Luc Rocher, Senior Research Fellow, Oxford Internet Institute, part of the University of Oxford, said: "We see our method as a new approach to help assess the risk of re-identification in data release, but also to evaluate modern identification techniques in critical, high-risk environments. In places like hospitals, humanitarian aid delivery, or border control, the stakes are incredibly high, and the need for accurate, reliable identification is paramount.
"We believe that this work forms a crucial step towards the development of principled methods to evaluate the risks posed by ever more advanced AI techniques and the nature of identifiability in human traces online. We expect that this work will be of great help to researchers, data protection officers, ethics committees, and other practitioners aiming to find a balance between sharing data for research and protecting the privacy of patients, participants, and citizens."
Based on a press release issued by the Oxford Internet Institute.