How can we glean useful insights from databases containing confidential information while protecting the privacy of the individuals whose data is contained within? Differential privacy, a way of defining privacy in a mathematically rigorous manner, can help strike this balance. Newly updated guidelines from the National Institute of Standards and Technology (NIST) are intended to assist organizations with making the most of differential privacy's capabilities.
Differential privacy, or DP, is a privacy-enhancing technology used in data analytics. In recent years, it has been successfully deployed by large technology corporations and the U.S. Census Bureau. While it is a relatively mature technology, a lack of standards can create challenges for its effective use and adoption. For example, a DP software vendor may offer guarantees that if its software is used, it will be impossible to re-identify an individual whose data appears in the database. NIST's new guidelines aim to help organizations understand and think more consistently about such claims.
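For readers who want the formal version (this definition is standard in the differential privacy research literature, not a quotation from the guidelines), a randomized algorithm M is ε-differentially private if, for any two databases D and D′ that differ in a single person's record and any set of possible outputs S:

    Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]

The guarantee bounds how much any one person's data can influence what an observer sees. Notably, it does not, strictly speaking, make re-identification "impossible," which is exactly the kind of claim the guidelines help readers evaluate.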
The newly finalized publication, Guidelines for Evaluating Differential Privacy Guarantees (NIST Special Publication 800-226), was originally released in draft form in December 2023. Based in part on comments received, the authors updated the guidelines with the goal of making them clearer and easier to use.
"The changes we made improve the precision in the draft's language to make the guidelines less ambiguous," said Gary Howarth, a NIST scientist and an author of the publication. "The guidelines can help leaders more clearly understand the trade-offs inherent in DP and can help understand what DP claims mean."
Differential privacy works by adding random "noise" to the data in a way that obscures the identity of the individuals but keeps the database useful overall as a source of statistical information. However, noise applied in the wrong way can jeopardize privacy or render the data less useful.
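As a minimal sketch of the idea (this example is illustrative and is not taken from the NIST publication; the query and the ε value are hypothetical), here is how the widely used Laplace mechanism adds calibrated noise to a simple counting query in Python:

```python
import numpy as np

def laplace_count(true_count: int, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    sensitivity = 1.0  # one person can change a count by at most 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical query: how many records in the database match some condition?
print(f"Noisy count: {laplace_count(true_count=1042, epsilon=1.0):.1f}")
```

Because a single person can change a count by at most 1, scaling the noise to 1/ε is enough to satisfy ε-differential privacy for this particular query.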
To help users avoid these pitfalls, the document includes interactive tools, flow charts, and even sample computer code that can aid in decision-making and show how varying noise levels can affect privacy and data usability.
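To give a flavor of what such tools demonstrate (the sketch below is a hypothetical illustration, not the publication's sample code), sweeping the privacy parameter ε for the counting query above shows the trade-off directly: stronger privacy (smaller ε) means noisier answers.

```python
import numpy as np

# Smaller epsilon -> stronger privacy -> larger noise scale
# (1/epsilon for a sensitivity-1 count). Estimate typical error at each level.
for epsilon in (0.1, 0.5, 1.0, 5.0):
    errors = np.abs(np.random.laplace(scale=1.0 / epsilon, size=100_000))
    print(f"epsilon={epsilon:4}: typical absolute error ~ {errors.mean():.1f}")
```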
"Small groups in the data of any sort tend to stand out more, so you may need to add more noise to protect their privacy," Howarth said.
While the document is not intended to be a complete primer on differential privacy, Howarth said that it provides a robust reading list of other publications that can help practitioners get up to speed on the topic. The guidelines also cover the sorts of problems the technology is suited to and how to implement it in those situations.
"With DP there are many gray areas," he said. "There is no simple answer for how to balance privacy with usefulness. You must answer that every time you apply DP to data. This publication can help you navigate that space."