The National Institute of Standards and Technology (NIST) on October 22 published a final report detailing two decades of research and best practices on removing identifying information from data before it is shared. The report, first published in draft form in April 2015, is designed to help organizations comply with laws and regulations that require de-identification as a way to reduce the privacy risk associated with the creation, use, archiving, and sharing of data containing personal information.
“De-identification is not a single technique, but a collection of approaches, algorithms, and tools that can be applied to different kinds of data with differing levels of effectiveness,” the report states. “De-identification is especially important for government agencies, businesses, and other organizations that seek to make data available to outsiders.”
The report is intended to be a particularly important resource for federal agencies struggling to protect sensitive information while under pressure to make raw data open to the public. According to the report, many kinds of information can be de-identified, including structured information, free-format text, multimedia, and medical imagery.
Because de-identification techniques aim to remove identifying information from a dataset while retaining some utility in the remaining data, the report warns that some information might still be re-identified and privacy protection lost. Risks can also remain even when no one is re-identified: de-identified data may allow inferences about individuals in the data, and it may affect groups represented in the data.
“Organizations endeavoring to share such data might consider employing a combination of several approaches to mitigate re-identification risk,” the report recommends. “These include technical controls, such as removing quasi-identifiers and other kinds of information that might be used to re-identify the data subjects; continuously surveying for data that could be linked to the de-identified information that they are sharing; controls on the de-identified data, such as data use agreements and click-through agreements that prohibit re-identification, linking to other data, or sharing with others; and technical controls that limit the activities of data recipients.”
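To make the first of those controls concrete, the short Python sketch below shows one way quasi-identifiers might be handled in a small structured dataset. It is an illustration only, not a method taken from the report: the records, the field names, the generalization rules (truncating ZIP codes, bucketing birth years into decades), and the helpers `generalize` and `smallest_group` are all hypothetical. The group-size check is the idea commonly called k-anonymity, which this article does not attribute to the report.

```python
from collections import Counter

# Hypothetical records; every field name and value is invented for illustration.
RECORDS = [
    {"name": "A. Jones", "zip": "20899", "birth_year": 1961, "sex": "F", "diagnosis": "asthma"},
    {"name": "B. Smith", "zip": "20895", "birth_year": 1964, "sex": "F", "diagnosis": "diabetes"},
    {"name": "C. Lee", "zip": "20877", "birth_year": 1983, "sex": "M", "diagnosis": "asthma"},
    {"name": "D. Patel", "zip": "20878", "birth_year": 1987, "sex": "M", "diagnosis": "flu"},
    {"name": "E. Wong", "zip": "20871", "birth_year": 1985, "sex": "M", "diagnosis": "flu"},
]

# Assumed classification of fields; a real dataset needs its own analysis.
DIRECT_IDENTIFIERS = {"name"}
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def generalize(record):
    """Drop direct identifiers and coarsen two of the quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["zip"] = record["zip"][:3] + "**"        # keep only the 3-digit ZIP prefix
    decade = record["birth_year"] // 10 * 10     # e.g. 1964 -> 1960
    out["birth_year"] = f"{decade}s"
    return out


def smallest_group(records, keys=QUASI_IDENTIFIERS):
    """Size of the smallest set of records sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


deidentified = [generalize(r) for r in RECORDS]

print(smallest_group(RECORDS))       # 1: every original record is unique
print(smallest_group(deidentified))  # 2: each record now shares its values with at least one other
```

A larger smallest group makes linking a record to outside data harder, but it does not address the inference and group-level risks the report describes, which is one reason the report pairs technical measures with data use agreements and limits on what recipients can do.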