One key method for ensuring privacy while processing large amounts of data is de-identification.
De-identified data refers to data through which a link to a particular individual cannot be established. This often involves “scrubbing” the identifiable elements of personal data, making it “safe” in privacy terms while attempting to retain its commercial and scientific value.
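As a rough illustration of what "scrubbing" can look like in practice, the sketch below drops direct identifiers from a record and coarsens two quasi-identifiers. The field names (`name`, `email`, `phone`, `zip_code`, `birth_date`) and the generalization rules are assumptions made for this example, not part of any FPF framework or legal standard. Removing fields this way does not by itself guarantee that a record cannot be re-identified.

```python
# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def scrub_record(record: dict) -> dict:
    """Return a copy of `record` with direct identifiers removed
    and selected quasi-identifiers made less precise."""
    # Drop fields that point directly at a person.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Coarsen the ZIP code to its first three digits (assumed rule).
    if "zip_code" in cleaned:
        cleaned["zip_code"] = str(cleaned["zip_code"])[:3] + "**"

    # Replace a full ISO birth date (YYYY-MM-DD) with the year only (assumed rule).
    if "birth_date" in cleaned:
        cleaned["birth_year"] = str(cleaned.pop("birth_date"))[:4]

    return cleaned


if __name__ == "__main__":
    sample = {
        "name": "Jane Doe",
        "email": "jane@example.com",
        "zip_code": "94110",
        "birth_date": "1980-05-17",
        "diagnosis": "asthma",
    }
    print(scrub_record(sample))
```

The fields that carry the analytic value (here, `diagnosis`) are left untouched, which reflects the trade-off between privacy protection and data utility.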
Future of Privacy Forum “De-ID Project”
In the era of big data, the debate over the definition of personal information, de-identification, and re-identification has never been more important. Privacy regimes often rely on data being considered personal in order to require the application of privacy rights and protections. Data that is anonymous is considered free of privacy risk and available for public use.
Yet much of the data that is collected and used exists somewhere on a spectrum between these two extremes. FPF’s De-ID Project has proposed a practical framework for applying privacy restrictions to data based on the nature of the data collected, the risks of re-identification, and the additional legal and administrative protections that may be applied. Important questions FPF has considered include:
- What weight should be given to non-technical factors, such as legal commitments not to make data public or not to attempt to re-identify data?
- What weight should be given to the impact of de-identification techniques on the utility of data?
- What status should be given to linkable or pseudonymous data?
FPF has now proposed a detailed breakout of the categories under a de-identification framework. See the full graphic here.
FPF’s framework, described in Shades of Gray: Seeing the Full Spectrum of Practical Data De-Identification, was published in the Santa Clara Law Review.
In legal terms, the criteria for de-identified data remain vague. Under the Health Insurance Portability and Accountability Act (HIPAA), health information is de-identified if it is information “that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual.” In its recent report, the FTC gave recommendations to help assess whether data should be considered identifiable. However, best practices have not been established, and industry practices vary widely.
FPF held a conference on December 5, 2011, to begin addressing this issue. Our goal is to facilitate the development of safe de-identification practices for data sets that extend beyond the health-care sector.
In November 2016, the Brussels Privacy Hub of the Vrije Universiteit Brussel and FPF hosted an all-day workshop, Identifiability: Policy and Practical Solutions for Anonymization and Pseudonymization, to address the technical questions underlying the de-identification debate and to build consensus on how best to advance the discussion about the benefits and limits of de-identification.