Homomorphic Encryption Signals the Future for Socially Valuable Research on Private Data
Encryption has become a cornerstone of the technologies that support communication, commerce, banking, and myriad other essential activities in today’s digital world. In an announcement this week, Google revealed a new marketing measurement tool that relies on a particular type of advanced encryption to allow advertisers to understand whether their online ads have resulted in in-store purchases. The announcement created controversy because of the types of data and analysis involved, with the theme of media coverage being that “Google knows your credit card purchases.” Although the details were lost in much of the coverage, we were far more intrigued by the apparent advances in homomorphic encryption that Google seems to have achieved in order to apply this double-blind method at scale.
The Importance of Encryption
If well encrypted, even data that is made public, whether intentionally or through a data breach, remains totally unintelligible to those who may try to access it. However, data that is well encrypted also loses its utility, as it can no longer be analyzed or used in ways that we might want and need. For example, if cloud-based data is fully encrypted and only the owner holds the key, the data is unreadable by the cloud provider, safe from attackers, and unavailable to law enforcement authorities that might approach the cloud provider. If the key is lost, the data may be lost forever. This level of protection is often sought out for purposes of data security and privacy, but it also means that the cloud provider cannot perform useful computing on the data that we may want performed, such as providing a “search” function. If the data cannot be read, it cannot be searched or analyzed. Similarly, research cannot be conducted on encrypted data.
But what if there were methods of encryption that converted data into ciphertext, protecting the privacy of individuals, while still enabling research to be conducted on the data? The most advanced technique being developed to perform some basic functions on encrypted data (adding, matching, sorting) is known as homomorphic encryption. This method, recently reviewed in Forbes, has made great strides in recent years, but it has required substantial computing resources and has thus been quite limited in use. Some recent successes have started to make these processes more efficient, exciting researchers, many of whom consider fully homomorphic encryption a “holy grail.”
In the words of security expert Bruce Schneier when IBM researcher Craig Gentry first discovered a fully homomorphic cryptosystem: “Visions of a fully homomorphic cryptosystem have been dancing in cryptographers’ heads for thirty years. I never expected to see one. It will be years before a sufficient number of cryptographers examine the algorithm that we can have any confidence that the scheme is secure, but — practicality be damned — this is an amazing piece of work.”
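To make “adding” on encrypted data concrete, here is a minimal Python sketch of the Paillier cryptosystem, a well-known scheme that is only additively (partially) homomorphic. It is not Gentry's fully homomorphic scheme and not the method Google uses. The tiny primes and the function names (keygen, encrypt, add_encrypted, decrypt) are illustrative assumptions, and nothing here is secure enough for real data.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Build a toy Paillier key pair. The small primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                 # standard simple choice of generator
    mu = pow(lam, -1, n)      # modular inverse of lam mod n (Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def decrypt(pub, priv, c):
    """Recover the plaintext using the private key."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


pub, priv = keygen()
c_a = encrypt(pub, 120)   # e.g. a value held by one organization
c_b = encrypt(pub, 45)    # e.g. a value held by another
c_sum = add_encrypted(pub, c_a, c_b)  # computed without seeing 120 or 45
print(decrypt(pub, priv, c_sum))      # prints 165
```

The party that calls add_encrypted needs only the public key and the ciphertexts, so it can total values it cannot read. Only the holder of the private key can decrypt the result.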
Comparing Datasets through Homomorphic Encryption Methods
One of the reasons researchers are enthused is that the datasets they wish to study very often belong to separate organizations, each of which has promised to protect the privacy and personal information of its data subjects. Fully homomorphic encryption provides the ability to generate aggregated reports comparing separate, fully encrypted datasets without revealing the underlying raw data. This could prove revolutionary for fields such as medicine, scientific research, and public policy. For example, datasets can be compared to analyze whether people provided with homeless services end up in housing or holding jobs; whether student aid helps students succeed; or whether certain kinds of support can prevent people from being re-admitted to hospitals. These uses all depend on comparing sensitive data held by different parties and subject to strict sharing protections. Homomorphic encryption would allow these datasets to remain encrypted, protecting personal information from scrutiny, while still being compared and analyzed to produce aggregate-level summary reports.
Similarly, homomorphic encryption can be used in the fields of advertising and marketing. Google announced that it was using a new double-blind encryption system to enable a de-identified analysis of encrypted data about who has clicked on an advertisement in combination with de-identified data held by companies that maintain credit card purchase records. Google can provide a report to an advertiser that summarizes the relationship between the two databases to conclude, for example, that “5% of the people who clicked on your ad ended up purchasing in your store.”
If the encryption is sound and the combined data is truly unintelligible to both Google and its partners, but the mathematics can still enable useful comparisons, this methodology could be an important advance for privacy-protective research and data sharing. It means that when different researchers hold datasets that include private or sensitive information, they could still draw valuable insights from them while respecting individual privacy.
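The announcement does not spell out Google's protocol, so the following is only a sketch of one well-known way two parties can count how many records they have in common without showing each other the records: commutative encryption, here done with modular exponentiation. This is a different technique from homomorphic encryption and is not claimed to be Google's method. The modulus P, the helper names, and the example email addresses are all hypothetical, and a real deployment would use a properly chosen group and run the two sides on separate machines.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized modulus for illustration only


def hash_to_group(identifier):
    """Map an identifier (e.g. an email) to a number in 1..P-1."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def make_key():
    """Pick a secret exponent coprime to P-1 so blinding is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, key):
    """Raise each value to the secret key; (x^a)^b == (x^b)^a mod P."""
    return [pow(v, key, P) for v in values]


def overlap_count(clickers, buyers):
    """Simulate both parties in one process and count shared identifiers."""
    a = make_key()  # secret of the ad platform (hypothetical party A)
    b = make_key()  # secret of the purchase-data holder (hypothetical party B)

    # Each side blinds its own list with its own key before sharing it.
    clickers_a = blind(map(hash_to_group, clickers), a)
    buyers_b = blind(map(hash_to_group, buyers), b)

    # Each side then applies its key to the other's already-blinded list.
    clickers_ab = set(blind(clickers_a, b))
    buyers_ab = set(blind(buyers_b, a))

    # Doubly blinded values match exactly when the identifiers match.
    return len(clickers_ab & buyers_ab)


clickers = ["ann@example.com", "bo@example.com", "cy@example.com", "di@example.com"]
buyers = ["bo@example.com", "di@example.com", "ed@example.com"]
matches = overlap_count(clickers, buyers)
print(f"{matches / len(clickers):.0%} of people who clicked made a purchase")
```

Neither side ever receives the other's identifiers in a readable form. Each sees only values blinded with a key it does not hold, and only the aggregate count is reported.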
Google seems to have put years of top-level research into advancing these privacy-protective encryption methods. We expect that after the initial controversy over its application to analyze the effectiveness of advertising, researchers will take a hard look at how these methods can be used for a wide range of socially valuable research. And with respect to advertising itself, it’s certainly good to see truly sophisticated privacy-enhancing technologies being used when sensitive data is being analyzed. Although we appreciate the importance of providing users with choices and notices, at the end of the day there is nothing better than scientific, technically advanced protections to ensure personal information is protected.