Chris Wolf Does a "Soap Box" Presentation on Big Data
Tomorrow, Berkeley will host a workshop on Big Data: Values and Governance, the third in the White House’s public events around big data and privacy. As part of this discussion, Chris Wolf gave a “soap box” presentation on big data today. He offered a few high-level recommendations, including the need to “fully take stock of the benefits in performing a cost-benefit analysis” around big data.
“It is hard to do a cost-benefit analysis if we are only talking about the costs. Researchers, academics, and industry are using big data to deliver big benefits. We need to understand and promote those benefits so that we can more reasonably evaluate whether and how to address the risks that may arise,” he said.
MAC Addresses and De-Identification
Location analytics companies log the hashed MAC address of mobile devices in range of their sensors at airports, malls, retail locations, stadiums and other venues. They do so primarily in order to create statistical reports that provide useful aggregated information such as average wait times on line, store “hot spots,” and the percentage of devices that never make it into a zone that includes a checkout register. FPF worked with the leading companies providing these services to create an enforceable Mobile Location Code of Conduct that restricts discriminatory uses of data, creates a central opt out, promotes in-store notice and other protections. We filed comments last week with the FTC describing the program in detail.
The only data transmitted by mobile devices that most location companies can log is the MAC address – the Wi-Fi or Bluetooth identifier devices broadcast when Wi-Fi or Bluetooth is turned on. The privacy debate around the use of this technology and the Code has centered on the sensitivity of logging and maintaining hashed MAC addresses, and hinges on whether a MAC address should be considered personal information.
Is a MAC address personal information? Well, it is linked to individual consumer devices, either as a consistent Wi-Fi or Bluetooth identifier. If enough data is linked to any consistent identifier over time, it is in the realm of technical possibility that the identity of a user can be ascertained. If there were a commercially available database of MAC addresses, it is possible that such a database could be used to identify users. We are not aware of any such MAC address look-up database. But we do recognize that the data collected is linked to a specific device. For this reason, the Code of Conduct treats hashed MAC addresses associated with unique devices as something in between fully anonymized data and explicitly personal data. This reflects the view that Professor Daniel Solove posited effectively when he argued that PII exists not as a binary, but on a spectrum, with no risk of identification at one end, and individual identification at the other. In many real-world instances of data collection, the privacy standards in place reflect where the data lies on this spectrum; they consist not only of technical measures to protect the data, but also internal security and administrative controls, as well as enforceable legal commitments. In the case of Mobile Location Analytics, many companies are confident that by hashing MAC addresses, keeping them under administrative and security controls, and publicly committing not to attempt to identify users, they have adequately de-identified the data they log.
However, it is important to understand that the Code does NOT take the position that hashing MAC addresses amounts to a de-identification process that fully resolves privacy concerns. According to the Code, data is only considered fully “de-identified” where it may not reasonably be used to infer information about or otherwise be linked to a particular consumer, computer, or other device. To qualify as de-identified under the Code, a company must take measures such as aggregating data, adding noise to data, or statistical sampling. These are considered to be reasonable measures that de-identify data under the Code, as long as an MLA company also publicly commits not to try to re-identify the data, and contractually prohibits downstream recipients from trying to re-identify it. To assure transparency, any company that does de-identify data in this way must describe how it does so in its privacy policy.
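As a rough illustration of the aggregation measure described above, the following Python sketch turns per-device sightings into per-zone counts and drops zones seen by too few devices. It is only a sketch under stated assumptions: the function name, the (device token, zone) input format, and the min_count threshold are hypothetical, and the Code does not prescribe any particular threshold or algorithm.

```python
from collections import defaultdict


def aggregate_zone_counts(observations, min_count=5):
    """Return the number of distinct devices seen per zone.

    observations: iterable of (device_token, zone) pairs (hypothetical format).
    min_count: assumed suppression threshold; zones with fewer distinct
               devices are dropped so that small groups are not reported.
    """
    devices_by_zone = defaultdict(set)
    for device_token, zone in observations:
        devices_by_zone[zone].add(device_token)

    # Only the aggregate count leaves this function; the device tokens do not.
    return {
        zone: len(devices)
        for zone, devices in devices_by_zone.items()
        if len(devices) >= min_count
    }
```

The output contains counts only, so a report built from it no longer carries any per-device identifier. Whether a given aggregation is enough to meet the Code's definition would still depend on the accompanying public and contractual commitments.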
As most of the companies involved in mobile location analytics do indeed link hashed MAC addresses to individual devices, the data they collect to track devices over time does not qualify as strictly “de-identified” under the Code and the data they collect is not exempt from the Code. Rather, the companies collect and use what the Code terms “de-personalized” data.* De-personalized data is defined in the Code as data that can be linked to a particular device, but cannot reasonably be linked to a particular consumer. Companies using de-personalized data must:
take measures to ensure that the data cannot reasonably be linked to an individual (for instance, hashing a MAC address or deleting personally identifiable fields);
publicly commit to maintain the data as de-personalized; and
contractually prohibit downstream recipients from attempting to use the data to identify a particular individual.
When companies hash MAC addresses, they are thus fully subject to the Code’s requirements, including signage, consumer choice, and non-discrimination.
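To make the hashing step concrete, here is a minimal Python sketch of turning a MAC address into a consistent device token. The Code only speaks of “hashing”; the keyed variant used here (HMAC-SHA-256 with a secret key held by the company) is an assumption chosen for illustration, as are the function name and the normalization rules.

```python
import hashlib
import hmac


def depersonalize_mac(mac: str, secret_key: bytes) -> str:
    """Map a MAC address to a stable token (hypothetical helper).

    The same device always yields the same token under the same key,
    which is why the result is linked to a device rather than anonymous.
    """
    # Normalize so "AA-BB-..." and "aa:bb:..." produce the same token.
    normalized = mac.strip().lower().replace("-", ":")
    digest = hmac.new(secret_key, normalized.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()
```

A keyed hash is used in this sketch because the space of possible MAC addresses is small enough that an unkeyed hash could be reversed by enumerating candidates. Either way, the token remains a consistent per-device identifier, which matches the Code's treatment of such data as de-personalized rather than de-identified.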
Different kinds of data on the PII/non-PII spectrum — given the inherent risks and benefits of each — merit a careful consideration of the combination of reasonable technical encryption and administrative measures and legal commitments that would be most suitable. After all, if “completely unidentifiable by any technical means, no matter how complex or unlikely” were the standard for the use of any data in the science and business worlds, much valuable research and commerce would come to an end. The MLA Code represents a pragmatic view that allows vendors to provide a service that is useful for businesses and consumers, while applying responsible privacy standards.
* Suggestions for a better term than de-personalized are welcomed. We considered “pseudonymized” but found the term awkward.
Jules' thoughts on Facebook's "Privacy Dinosaur"
Jules’ new article on LinkedIn discusses Facebook’s recent efforts to remind its users about adjusting the sharing settings on their posts. A pop-up notice with an illustrated dinosaur occasionally appears on a user’s home page when the user posts something publicly, and reminds the user about their option to keep the post limited to a smaller circle of friends if desired.
At the Washington Post Live All Things Connected Forum this week, FPF called for efforts to “poke and provoke” designers to get in the game on designing for usable privacy. The Facebook “Privacy Dinosaur” is just one example of how creative designers and technologists can advance transparency and privacy in a way everyday users can appreciate.
FPF Applauds Department of Commerce For Safe Harbor Website Revision
The Department of Commerce has long listed companies’ participation in the US-EU Safe Harbor program in the Safe Harbor List. Within that list, a significant number of companies are marked with the designation “not current.” As FPF wrote in its paper discussing the Safe Harbor, a company can be listed as “not current” for a number of reasons: they may have failed to fill out specific yearly paperwork, chosen to use other approved data transfer mechanisms, merged with another company, ceased data transfer with the EU, or shut down altogether. However, critics of the Safe Harbor say many companies are claiming to be members while in fact they are not adhering to the Safe Harbor agreement.
FPF noted that a company’s obligations under the Safe Harbor do not end even if the company is listed as non-current: rather, it remains responsible for adhering to the Safe Harbor Principles with respect to all the data it transferred while enjoying the benefits of Safe Harbor membership. When the European Commission recommended that “[t]he Department of Commerce should clearly indicate on its website all companies which are not current members,” FPF agreed and suggested that the Department of Commerce should also include on its website an explanation of why a company may be listed as “not current” in order to clear up any potential confusion.
FPF is pleased that the Department of Commerce’s Safe Harbor website was updated in late 2013 with a new notice that makes clear that companies may be listed as non-current for a number of reasons, but are nonetheless subject to FTC enforcement for claiming to be members without adhering to the Safe Harbor Principles. The new notice reads:
“Notice: An organization may be designated as “Not Current” for a variety of reasons. The most common reason is that the organization has failed to reaffirm its adherence to the Safe Harbor Privacy Principles on an annual basis as required by the Safe Harbor Frameworks. Another possible reason is that the organization has failed to comply with one or more of the Safe Harbor Privacy Principles. Organizations designated as “Not Current” are no longer assured of the benefits of the Safe Harbor (i.e., the presumption of “adequacy”). These organizations nevertheless must continue to apply the Safe Harbor Privacy Principles to the personal data received during the period in which they were assured of the benefits of the Safe Harbor for as long as they store, use or disclose those data. Any misrepresentation by an organization designated as “Not Current” concerning its adherence to the Safe Harbor Privacy Principles may be actionable by the Federal Trade Commission or other relevant government body.”
FPF applauds the Department of Commerce for these revisions. We will continue to monitor developments relating to the US-EU Safe Harbor Agreement as they arise.
FPF In the News: a Big Week for Panels and Privacy
The first week of March brought with it a number of great privacy-related events, some run by the IAPP and some hosted by others (including FPF itself!). Below are links to the many events FPF participated in.
In conjunction with Congresswoman Sheila Jackson Lee, FPF presented our fourth annual “Privacy Papers for Policy Makers” this Wednesday. Featured at the event was Professor Kenneth Bamberger from Berkeley, who discussed his paper with Deirdre Mulligan, Privacy in Europe: Initial Data on Governance Choices and Corporate Practice. Professor Neil Richards discussed his paper on why data privacy law is (mostly) constitutional, while Adam Thierer presented A Framework for Benefit-Cost Analysis in Digital Privacy Debates.
FPF Executive Director Jules Polonetsky moderated a panel for a lunch conversation between leading experts, who discussed future privacy principles and frameworks that focus on data use and associated risks. The panelists discussed the ways society can protect the privacy of individuals while providing for responsible, beneficial data use.
FPF Co-Chair Chris Wolf participated on a panel moderated by FPF Senior Fellow Omer Tene on the FTC’s developing “common law of privacy,” which serves as an invaluable reference and guidance tool for corporate data managers not only in the U.S. but also globally. The IAPP Westin Research Center has embarked on a project to collate, index, annotate and make available to policymakers and practitioners a “Comprehensive Casebook of FTC Privacy and Information Security Law.” In this session, Chris, Omer and others discussed the findings of the project and initial conclusions with senior FTC staff.
FPF Executive Director Jules Polonetsky, FPF Board Members Larry Magid and Andy Bloom, and Kathleen Styles, Chief Privacy Officer, U.S. Department of Education participated in a panel about the impact of new education technologies on student privacy. New technologies and data are being used for a variety of services in schools, from administrative uses such as managing class schedules, buses and registration, to educational tools such as remote learning and personalized curricula. Data is increasingly being shared with third parties, and apps and tablets are increasingly essential to the learning environment. The confluence of enhanced data collection with highly sensitive information about children and teens creates privacy risks: FPF has recently organized a working group of companies and other experts to work on crafting solutions to this hot issue. Contact FPF to get involved.
FPF Co-Chair Chris Wolf moderated a panel discussing government access to private-sector data: what actually is being exposed in the U.S. and in the EU; what checks and oversight exist in the various jurisdictions; what those holding the data and those whose data is held can do to address privacy and free expression concerns; and what impact the publicity over national security access is having on public policy and international relations.
FPF Policy Director Josh Harris moderated a panel on the many new developments in the world of connected cars. The panel explored the new technologies and their risks for the privacy of individuals, and demonstrated best practices and solutions for ensuring compliance and transparency within the connected automobile environment. Access Josh’s presentation here.
FPF Co-Chair Chris Wolf moderated a panel describing the struggles facing class action plaintiffs in the privacy field. The panel, which brought together some of the leading plaintiff and defendant attorneys in the country, discussed legal theories of harm and standing, proof of causation, commonality and the likelihood that a plaintiff’s injury will be redressed by a favorable decision.
FPF Executive Director Jules Polonetsky moderated a panel on California’s new “eraser button” law, which requires certain websites to allow minors to remove embarrassing postings. The panel covered similar legislative efforts in the U.S. and E.U., as well as the growing trend in consumer technology for “ephemeral” messaging services such as Frankly, SnapChat, and Whisper.
White House/MIT Big Data Privacy Workshop Recap
Speaking for everyone snowed-in in DC, White House Counselor John Podesta remarked that “big snow trumped big data,” while on the phone to open the first of the Obama Administration’s three big data and privacy workshops. This first workshop focused on advancing the “state of the art” in technology and practice. While these workshops are ultimately the product of Edward Snowden’s NSA leaks last year, Mr. Podesta explained that his big data review group was conducting a broad review on a “somewhat separate track” from an ongoing review of the intelligence community. His remarks focused on several specific examples of the social value of data, but he cautioned that “we need to be conscious of the implications for individuals. How should we think about individuals’ sense of their identity when data reveals things they didn’t even know about themselves?”
To that end, he noted that “we can’t wait to get privacy perfect to get going.” Because this workshop was designed to discuss technology around data, he hoped it would help inform the Administration about what it needs to take away regarding the state of data privacy right now.
Cynthia Dwork, from Microsoft Research, followed Mr. Podesta with a deep-dive into differential privacy. In English, as she put it, differential privacy works to ensure that the outcome of any analysis is essentially equally likely, independent of whether an individual joins or does not join a database. The goal is to limit the range of potential harms to any individual from participating in data analysis. The challenge posed by big data is that multiple uses of data create a cumulative harm to privacy, which is difficult to measure. Overly accurate estimates of too much information are “blatantly non-private,” Dwork argued.
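For readers who want to see the idea in code, the following Python sketch applies the Laplace mechanism, one standard way to achieve differential privacy, to a simple counting query. It is an illustration rather than anything presented at the workshop: the function names are hypothetical, and epsilon is the usual privacy parameter (smaller values mean more noise and stronger protection).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent Exponential(1) draws follows a
    Laplace(0, 1) distribution; multiplying by scale widens it.
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of records matching some condition.

    Adding or removing one person changes a count by at most 1, so
    noise with scale 1 / epsilon is enough for this kind of query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Usage: release a noisy version of a count of 100 with epsilon = 0.5.
noisy = private_count(100, epsilon=0.5, rng=random.Random())
```

Each released answer spends some of the privacy budget, and under basic composition the epsilon values of repeated queries add up. That is one way to make the cumulative harm from multiple uses of the same data measurable.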
While Dwork focused on new technologies to advance privacy, a slate of MIT professors presented brief examples of how big data is providing big social benefits in health care, transportation, and education:
John Guttag discussed the importance of large scale data for clinical studies. He pushed back against requiring very specific consents for patient data use, suggesting they would do a lot of harm. “We find a lot of data for one purpose that can be used for another. It’s important not to be too specific.” He suggested meaningful consent could be gained simply by educating patients about the value of their data. “I think we underestimate the members of our society,” he said. “I think most people fear death or the death of a loved one more than a loss of privacy.”
Manolis Kellis explored how large numbers of data sets are essential to advance discoveries in human disease genomics. He argued that much of our discussion is caught up by the mere illusion of privacy: “Every time you take your coat off, you’re providing your DNA to someone.” Thus, we need to implement restrictions that would mitigate negative uses, such as insurers using genomic data to discriminate against individuals.
Sam Madden connected the challenges posed by big data to the parallel phenomenon of the Internet of Things. He noted that societal apps and societal applications of data both have privacy concerns, and argued that very compelling societal goods come from “societal roll-ups of data.” For example, he discussed how risky driver behavior could be mitigated through surveillance: the riskiest category of male drivers will reduce bad driving habits by up to 72% if monitored. “We can argue that this is creepy, but it’s societally compelling,” he said. “We, as a society, have to decide what we’re comfortable with.”
Anant Agarwal, president of edX, the massive open online course (MOOC) platform created by Harvard and MIT, described big data as a “particle accelerator” for learning. Noting that edX has students in every country in the world, he said MOOCs can provide interesting insights into how students learn and how they interact with peers. He described data that showed how students over time began tackling homework prior to lectures, and suggested that data could eliminate subjective guesswork in education. The challenge is that a lot of the data benefits from education can only be derived through information sharing, yet adequately protecting individual student information can be challenging. Mr. Agarwal noted that his daughter used the same username on edX as she does on Facebook. “We can omit that information,” but students also use identifiers in forums and in other formats, he said. “We’d like to share the data we get with everyone,” he said, but he wondered how that could be done safely. “What is de-identification?” he asked.
When the floor was opened for questions, a skeptic in the crowd noted that one of the biggest drivers of data collection is not social benefits, but rather to make money. Mr. Agarwal suggested that the very reason edX was a non-profit was for its use of sensitive data “to be judged by different criteria than maximizing return on investment.”
Secretary of Commerce Penny Pritzker suggested that harnessing the potential of data would hinge upon user trust. Highlighting Commerce’s efforts to advance multistakeholder codes of conduct and ensure the efficacy of the U.S.-EU Safe Harbor, Ms. Pritzker suggested government needed to continually evaluate and work with companies to uncover the technologies and practices that promote trust. She expressed hope that efforts like the day’s workshop could help to show that confidence placed in American companies should remain “rock solid.”
The program’s afternoon shifted to a broad discussion of privacy enhancing technologies (PETs), specifically developments in differential privacy, encryption, and accountability systems. There was a recognition that with any computer system, compromises in security and privacy are inevitable: complex software will have bugs, many different people will need access to infrastructure, and interesting data processing will require the use of confidential information or PII.
Danny Weitzner lamented the lack of a better definition of privacy for computer designers and engineers to build toward. Alan Westin’s original definition, that privacy is a claim by individuals, groups, or institutions to determine for themselves when, how, and to what extent their information can be communicated to others, has “led us astray,” Prof. Weitzner argued. He noted that throughout the day multiple substantive definitions of privacy had come up in discussion, and he argued that we “need a way to know what’s going on” in order to “allow data for some purposes, but won’t be misused for others.”
Quoting Judge Reggie B. Walton about the challenges facing the FISA Court, Weitzner noted that “we don’t currently have the tools in everyday systems to assess how information is used.” Weitzner discussed his work on information accountability.
Weitzner then led a large hypothetical discussion in which MIT in the near future “embrace[s] the potential of data-powered, analytics-driven systems in all aspects of campus life, from education to health care to community sustainability.” Weitzner asked a slate of panelists what they would do as the future chief privacy officer of MIT, and Intel’s David Hoffman suggested that we all need to understand “that a lot of the data about us that’s now out there is not coming from us.” As a result, meaningful transparency must mean more than notice to individuals. Panelists then covered a wide gamut of issues, from the ethical challenges around predictive analysis to the need to get serious about addressing questions about use, teeing up the Administration’s next workshop on the ethics of big data.
Privacy Papers on Capitol Hill — March 5
In conjunction with Congresswoman Sheila Jackson Lee, FPF will be presenting our fourth annual “Privacy Papers for Policy Makers” next Wednesday, March 5th. The event will be held in Rayburn House Office Building Room 2103 from 8:30 – 9:45 AM; coffee and breakfast will be provided. The event is sold out.
Featured at the event will be Professors Kenneth Bamberger and Deirdre Mulligan from Berkeley, who will be discussing their paper Privacy in Europe: Initial Data on Governance Choices and Corporate Practice. Professor Neil Richards will discuss his paper on why data privacy law is (mostly) constitutional, while Adam Thierer will present A Framework for Benefit-Cost Analysis in Digital Privacy Debates.
FPF is also pleased to have Jacob Kohnstamm, Chairman, Dutch Data Protection Authority, join us to provide reaction. Additionally, special guests Giovanni Buttarelli, Asst. European Data Protection Supervisor; Christopher Graham, UK Information Commissioner; Isabelle Falque-Pierrotin, CNIL (France); and María Elena Pérez-Jaén Zermeño, IFAI (Mexico), will be attending.
This event is intended to comply with applicable Congressional and Executive branch gift rules. Contact us with questions.
We were excited to learn that Aislelabs, a member of FPF’s Mobile Location Analytics privacy working group, has been named a Privacy by Design Ambassador by the Information and Privacy Commissioner of Ontario. Like fellow PbD Ambassador Euclid Analytics, Aislelabs has signed on to our Mobile Location Analytics (MLA) Code of Conduct, which ensures that consumers are provided with transparency and choice as to whether MLA companies may collect their information. As the launch date of our central opt-out site fast approaches, we’re glad to see member companies being recognized for their commitment to consumer privacy in this space.
Jules, Omer and Chris Discuss the Challenges of Big Data and Consumer Review Boards
FPF’s Co-Chair and Executive Director Jules Polonetsky, Senior Fellow Omer Tene, and Co-Chair Christopher Wolf discussed the challenges facing President Obama with respect to big data in a new post for the IAPP. The post argues that balancing the benefits of data analytics against attendant risks to civil liberties presents the biggest public policy challenge of our time.
FPF is currently developing a toolkit designed to help privacy professionals perform a comprehensive, rigorous cost-benefit analysis to determine how best to pursue their big data goals. Companies should have a clear framework to use in order to evaluate how a data-driven project will affect their consumers. Recent articles, on topics ranging from “people analytics” that guide hiring practices to a school’s tracking of its students, add examples that make it clear that a “practical application of fair information principles that accounts for modern day realities of collection and use” has become increasingly necessary.
One potential path forward involves the creation of “Consumer Subject Review Boards,” an idea discussed by Ryan Calo.* Such review boards would assess and evaluate big data projects’ rewards and associated risks. They would play an instrumental role in revitalizing consumer trust and mitigating some of the risks associated with innovative uses of consumer data. However, there are still questions that must be answered before such review boards could be deployed in practice.
First, what type of issues would a review board address? Would it be focused on addressing only privacy dilemmas, or would it seek to anticipate other ethical issues? As Evan Selinger and Patrick Lin write: “A technology ethics board . . . can be an invaluable canary in the coalmine—scouting for explosive issues in advance of emerging technology and before the law eventually turns its attention to these new problems and the company itself.” Clearly, a review board would need to have a clear understanding of its proper subject-matter scope.
Another question is whether the review boards should be in-house or independent. An in-house review board would benefit from close familiarity with its company’s data practices, but would lack the credibility of an independent entity. Similarly, should the opinions of a review board be confidential or publicly available? Confidential opinions do not instill as much consumer trust; however, public opinions risk being watered down to mitigate future litigation risks, and could potentially chill valuable innovation. Some companies have privacy advisory boards already – is a Consumer Review Board a more formal example of a privacy board? What methodology will the members use?
We look forward to continuing to explore this promising idea.
* See also Malcolm Crompton’s suggestion of ethics boards for privacy issues.
Peter Swire: Why Tech Companies and the NSA Diverge on Snowden
FPF Senior Fellow Peter Swire has an op-ed in today’s Washington Post that discusses how tech companies and the intelligence community are grappling with the traitor-or-whistleblower debate when it comes to Edward Snowden. His conclusion suggests the debate provokes a much broader set of issues:
Fundamentally, the traitor-or-whistleblower debate comes down to different views of what values should be paramount in governing the Internet we all use. The Internet is where surveillance happens to keep our nation safe. It is also where we engage in e-commerce and express ourselves in infinite ways. The goal is to create one communications structure that safeguards diverse, important values.