Comments for the White House "Big Data Review"

This afternoon, FPF submitted comments to help inform the White House Office of Science and Technology Policy’s “Big Data Review.” Announced in January, the White House Big Data Review has been a helpful exercise in scoping out how big data is changing our society.  Through public workshops at MIT, NYU, and Berkeley, the review has solicited thought leadership from a wide array of academics and researchers. Moving forward, FPF believes there is much that can be done to promote innovation in a way that advances privacy.

We advanced the following recommendations for the OSTP Big Data Review report:

1)      Embrace a flexible application of Fair Information Practice Principles (FIPPs). Traditional FIPPs have guided privacy policy nationally and around the globe for more than 40 years, and the White House Consumer Privacy Bill of Rights is the most recent effort to carry these principles forward into a world of big data. FPF supports the continued reliance on the FIPPs and believes they remain flexible enough to address many of the challenges posed by big data when applied in a practical, use-based manner. Our Comments recommend a nuanced approach to their applicability that accounts for modern day technical realities.

2)      Promote the benefits of big data in society. Researchers, academics, and industry have demonstrated how big data can be useful in driving economic growth, advancing public safety and health, and improving our schools. Yet, privacy advocates and the public appear skeptical of these benefits in the face of certain outlier uses. More work is needed to understand the ways big data is already improving society and making businesses more efficient and innovative. This report should highlight the importance of big data’s benefits and identify additional opportunities to promote positive uses of big data.

3)      Support efforts to advance practical de-identification, including policy and technological solutions. While the Federal Trade Commission (FTC) has acknowledged that effectively de-identified data poses no significant privacy risk, there remains considerable debate over what effective de-identification requires. FPF believes that technical anonymization measures are only one component of effective de-identification. A broader understanding that also takes into account administrative and legal safeguards, as well as whether data is public or non-public, should inform conversations about effective de-identification procedures.

4)      Encourage additional work to frame context and promote enhanced transparency. The context in which data is collected and used is an important part of understanding individuals’ expectations, and context is a key principle in both the Consumer Privacy Bill of Rights and the FTC Privacy Framework. Respect for context is an increasingly important privacy principle, yet more work by academics, industry, and policymakers is needed on how to properly frame and define it. The Department of Commerce-led Internet Policy Task Force (IPTF) should continue its work convening stakeholders and holding programs that could help frame context in an age of big data. At the same time, enhanced transparency is another important tool for promoting public trust in big data. In particular, FPF has called for more transparency surrounding the high-level decisional criteria that organizations may use to make decisions about individuals.

5)      Encourage efforts to promote accountability by organizations working with big data. Data privacy frameworks increasingly rely on organizational accountability to ensure responsible data stewardship. In the context of big data, FPF supports the further development of the concept of internal review boards that could help companies weigh the benefits and risks of data uses. In conjunction with the evolving role of the privacy professional, accountability measures can be put in place to ensure big data projects take privacy considerations into account.

6)      Promote government leadership on big data through its own procedures and practices. The federal government is one of the largest producers and users of data, and, as a result, the government may inform industry practice and help demonstrate the value of data through its own uses of big data across and among agencies. The Federal Chief Information Officer (CIO) Council is particularly well-positioned to ensure the federal government can maximize the potential of big data with an eye toward privacy protection.

7)      Promote global efforts to facilitate interoperability. Recent privacy developments in the Asia-Pacific region and the European Union have given new life to constructive collaboration on the cross-jurisdictional issues presented by big data. FPF urges the government to actively promote and maintain existing frameworks that facilitate interoperability, including the US-EU Safe Harbor and the Asia-Pacific Economic Cooperation’s (APEC) Cross Border Privacy Rules (CBPR) System.

Big data presents many benefits and potential risks. A thoughtful, balanced analysis of the value choices now at hand is essential. The Administration’s efforts to convene thought leaders have produced many fruitful conversations, and more are needed. At the same time, it will be essential that the Administration provide transparency and a clear plan of action to all stakeholders moving forward. These broad next steps are suggested as a helpful beginning to the work that needs to be done.

Big data offers the United States a great opportunity to provide global leadership on promoting innovation – and protecting privacy. It also presents a challenge, but we have the privacy principles and frameworks needed to thoughtfully address that task.

Chris Wolf Does a "Soap Box" Presentation on Big Data

Tomorrow, Berkeley will host a workshop on Big Data: Values and Governance, the third in the White House’s public events around big data and privacy. Ahead of that discussion, Chris Wolf delivered a “soap box” presentation on big data today. He offered a few high-level recommendations, including the need to “fully take stock of the benefits in performing a cost-benefit analysis” around big data.

“It is hard to do a cost-benefit analysis if we are only talking about the costs. Researchers, academics, and industry are using big data to deliver big benefits. We need to understand and promote those benefits so that we can more reasonably evaluate whether and how to address the risks that may arise,” he said.

MAC Addresses and De-Identification

Location analytics companies log the hashed MAC addresses of mobile devices in range of their sensors at airports, malls, retail locations, stadiums, and other venues. They do so primarily to create statistical reports that provide useful aggregated information, such as average wait times in line, store “hot spots,” and the percentage of devices that never make it into a zone that includes a checkout register. FPF worked with the leading companies providing these services to create an enforceable Mobile Location Code of Conduct that restricts discriminatory uses of data, creates a central opt-out, and promotes in-store notice and other protections. We filed comments last week with the FTC describing the program in detail.
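
As a rough sketch of the kind of aggregate reporting described above (the data layout, field names, and numbers below are hypothetical, not drawn from any vendor’s actual system), the snippet below turns a log of hashed-MAC sightings into an average dwell time per zone:

    from collections import defaultdict

    # Hypothetical log entries: (hashed_mac, zone, seconds_since_midnight)
    sightings = [
        ("3f9a...", "checkout", 600), ("3f9a...", "checkout", 780),
        ("7c2b...", "checkout", 660), ("7c2b...", "checkout", 900),
        ("7c2b...", "entrance", 120),
    ]

    # Collect all sighting times for each device within each zone.
    times = defaultdict(list)
    for mac, zone, ts in sightings:
        times[(mac, zone)].append(ts)

    # Dwell time per device per zone = last sighting minus first sighting.
    dwell = defaultdict(list)
    for (mac, zone), stamps in times.items():
        dwell[zone].append(max(stamps) - min(stamps))

    # Only the aggregate (an average per zone) is reported, not device-level data.
    for zone, durations in dwell.items():
        print(zone, sum(durations) / len(durations), "seconds on average")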

The only data transmitted by mobile devices that most location companies can log is the MAC address – the Wi-Fi or Bluetooth identifier devices broadcast when Wi-Fi or Bluetooth is turned on. The privacy debate around the use of this technology and the Code has centered on the sensitivity of logging and maintaining hashed MAC addresses, and hinges on whether a MAC address should be considered personal information.

Is a MAC address personal information? Well, it is linked to individual consumer devices as a consistent Wi-Fi or Bluetooth identifier. If enough data is linked to any consistent identifier over time, it is in the realm of technical possibility that the identity of a user can be ascertained. If there were a commercially available database of MAC addresses, it is possible that such a database could be used to identify users. We are not aware of any such MAC address look-up database, but we do recognize that the data collected is linked to a specific device. For this reason, the Code of Conduct treats hashed MAC addresses associated with unique devices as something in between fully anonymized data and explicitly personal data. This reflects the view that Professor Daniel Solove posited effectively when he argued that PII exists not as a binary, but on a spectrum, with no risk of identification at one end and individual identification at the other. In many real-world instances of data collection, the privacy standards in place reflect where the data lies on this spectrum; they consist not only of technical measures to protect the data, but also internal security and administrative controls, as well as enforceable legal commitments. In the case of Mobile Location Analytics, many companies are confident that by hashing MAC addresses, keeping them under administrative and security controls, and publicly committing not to attempt to identify users, they have adequately de-identified the data they log.

However, it is important to understand that the Code does NOT take the position that hashing MAC addresses amounts to a de-identification process that fully resolves privacy concerns. Under the Code, data is considered fully “de-identified” only where it may not reasonably be used to infer information about, or otherwise be linked to, a particular consumer, computer, or other device. To qualify as de-identified under the Code, a company must take measures such as aggregating data, adding noise to data, or statistical sampling. These are considered reasonable measures that de-identify data under the Code, as long as an MLA company also publicly commits not to try to re-identify the data and contractually prohibits downstream recipients from trying to re-identify it. To assure transparency, any company that de-identifies data in this way must describe how it does so in its privacy policy.
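
For a sense of what “adding noise” can mean in practice, here is a minimal sketch (the count and noise scale are made up for illustration; the Code does not prescribe any particular mechanism):

    import random

    # Hypothetical aggregate: distinct devices observed near a checkout zone in one day.
    true_count = 412

    # Perturb the published figure with random noise so the exact total is never released.
    # The noise scale (a standard deviation of 5) is arbitrary here; choosing it carefully
    # is what formal frameworks such as differential privacy address.
    noisy_count = true_count + round(random.gauss(0, 5))

    print("published count:", noisy_count)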

Because most of the companies involved in mobile location analytics do link hashed MAC addresses to individual devices, the data they collect to track devices over time does not qualify as strictly “de-identified” under the Code, nor is it exempt from the Code. Rather, the companies collect and use what the Code terms “de-personalized” data.* De-personalized data is defined in the Code as data that can be linked to a particular device, but cannot reasonably be linked to a particular consumer. Companies using de-personalized data must:

    1. take measures to ensure that the data cannot reasonably be linked to an individual (for instance, hashing a MAC address or deleting personally identifiable fields);
    2. publicly commit to maintain the data as de-personalized; and
    3. contractually prohibit downstream recipients from attempting to use the data to identify a particular individual.

When companies hash MAC addresses, they are thus fully subject to the Code’s requirements, including signage, consumer choice, and non-discrimination.
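
To see why hashing by itself is treated as de-personalization rather than full de-identification, consider a minimal sketch (the addresses and code below are hypothetical, not any vendor’s actual pipeline). Because the space of possible MAC addresses is finite and enumerable, an unsalted hash can in principle be reversed by hashing candidate addresses and comparing digests, which is why the Code pairs hashing with public commitments and contractual restrictions:

    import hashlib

    def hash_mac(mac: str) -> str:
        """Unsalted SHA-256 digest of a MAC address (illustrative only)."""
        return hashlib.sha256(mac.lower().encode("utf-8")).hexdigest()

    # What a sensor would log in place of the raw identifier.
    logged = hash_mac("a4:5e:60:d1:22:9f")

    # The MAC address space is only 48 bits, and far smaller in practice once known
    # vendor OUI prefixes are taken into account, so a determined party could recover
    # the original address by brute-force enumeration of candidates:
    def brute_force(target_digest, candidate_macs):
        for mac in candidate_macs:
            if hash_mac(mac) == target_digest:
                return mac
        return None

    print(brute_force(logged, ["00:11:22:33:44:55", "a4:5e:60:d1:22:9f"]))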

Different kinds of data on the PII/non-PII spectrum — given the inherent risks and benefits of each — merit careful consideration of the combination of reasonable technical measures (such as encryption or hashing), administrative controls, and legal commitments that would be most suitable. After all, if “completely unidentifiable by any technical means, no matter how complex or unlikely” were the standard for the use of any data in the science and business worlds, much valuable research and commerce would come to an end. The MLA Code represents a pragmatic view that allows vendors to provide a service that is useful for businesses and consumers, while applying responsible privacy standards.

* Suggestions for a better term than “de-personalized” are welcome. We considered “pseudonymized” but found the term awkward.

Jules' thoughts on Facebook's "Privacy Dinosaur"

Jules’ new article on LinkedIn discusses Facebook’s recent efforts to remind its users about adjusting the sharing settings on their posts. A pop-up notice with an illustrated dinosaur occasionally appears on a user’s home page when the user posts something publicly, and reminds the user about their option to keep the post limited to a smaller circle of friends if desired.

At the Washington Post Live All Things Connected Forum this week, FPF called for efforts to “poke and provoke” designers to get in the game on designing for usable privacy. The Facebook “Privacy Dinosaur” is just one example of how creative designers and technologists can advance transparency and privacy in a way everyday users can appreciate.

FPF Applauds Department of Commerce For Safe Harbor Website Revision

The Department of Commerce has long listed companies’ participation in the US-EU Safe Harbor program in the Safe Harbor List. Within that list, a significant number of companies are marked with the designation “not current.” As FPF wrote in its paper discussing the Safe Harbor, a company can be listed as “not current” for a number of reasons: they may have failed to fill out specific yearly paperwork, chosen to use other approved data transfer mechanisms, merged with another company, ceased data transfer with the EU, or shut down altogether. However, critics of the Safe Harbor say many companies are claiming to be members while in fact they are not adhering to the Safe Harbor agreement.

FPF noted that a company’s obligations under the Safe Harbor do not end even if the company is listed as non-current: rather, it remains responsible for adhering to the Safe Harbor Principles with respect to all the data it transferred while enjoying the benefits of Safe Harbor membership. When the European Commission recommended that “[t]he Department of Commerce should clearly indicate on its website all companies which are not current members,” FPF agreed and suggested that the Department of Commerce should also include on its website an explanation of why a company may be listed as “not current” in order to clear up any potential confusion.

FPF is pleased that the Department of Commerce’s Safe Harbor website was updated in late 2013 with a new notice that makes clear that companies may be listed as non-current for a number of reasons, but are nonetheless subject to FTC enforcement for claiming to be members without adhering to the Safe Harbor Principles. The new notice reads:

“Notice: An organization may be designated as “Not Current” for a variety of reasons. The most common reason is that the organization has failed to reaffirm its adherence to the Safe Harbor Privacy Principles on an annual basis as required by the Safe Harbor Frameworks. Another possible reason is that the organization has failed to comply with one or more of the Safe Harbor Privacy Principles. Organizations designated as “Not Current” are no longer assured of the benefits of the Safe Harbor (i.e., the presumption of “adequacy”). These organizations nevertheless must continue to apply the Safe Harbor Privacy Principles to the personal data received during the period in which they were assured of the benefits of the Safe Harbor for as long as they store, use or disclose those data. Any misrepresentation by an organization designated as “Not Current” concerning its adherence to the Safe Harbor Privacy Principles may be actionable by the Federal Trade Commission or other relevant government body.”

FPF applauds the Department of Commerce for these revisions. We will continue to monitor developments relating to the US-EU Safe Harbor Agreement as they arise.

FPF In the News: a Big Week for Panels and Privacy

The first week of March brought with it a number of great privacy-related events, some run by the IAPP and some hosted by others (including FPF itself!). Below are links to the many events FPF participated in.

Privacy Papers for Policy Makers Launch Event

In conjunction with Congresswoman Sheila Jackson Lee, FPF presented our fourth annual “Privacy Papers for Policy Makers” this Wednesday. Featured at the event was Professor Kenneth Bamberger from Berkeley, who discussed his paper with Deirdre Mulligan, Privacy in Europe: Initial Data on Governance Choices and Corporate Practice. Professor Neil Richards discussed his paper on why data privacy law is (mostly) constitutional, while Adam Thierer presented A Framework for Benefit-Cost Analysis in Digital Privacy Debates.


@Microsoft Conversation on Privacy: “Privacy Models: The Next Evolution”

FPF Executive Director Jules Polonetsky moderated a panel for a lunch conversation between leading experts, who discussed future privacy principles and frameworks that focus on data use and associated risks. The panelists discussed the ways society can protect the privacy of individuals while providing for responsible, beneficial data use.

IAPP: FTC Privacy and Data Security Jurisprudence

FPF Co-Chair Chris Wolf participated on a panel moderated by FPF Senior Fellow Omer Tene on the FTC’s developing “common law of privacy,” which serves as an invaluable reference and guidance tool for corporate data managers not only in the U.S. but also globally. The IAPP Westin Research Center has embarked on a project to collate, index, annotate and make available to policymakers and practitioners a “Comprehensive Casebook of FTC Privacy and Information Security Law.” In this session, Chris, Omer and others discussed the findings of the project and initial conclusions with senior FTC staff.

IAPP: Ed Tech, Data and Student Privacy

FPF Executive Director Jules Polonetsky; FPF Board Members Larry Magid and Andy Bloom; and Kathleen Styles, Chief Privacy Officer at the U.S. Department of Education, participated in a panel about the impact of new education technologies on student privacy. New technologies and data are being used for a variety of services in schools, from administrative uses such as managing class schedules, buses, and registration, to educational tools such as remote learning and personalized curricula. Data is increasingly being shared with third parties, and apps and tablets are increasingly essential to the learning environment. The confluence of enhanced data collection with highly sensitive information about children and teens creates privacy risks: FPF has recently organized a working group of companies and other experts to work on crafting solutions to this hot issue. Contact FPF to get involved.


IAPP: Governmental Access to Private-sector Data: The Realities and Impacts in the U.S. and EU

FPF Co-Chair Chris Wolf moderated a panel discussing government access to private-sector data: what actually is being exposed in the U.S. and in the EU; what checks and oversight exist in the various jurisdictions; what those holding the data and those whose data is held can do to address privacy and free expression concerns; and what impact the publicity over national security access is having on public policy and international relations.

IAPP: From 0–60: Privacy and the New Generation of Connected Cars

FPF Policy Director Josh Harris moderated a panel on the many new developments in the world of connected cars. The panel explored the new technologies and their risks for the privacy of individuals, and demonstrated best practices and solutions for ensuring compliance and transparency within the connected automobile environment. Access Josh’s presentation here.

IAPP: Judge, Jury and Executioner: Are Federal Courts Giving Privacy Class Actions a Fair Chance?

FPF Co-Chair Chris Wolf moderated a panel describing the struggles facing class action plaintiffs in the privacy field. The panel, which brought together some of the leading plaintiff and defendant attorneys in the country, discussed legal theories of harm and standing, proof of causation, commonality and the likelihood that a plaintiff’s injury will be redressed by a favorable decision.

IAPP: Eraser Buttons, the Right to Delete and the Rise of Tech Solutions for Ephemeral Data

FPF Executive Director Jules Polonetsky moderated a panel on California’s new “eraser button” law, which requires certain websites to allow minors to remove embarrassing postings. The panel covered similar legislative efforts in the U.S. and E.U., as well as the growing trend in consumer technology for “ephemeral” messaging services such as Frankly, SnapChat, and Whisper.

White House/MIT Big Data Privacy Workshop Recap

Speaking for everyone snowed in in DC, White House Counselor John Podesta remarked that “big snow trumped big data” while on the phone to open the first of the Obama Administration’s three big data and privacy workshops. This first workshop focused on advancing the “state of the art” in technology and practice. While these workshops are ultimately the product of Edward Snowden’s NSA leaks last year, Mr. Podesta explained that his big data review group was conducting a broad review on a “somewhat separate track” from an ongoing review of the intelligence community. His remarks focused on several specific examples of the social value of data, but he cautioned that “we need to be conscious of the implications for individuals. How should we think about individuals’ sense of their identity when data reveals things they didn’t even know about themselves?”

To that end, he noted that “we can’t wait to get privacy perfect to get going.” Because this workshop was designed to focus on the technology around data, he hoped it would help inform the Administration about what it should take away regarding the current state of data privacy.

Cynthia Dwork, from Microsoft Research, followed Mr. Podesta with a deep dive into differential privacy. In English, as she put it, differential privacy works to ensure that the outcome of any analysis is equally likely whether or not an individual joins a database. The goal is to limit the range of potential harms to any individual from participating in data analysis. The challenge posed by big data is that multiple uses of data create a cumulative harm to privacy, which is difficult to measure. Overly accurate estimates of too much information are “blatantly non-private,” Dwork argued.
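
For reference, the guarantee Dwork described is usually written formally as follows (standard textbook notation, not taken from her presentation): a randomized mechanism M is ε-differentially private if, for every pair of databases D and D' that differ in a single individual’s record, and for every set of possible outputs S,

    \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]

The smaller ε is, the less any one person’s decision to join or leave the database can shift the probability of any outcome, which formalizes the “equally likely” intuition above.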

While Dwork focused on new technologies to advance privacy, a slate of MIT professors presented brief examples of how big data is providing big social benefits in health care, transportation, and education:

When the floor was opened for questions, a skeptic in the crowd noted that one of the biggest drivers of data collection is not social benefit, but making money. Mr. Agarwal suggested that this was the very reason edX was established as a non-profit: in order for its use of sensitive data “to be judged by different criteria than maximizing return on investment.”

Secretary of Commerce Penny Pritzker suggested that harnessing the potential of data would hinge upon user trust. Highlighting Commerce’s efforts to advance multistakeholder codes of conduct and ensure the efficacy of the U.S.-EU Safe Harbor, Ms. Pritzker suggested government needed to continually evaluate and work with companies to uncover the technologies and practices that promote trust. She expressed hope that efforts like the day’s workshop could help to show that confidence placed in American companies should remain “rock solid.”

The program’s afternoon shifted to a broad discussion of privacy enhancing technologies (PETs), specifically developments in differential privacy, encryption, and accountability systems. There was a recognition that, with any computer system, compromises in security and privacy are inevitable — complex software will have bugs, many different people will need access to infrastructure, and interesting data processing will require the use of confidential information or PII.

Danny Weitzner lamented the lack of a better definition of privacy for computer designers and engineers to build toward. Alan Westin’s original definition, that privacy is a claim by individuals, groups, or institutions to determine for themselves when, how, and to what extent their information can be communicated to others, has “led us astray,” Prof. Weitzner argued. He noted that multiple substantive definitions of privacy had come up in discussion throughout the day, and argued that we “need a way to know what’s going on” in order to “allow data for some purposes, but [ensure it] won’t be misused for others.”

Quoting Judge Reggie B. Walton about the challenges facing the FISA Court, Weitzner noted that “we don’t currently have the tools in everyday systems to assess how information is used.” Weitzner discussed his work on information accountability.

Weitzner then led a wide-ranging hypothetical discussion in which MIT, in the near future, “embrace[s] the potential of data-powered, analytics-driven systems in all aspects of campus life, from education to health care to community sustainability.” Weitzner asked a slate of panelists what they would do as the future chief privacy officer of MIT, and Intel’s David Hoffman suggested that we all need to understand “that a lot of the data about us that’s now out there is not coming from us.” As a result, meaningful transparency must mean more than notice to individuals. Panelists then covered a wide gamut of issues, from the ethical challenges around predictive analysis to the need to get serious about addressing questions of use, teeing up the Administration’s next workshop on the ethics of big data.