DCU & FPF Webinar – The Independent and Effective DPO: Legal and Policy Perspectives

On July 8th, Dublin City University (DCU) and the Future of Privacy Forum (FPF) jointly organized the webinar “The Independent and Effective DPO: Legal and Policy Perspectives.” The webinar was designed to help policymakers, regulators, and their staff better understand legal views concerning the position of the Data Protection Officer within an organization. The first half of the discussion centered around the involvement and independence of the DPO from the perspective of European data protection regulators, while the second half of the webinar explored how DPOs perceive their role. Guest speakers included European data protection regulators and DPOs from leading companies. 

To learn more, watch the recording on our YouTube channel.

New FPF Study: More Than 250 European Companies are Participating in Key EU-US Data Transfer Mechanism

Co-Authored by: Drew Medway & Jeremy Greenberg

European Companies’ Participation in Privacy Shield Up Nearly 30% from the Past Year.

EU-US Privacy Shield Remains Essential to Leading European Companies.

From Major Employers such as Logitech and Siemens to Leading Technology Firms like Telefónica and SAP, European Companies Depend on the EU-US Agreement.

The Privacy Shield Program Supports European Employment While Adding to Employee Data Protections—Nearly One-Third of Privacy Shield Companies Rely on the Framework to Transfer HR Information of European Staff.

With the future of the EU-US Privacy Shield framework awaiting the Court of Justice of the European Union’s (CJEU) Schrems II decision, the Future of Privacy Forum conducted a study of the companies enrolled in the cross-border privacy program and determined that 259 European-headquartered companies are active Privacy Shield participants. This is a nearly 30% increase from last year’s total of 202 EU companies in the data transfer framework. These European firms rely on the program to transfer data to their US subsidiaries or to essential vendors that support their business needs. Nearly one-third of Privacy Shield companies use the mechanism to process human resources data—information that is crucial to employ, pay, and provide benefits to workers.

Thousands of major companies, many of which are headquartered or have offices in Europe, rely on the protections granted under the data transfer agreement. With a majority of companies surveyed in a recent IAPP study relying on Privacy Shield to transfer data out of the EU, and dozens of new companies joining each week to retain and pay their employees or create new job opportunities in Europe, the agreement is an integral data protection mechanism for European consumers and companies and the European marketplace as a whole.

The Numbers:

Overall, FPF found that more than 5,400 companies have signed up for Privacy Shield since the program’s inception – more than 1,000 participants joined in the last year.

Leading European companies that rely on Privacy Shield include:

–     ALDI, German grocery market chain

–     Eaton Corporation, Irish multinational power management company

–     Ingersoll-Rand, Irish globally diversified industrial company

–     Jazz Pharmaceuticals, Irish biopharmaceutical company

–     Lidl, German grocery market chain

–     Logitech, Swiss computer peripherals manufacturer and software developer

–     SAP, German multinational software corporation

–     Siemens, German multinational industrial manufacturing conglomerate

–     TE Connectivity, Swiss connector and sensor manufacturer

–     Telefónica, Spanish mobile network provider

FPF research also determined that more than 1,700 companies, nearly one-third of the total number analyzed, joined Privacy Shield to transfer their human resources data.

The research identified 259 Privacy Shield companies headquartered or co-headquartered in Europe. Top EU locations for Privacy Shield companies include Germany, France, the Netherlands, and Ireland. This is a conservative estimate of companies that rely on the Privacy Shield framework—FPF staff did not include global companies that have major European offices but are headquartered elsewhere. The 259 companies include some of Europe’s largest and most innovative employers, doing business across a wide range of industries and countries. EU-headquartered firms and major EU offices of global firms depend on the Privacy Shield program so that their related US entities can effectively exchange data for research, to improve products, to pay employees, and to serve customers.

The conclusions follow previous FPF studies, which highlighted similar increases in participation and reliance by EU firms on the Privacy Shield program over time.  

Methodology:

Note Regarding Brexit: Given the 130-plus UK companies reliant on Privacy Shield, we encourage the continued enforcement of the framework with the UK after the conclusion of the UK-EU Transition Period on December 31st, 2020. Companies reliant on the transfer of data between the UK and US would be wise to review the Department of Commerce’s Privacy Shield and the UK FAQs for guidance on UK-US data transfer during, and after, the Transition Period.

For the full list of European companies in the Privacy Shield program, or to schedule an interview with Jeremy Greenberg, John Verdi, or Jules Polonetsky, email [email protected].

For more information about FPF in Europe, visit https://fpf.org/eu.

Off to the Races for Enforcement of California’s Privacy Law

Yesterday, the California Attorney General’s office confirmed that it has begun sending a “swath” of enforcement notices to companies across sectors that are allegedly violating the California Consumer Privacy Act (CCPA), acting swiftly on the law’s July 1st enforcement date. The law came into effect in January, after years of debate and amendment in the California Legislature. Additional proposed regulations, intended to clarify and operationalize the text of the statute, are not yet final.

In an IAPP-led webinar, “CCPA Enforcement: Enter the AG,” Stacey Schesser, California’s Supervising Deputy Attorney General, confirmed details about the first week of CCPA enforcement. Below, we 1) provide key takeaways from that conversation; 2) discuss the role of the draft regulations; and 3) observe that the successes or failures of AG enforcement will directly influence debates over other legislative efforts outside of California. Meanwhile, AG enforcement will almost certainly bolster public awareness and support for the California Privacy Rights Act (CPRA) or “CCPA 2.0” ballot initiative in November 2020.

 

Key takeaways and observations:

Based on Deputy AG Schesser’s comments, we know that active enforcement of the CCPA began immediately on July 1st, with the office sending violation notice letters to a “swath” of online businesses. Under the law, companies have a thirty-day period to “cure” violations and come into compliance. As a result, these letters are unlikely to become public, unless any of them progress into full-blown investigations.

We do know a few key things from this discussion, however, about the type and substance of the alleged violations under scrutiny. 

For example, we know that online businesses from “across sectors” were targeted, rather than, for example, retail or other “brick and mortar” establishments that collect data in-person. And although it was not directly stated, it was implied that the violations involve perceived failures to comply with the law’s “Do Not Sell” provisions. The AG has publicly held up this specific consumer right to request that a business not sell data as the most central feature of the CCPA. As a result, major online companies or publishers that do not provide a link entitled “Do Not Sell My Personal Information” may be under particular scrutiny.

We don’t know at this point whether the AG staff identified obvious cases where observation made it clear a company was selling data. In many cases the issue of whether data that is transmitted to third parties is a sale depends on contracts and commitments made by those parties, details that can be challenging to discern based on external observation. Some companies may use the thirty-day cure period to attempt to persuade the AG’s office that their data sharing is occurring within the context of a service provider relationship or another permissible exemption that allows them to not provide a “Do Not Sell” button.

Deputy AG Schesser also confirmed that businesses were targeted based on consumer complaints and even some reports on Twitter. It would not be surprising if early enforcement targets were influenced by media and Twitter reports of businesses that do or do not provide a “Do Not Sell My Personal Information” link. For example, a February 2020 Washington Post article includes a comprehensive list of top companies and notes whether they provide CCPA-related links.

For companies still interpreting and operationalizing the AG’s regulations, Deputy AG Schesser’s comments yesterday confirmed that enforcement (for now) is limited to the text of the statute. Although the CCPA has been in effect since January 1, 2020, the additional regulations promulgated by the AG’s office are not yet finalized, with the final text of the proposed regulations under review by the Office of Administrative Law.

Despite this, it would be wise for companies to carefully review the proposed regulations. Although in some cases the draft regulations appear to create new obligations or restrictions that do not exist in the text of the CCPA — such as disclosures for large data holders — in many cases the regulations are intended to clarify existing law. In such cases, the regulations provide a useful window into how the AG’s office understands the text of the CCPA. Similarly, companies seeking to understand how the AG’s office understands the CCPA and its “Do Not Sell” provision can look to the 900+ pages of responses given to commenters in the public comment periods for the draft regulations. These responses provide important insight into the AG’s analysis of what the underlying statute requires.

The role of State Attorneys General (AGs) in enforcing comprehensive privacy laws has been at the heart of many recent debates over both state and federal legislation. For example, in deliberations regarding the Washington Privacy Act (WPA), enforcement emerged as one of the most divisive issues that led to the bill failing to pass the Washington House. Advocates and even the Washington Office of the Attorney General itself argued that the Washington AG lacked the financial and other resources to meaningfully enforce the law if it were passed, and that the law needed to also include a private cause of action for individuals to bring claims directly in court.

In the context of federal legislation, it is becoming increasingly common for proposed comprehensive privacy legislation from both Democrats and Republicans to include enforcement powers for State AGs. Industry groups sometimes argue against the inclusion of State AGs, perceiving their enforcement to be politically motivated or observing that they may lack the deep expertise of their federal agency counterparts to enforce privacy laws affecting complex emerging technologies and digital platforms. However, State AGs will almost certainly play some role in a future federal privacy law, particularly if stronger government enforcement becomes part of a compromise against a robust private cause of action.

Despite these criticisms, we see this week that State AGs can act quickly and decisively. This is in line with the growing national importance of State AGs in enforcing against novel privacy harms associated with emerging technologies (for more, see Professor Danielle Citron’s 2017 exploration of The Privacy Policymaking of State Attorneys General). If the California AG’s enforcement letters and investigations over the next six months are perceived as effective, it will continue to bolster the credibility of AGs as primary enforcers of state laws, and supplementary enforcers of a federal law.

Next up: CPRA Ballot Initiative (“CCPA 2.0”)

Meanwhile, the proposed “California Privacy Rights Act” (CPRA) has qualified for the November 2020 ballot, and if passed would modify the CCPA to provide additional consumer protections. For example, it would add the consumer right to “correct inaccurate information,” and the right to limit first-party use of sensitive categories of information (rather than only being able to limit its sale). It would also provide much-needed clarifications on the consumer right to opt out of all sale or sharing of data for purposes of online behavioral advertising, and enshrine a clearer “purpose limitation” obligation into the text of the statute.

If passed, the CPRA will likely become the new de facto minimum U.S. national standard for consumer privacy, raising the bar significantly for efforts to pass federal legislation. Despite its detailed requirements, it is not finding favor with some civil society groups such as the Consumer Federation of California, which has now formally opposed the initiative. On the other hand, Common Sense Media has now endorsed the effort. The ballot initiative process in California enables groups to submit ballot arguments in support or opposition of an initiative, which may be important to help voters understand the initiative, so stay tuned for news of additional groups that support or oppose the effort.


Author: Stacey Gray is an FPF Senior Counsel and leads FPF’s U.S. federal and state legislative analysis and policymaker education efforts. Did we miss anything? Email us at [email protected].

 

Image Credit: Tweet from Attorney General Becerra, @AGBecerra, Twitter, July 1, 2020, https://twitter.com/AGBecerra/status/1278377943803154432?s=20.

Strong Data Encryption Protects Everyone: FPF Infographic Details Encryption Benefits for Individuals, Enterprises, and Government Officials

Today, the Future of Privacy Forum released a new tool: the interactive visual guide “Strong Data Encryption Protects Everyone.” The infographic illustrates how strong encryption protects individuals, enterprises, and the government. FPF’s guide also highlights key risks that arise when encryption safeguards are undermined – risks that can expose sensitive health and financial records, undermine the security of critical infrastructure, and enable interception of officials’ confidential communications.

Encryption is a mathematical process that scrambles information; it is often used to secure or authenticate sensitive documents. Each use of encryption generates a very long number, a “key,” that is the mathematical solution to the underlying formula and can unscramble the protected information. This key must be kept secret; anyone who has it can access the information.
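The core idea, that a single secret key both scrambles and unscrambles the data, can be sketched with a toy one-time pad in Python (illustrative only; real systems rely on vetted algorithms like AES rather than hand-rolled code):

```python
import secrets

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Scramble the plaintext by XOR-ing each byte with the key (one-time pad)."""
    assert len(key) == len(plaintext)
    return bytes(p ^ k for p, k in zip(plaintext, key))

def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR is its own inverse: the same secret key unscrambles the data."""
    return encrypt(ciphertext, key)

message = b"Patient record: A+"
key = secrets.token_bytes(len(message))  # the secret "key"

ciphertext = encrypt(message, key)
assert ciphertext != message                 # scrambled without the key
assert decrypt(ciphertext, key) == message   # recoverable only with the key
```

Without the key, the ciphertext is indistinguishable from random bytes; with it, recovery is trivial. That asymmetry is why key secrecy, not the ciphertext, carries the entire security burden.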

Strong encryption employed by online services and connected devices protects sensitive communications and stored data. Strong encryption protects individuals when they use a smartphone, browse the web, bank online, or send electronic messages to family and colleagues. Encryption can also help authenticate credentials, preventing criminals from impersonating users, viewing confidential email and photos, emptying bank accounts, or taking control of power plants and vehicles.

Strong Data Encryption Protects Everyone demonstrates how encryption technologies protect data transmitted by smart medical devices, classroom records stored by teachers, and communications between soldiers. The infographic cites concrete examples of the ways individuals, schools, and military officers routinely use strong encryption to protect sensitive information.

FPF’s infographic highlights the protections provided by strong encryption as well as the risks presented by weakening encryption. Weakened encryption creates risks to health, safety, and privacy, and increases the possibility that criminals will infiltrate systems or intercept data. Encryption can be weakened by unintentional software flaws or by intentional decisions to provide some entities with exceptional access. There is no practical way to distinguish between public sector vs. private sector encryption; government officials typically use the same commercial software and hardware employed by corporations and individuals.


Dr. Rachele Hendricks-Sturrup Discusses Trends in Health Data

We’re talking to FPF senior policy experts about their work on important privacy issues. Today, Dr. Rachele Hendricks-Sturrup, Health Policy Counsel, is sharing her perspective on health data and privacy.  

Dr. Hendricks-Sturrup has more than 12 years of experience in healthcare and biomedical research, health journalism, and engagement with digital health companies and startups. She has been a Research Fellow in the Department of Population Medicine at the Harvard Pilgrim Health Care Institute and Harvard Medical School and, as FPF’s health lead, continues to address ethical, legal, and social issues at the forefront of health policy and innovation, including genetic data, wearables, and machine learning with health data. She works with stakeholders to advance opportunities for data to be used for research and as real world evidence to improve patient care and outcomes and support evidence-based medicine.

What first attracted you to working on the privacy implications of health data?

My interest in health data began many years ago. When I finished my undergraduate degree in 2007, the world was turning on its head as far as understanding the utility and value of data. I realized that data is really running the show in academic, healthcare, business, and so many other landscapes. For this reason, I felt compelled to advocate for uses of data in ways that are purposeful, meaningful, and intentional to address specific problems in health care and health science research.

That being said, my appreciation for data and its potential, even if data might be considered a double-edged sword in some ways, has increased and I have studied how data can be used to support health care decision-making in meaningful ways.

You said that “data is running the show.” What does that mean?

Everyone in health care and policy today appreciates the value of evidence – evidence-based medicine, evidence-based policy, precision medicine, precision policy… you name it. Health data drives payment incentive structures in health services and approvals for new drugs and medical devices. Any kind of data point that you can use or leverage – be it quantitative or qualitative – to inform an important decision or program is immensely valuable. With that being said, we still have a long way to go as far as figuring out how health data, especially when it’s combined with other data, should be collected and used to create the actionable intelligence needed to robustly inform policy and organizational decision-making.

New types and uses of data in the health space mean that it’s really important to get the data right. If whatever you’re doing isn’t based on evidence or reliable data, then you’re likely to see a failed policy due to poor or absent evidence. Data is key to decision making in clinical care, legal settings, policy, and in research; if it’s not there or isn’t collected for intentional reasons or purposes, then how can you back up your claims?

What types and sources of data do you find yourself thinking about a lot these days?

I actually think a lot about behavioral data – mainly how that data is being combined with other types of personal data like geolocation data, genetic data, and consumer health data. Combined data is of strong interest to many, if not all, health companies and other business verticals engaged in health care. For this reason, I think it is important to be forethinking about immediate and downstream uses of such data to safeguard against possible population- or group-level discrimination, especially against populations sharing a certain health or immutable characteristic. Users and generators of that data should foremost consider how that data can be leveraged to improve health behavior in a non-coercive, fair, and transparent way. For example, I think that if I can help guide practitioners and researchers toward thinking more deeply and strategically about why and how they collect and use health data, then there is a greater chance that the data will not be used to stigmatize patients’ or health consumers’ behavior, but instead help those patients and consumers leverage resources within or new to their environments to become better stewards of their own health.

Another example is data collected to demonstrate patient medication adherence. If this sort of data is quantitative (or numbers-driven) and used or interpreted in potentially discriminatory ways to justify increased cost-sharing for a high-risk patient group, such as by stating, “these people are very non-adherent, so if we don’t see the clinical outcome that we’re looking for, then it’s their fault and they should pay instead of their insurer,” then I would argue that there are smarter ways to use that data.

One smart way to use that data is to combine it with qualitative data that can bring context as to why we see poor quantitative medication adherence data. Humans should step in to then ask – do these patients lack access to the transportation needed to refill their prescriptions? What can we do to eliminate that barrier for those patients? If a patient isn’t opening a smart pill bottle, is it because they forgot? What can we do to remind them or help them remind themselves? Those qualitative data points can supplement quantitative data to ensure all of the data are used to their highest potential to address or solve an actual (versus assumed) problem.

What have you been working on at FPF?

At FPF, I lead the health working group and engage with our working group and advisory board members to develop various projects that can inform FPF best practices. I’m currently engaging FPF’s health working group alongside our CEO, Jules Polonetsky, and Policy Fellow Katelyn Ringrose, to create draft guidance around best privacy practices for de-identified data that is shared by and with HIPAA-covered entities. The working group includes many stakeholders from academia and industry and thus attempts to garner consensus about best privacy practices across a wide range of stakeholders that present a range of use cases.

Currently, for FPF’s Privacy and Pandemics project, I help drive new and existing focuses on privacy concerns around the use of tech-driven surveillance measures, like digital contact tracing and thermal imaging, to help contain the spread of the virus and restart economies.

What do you see as the big, rising issues around uses of health data over the next few years?

It’s clear that data is critical to understanding and addressing many health challenges, including the pandemic, but certain uses of data also create a wide range of risks. We will be providing guidance around digital contact tracing and a wide range of additional technologies that are being used or proposed. Some may not be effective, some may create major risks, and others may be useful, if deployed in a proportionate and measured manner with appropriate safeguards.

I also think questions about how we collect or do not collect data to address health disparities exacerbated or exposed by COVID-19 will continue to resonate. The public has become much more exposed to the toll of health disparities within the United States and across the world, especially for certain minority or low-income populations, following COVID-19. The biggest challenge to all of this will be how we collect and leverage data to actually address these disparities and help control or prevent them in the future.

Similarly, with regard to issues around social justice, racism, and police brutality that have drawn the attention of prominent health care organizations like the American Medical Association, I see the need to determine how intentional data collection, use, and interpretation fit within that equation to address these issues. Also, and broadly, how can we or are we using data to resolve conflicts or concerns in health care, genetics, and other areas? I foresee this as our greatest challenge – if we’re able to meet it, then we can say that we’ve made a quantum leap.

iOS Privacy Advances

Law and legislation take the lead in setting standards for protecting personal data, but the policies and norms established by companies also play a central role. This has been the case particularly for global platforms providing the services used by billions in the course of daily life. Apple’s 2020 Worldwide Developer Conference (WWDC) previewed a variety of privacy advances coming soon to various Apple products, including their most recent mobile operating system, iOS 14. Apple executives reiterated throughout the event that “privacy is a fundamental human right at the core of everything we do,” and a notable portion of their announcements directly reflects this perspective.

Some of the notable changes include:

App Tracking Controls

As of iOS 14, Apple will require developers to obtain consent prior to tracking users across apps and websites owned by other companies, including tracking by user ID, IDFA, device ID, fingerprinting, or profiles. Apps are currently only permitted to track users for advertising purposes by using the iOS IDFA (Identifier for Advertisers) — an ID that is available by default and can be “zeroed out” by enabling the Limit Ad Tracking setting. In iOS 14, apps will need to affirmatively request access to the IDFA via the AppTrackingTransparency framework.

This new “just-in-time” notification will provide users with two options: “Allow Tracking” or “Ask App Not to Track.”

  iOS 14 Tracking Permission Screenshot   iOS 14 Limit Ad Tracking Screenshot

 

It is our understanding that apps can ask for this permission only once and cannot discriminate against users by restricting use of the app or key features for those who decline the permission.

The new functionality will also allow users to view and edit this setting for each app on the device within Settings. Users can also choose not to be asked for permission to track by all apps through the Limit Ad Tracking toggle within Settings. This setting is automatically disabled for child accounts and on shared iPads.

Apps will still be permitted to track users who do not “Allow Tracking” in circumstances where the app’s data is linked to third-party data solely on the user’s device and is not sent off the device in a way that can identify the user or device. App data may also be sent to a third party if that partner uses the data solely for fraud detection, fraud prevention, or security purposes and solely on behalf of the app developer (for example, to prevent credit card fraud).

Attributing App Installations

Apps will be able to leverage Apple’s SKAdNetwork to attribute app installations via aggregate conversion reporting. Ad networks will need to register and provide campaign IDs to the App Store, and Apple will report the aggregate results of campaigns driving installations without sharing information about individual users.
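Conceptually, aggregate conversion reporting means the ad network learns only per-campaign install totals, never per-user events. A toy Python sketch of that aggregation (hypothetical data, not Apple’s actual implementation):

```python
from collections import Counter

# Each install event carries only a small campaign ID assigned by the
# ad network -- no user or device identifier travels with it.
install_events = [7, 7, 12, 7, 12, 3]  # hypothetical campaign IDs

# The ad network receives only the per-campaign totals.
aggregate_report = Counter(install_events)

assert aggregate_report[7] == 3   # campaign 7 drove three installs
assert aggregate_report[12] == 2  # campaign 12 drove two
```

Because only these counts leave the aggregation step, the report reveals which campaigns performed well without exposing which individual installed the app.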

Location Updates

Currently, users are presented with several options when an app requests access to location. These options include 1) Allow While Using App, 2) Allow Once, and 3) Don’t Allow.

A new, more privacy-preserving option will offer users the ability to share an approximate location that reflects an area of approximately 10 square miles, allowing for a personalized experience without sharing precise location information.

To help users understand the difference and the precision of location sharing, iOS will now provide a visual representation as part of the location permission dialogue. The option to share precise location with individual apps can also be managed with the device Settings.

iOS 14 Location Permission Prompt      iOS 14 Precise Location On Screenshot      App Location Setting Screenshot      App Precise Location Setting Screenshot

Simplified App Privacy Notices

Apps are already required to provide a link to a privacy policy under current App Store requirements. With iOS 14, apps will need to provide specific information for users to review prior to installation in a standardized format, similar to a nutrition label, within the App Store interface. Developers will be required to complete a questionnaire detailing what data the app collects, how the data is used, whether the data is linked to a particular user or device, and whether the data will be used to track users. Because SDKs run in-process with other app code and share the app’s access permissions, developer responses are required to reflect both the practices of the app and those of any third-party code within the app.

Since these declarations are effectively additional privacy disclosures provided by an app, companies will need to take care to ensure they are accurate legal representations of their practices.

App Store Privacy Snapshot (1)      App Store Privacy Snapshot (2)

Microphone & Camera Indicators

Indicators will appear on the status bar and within the Control Center when the camera or microphone is activated by an app.

Camera Activated Indicator      Microphone Activated Indicator

Sign in with Apple Upgrades

Developers will now be able to offer users the option to convert existing app accounts to Sign in with Apple, which is tied to users’ Apple IDs.

Safari Enhancements

As of iOS 14, Safari will support a broader selection of extensions distributed through the App Store. Just-in-time notices within Safari will notify users when an extension accesses information about a site the user is visiting, and users can opt to “Allow for one day,” “Always allow on this website,” or “Always allow on every website.”

Safari Extension Permission Prompt

 

While ITP (Intelligent Tracking Prevention) has been implemented in Safari for several years, new transparency enhancements will allow users to review a list of the specific trackers blocked on a website through an icon in the toolbar.

Safari Tracker Notice

 

In addition, a new “Privacy Report” will provide users with a summary of all cross-site trackers that have been blocked within the previous 30 days.

Safari Tracker Privacy Report

Photo and Contacts Library Access

New access limits to the Photo Library will enable users to share only specific, selected items with an app, as opposed to the previous default of providing ongoing access to the user’s entire library. Similar technical controls will also be applied to apps interacting with a user’s contacts, allowing users to select individual contacts instead of providing apps with ongoing, blanket access to all of the user’s Contacts.

iOS 14 Photo Access Options Prompt

Network Access

While some apps need to be able to find and connect to local devices on a network in order to provide specific, related services, some apps have used access to this information to track users for other purposes. Apps will now be required to prompt users and obtain permission for such access. In addition, “Use Private Address” will be enabled by default. This will result in Apple devices providing WiFi networks with a MAC address that is uniquely generated daily, preventing multiple WiFi networks from correlating the behavior of an individual user presenting the same MAC address.

iOS 14 WiFi Privacy Settings      iOS 14 App Network Connection Prompt
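The idea behind a private address can be sketched in Python: generate a random MAC with the “locally administered” bit set and the multicast bit cleared, so it is a valid unicast address that is not tied to the device’s hardware (a conceptual sketch, not Apple’s implementation):

```python
import secrets

def private_mac() -> str:
    """Generate a random, locally administered, unicast MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    # In the first octet: bit 0x02 marks the address as locally administered
    # (not burned into hardware); bit 0x01 must be clear for unicast.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{o:02x}" for o in octets)

mac = private_mac()
first_octet = int(mac.split(":")[0], 16)
assert first_octet & 0x02        # locally administered
assert not (first_octet & 0x01)  # unicast, not multicast
```

Because a fresh address of this form carries no stable hardware identifier, two networks that each see a different randomized MAC have nothing to join on.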

Clipboard Data

If text that has been copied to the clipboard is accessed by an app, iOS 14 will provide a notification to the user in a “Call Out” for each instance the app accesses the information, allowing users to know when and which apps are accessing the text stored on the clipboard. Previously, apps were able to access this information on demand and without any indication to users. Researchers recently flagged TikTok and other apps accessing clipboard data in this manner. Although some apps may have legitimate reasons to access clipboard data, the access by many others raises concerns.

iOS 14 Pasteboard Access Notice

 

Apple provided more detail about the changes and potential impact on existing and new apps in these videos and forums. The following two videos, in particular, provide more information regarding a number of changes which serve to encourage data minimization, reduce the likelihood of apps over-sharing data, and increase user transparency and control.

 

 

FPF Webinar Explores the Future of Privacy-Preserving Machine Learning

On June 8, FPF hosted a webinar, Privacy Preserving Machine Learning: New Research on Data and Model Privacy. Co-hosted by the FPF Artificial Intelligence Working Group and the Applied Privacy Research Coordination Network, an NSF project run by FPF, the webinar explored how both machine learning models and the data fed into them can be secured through tools and techniques that manage the flow of data and models in the ML ecosystem. Academic researchers from the US, EU, and Singapore presented their work to attendees from around the globe.

The papers presented, summarized in the associated Privacy Scholarship Reporter, represent key strategies in the evolution of private and secure machine learning research. Starting with her co-authored piece, Ms. Patricia Thaine of the University of Toronto outlined how combining privacy-enhancing techniques (PETs) can lead to better, perhaps almost perfect, preservation of privacy for the personal data used in ML applications. The combination of techniques, including de-identification, homomorphic encryption, and others, draws on foundational and novel work in private machine learning, such as the use of collaborative or federated learning, pioneered by the next presenter, Professor Reza Shokri of the National University of Singapore.

Professor Shokri discussed data privacy issues in machine learning with a specific focus on “indirect and unintentional” risks, such as those arising from metadata, data dependencies, and computations over data. He highlighted that reported statistics and models may provide sufficient information to allow an adversary to exploit the “inference avalanche” and link non-private information back to personal data. He elaborated on the point that models themselves may be personal data that need to be protected under data protection frameworks such as the GDPR. To address these privacy risks, Professor Shokri and his colleagues have developed ML Privacy Meter, an open-source tool available on GitHub that “provides privacy risk scores which help in identifying data records among the training data that are under high risk of being leaked through the model parameters or predictions.”

Next, the foundational work on federated learning was applied to healthcare narrative data by Mr. Sven Festag, a speaker from Jena University Hospital, who explored how collaborative privacy-preserving training of models that de-identify clinical narratives improves on the privacy protections available from de-identification run at a strictly local level. One motivation for Mr. Festag’s approach was the threat of malicious actors seeking detailed knowledge of models or data, a threat further assessed by the following speaker, Dr. Ahmed Salem of Saarland University.

Dr. Salem and his co-authors explored how machine learning models can be attacked and prompted to give incorrect answers through both static and dynamic triggers. Triggers, such as the replacement of pixels in an image, can be designed to make a model learn or recommend something different from what it would learn from data without such triggers. When attack triggers are made dynamic, such as through algorithms trained to identify areas of vulnerability, the likelihood of a successful attack markedly increases.

Given the possibility that attacks against models could cause adverse outcomes, one would expect consensus that machine learning models need to be well protected. However, as Dr. Sun and his Northeastern University co-authors found, many mobile app developers do not, in fact, design or include sufficient security for their models. Failure to appropriately secure models can cost companies considerably, from reputational harm to financial losses.

Following the presentations, the speakers joined FPF for a panel discussion about the general outlook for privacy-preserving machine learning. They concurred that the future of privacy in machine learning will necessarily include both data protections and model protections, and will need to go beyond a simple compliance-focused effort. As a final question, the speakers were asked which articles on this topic are most salient for privacy professionals; those papers are listed below.

Information about the webinar, including slides from the presentation, some of the papers and GitHub links for our speakers, and a new edition of the Privacy Scholarship Reporter, can be found below.

APRCN – Privacy Scholarship Research Reporter #5

Papers and other research artifacts presented during this webinar:

Patricia Thaine: 

Reza Shokri:

Sven Festag:

Ahmed Salem:

Zhichuang Sun:

Papers recommended by our speakers:

Adversarial examples are not easily detected: Bypassing ten detection methods

Stealing Machine Learning Models via Prediction APIs

Designing for trust with machine learning

Differential privacy: A primer for a non-technical audience

Related FPF materials:

Digital Data Flows Masterclass

 

Privacy Scholarship Research Reporter: Issue 5, July 2020 – Preserving Privacy in Machine Learning: New Research on Data and Model Privacy

Notes from FPF

In this edition of the “Privacy Scholarship Reporter,” we build on the general knowledge from the first two issues and explore some of the technical research being conducted to achieve ethical and privacy goals.

“Is it possible to preserve privacy in the age of AI?” is a provocative question asked by academic and company-based researchers alike. The answer depends on a mixture of responses that secure privacy for training data and input data, and that preserve privacy and reduce the possibility of social harms arising from interpretation of model output. Likewise, preservation of data privacy requires securing AI assets to ensure model privacy (Thaine and Penn 2020, Figure 1). As machine learning and privacy researchers’ work demonstrates, there are myriad ways to preserve privacy, and the best choice of methods may vary according to the purpose of the algorithm or the form of machine learning built.

Within the growing body of research addressing privacy considerations in artificial intelligence and machine learning, two approaches are emerging: a data-centered approach and a model-centered approach. One group of methods that secures privacy through attention to data is “differential privacy.” Differential privacy introduces mathematically calibrated perturbation into results computed over a data set, ensuring that a specific individual’s inclusion in the data cannot be detected by comparing summary statistics generated with and without that individual’s record. Other methods for increasing the privacy protections of data prior to its use in ML models include homomorphic encryption, secure multi-party computation, and federated learning. Homomorphic encryption preserves data privacy by allowing analysis of encrypted data. Secure multi-party computation is a protocol for collaboration among parties holding information they prefer to keep private from one another, without the intervention of a trusted third party. Federated learning allows data to be stored and analyzed locally, with models or segments of models sent to the user’s device.
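As an illustration of the differential privacy idea, a minimal Python sketch of the classic Laplace mechanism applied to a count query (the data, predicate, and epsilon values are illustrative; production systems track a privacy budget and use vetted noise samplers):

```python
import math
import random

def laplace_noise(scale, rng=random):
    # Inverse-CDF sampling from a Laplace(0, scale) distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, epsilon):
    """Differentially private count: true count plus Laplace noise.

    A count query has sensitivity 1 (adding or removing one person changes
    the answer by at most 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Illustrative query: how many users are over 40, with epsilon = 0.5?
ages = [23, 45, 31, 67, 52, 29, 41]
noisy_answer = dp_count(ages, lambda a: a > 40, epsilon=0.5)
```

A smaller epsilon adds more noise and gives a stronger guarantee; comparing noisy counts with and without any one record no longer reliably reveals that record’s presence.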

Data-centered methods to preserve privacy introduce some possibility that models may be compromised, allowing for attacks on users’ data and on a company’s models. Conversely, research on model-centered methods presently focuses on ways to secure models from attack. There are two general forms of attack against models that could reduce privacy for the developers of a model or for those whose data it uses. “Black-box” attacks draw private information from machine learning models by maliciously gaining functional access to a model without knowing its internal details, while “white-box” attacks learn about an individual’s contribution through ill-gotten knowledge of the model’s internals. Both represent risks to individual data privacy that arise from challenges to a company’s model privacy. The costs of lost privacy through machine learning may redound to individuals through privacy breaches, or to companies through the loss of proprietary assets and reputation, and can range into the millions of dollars. Working to reduce those losses is an important component of present privacy and machine learning research, as represented in the papers below.
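To make the black-box threat concrete, a toy Python sketch of a confidence-thresholding membership-inference attack, one of the simplest black-box attacks: the adversary only queries the model and thresholds its confidence. The model, training set, and threshold here are all hypothetical stand-ins, not a real deployed system:

```python
def confidence_threshold_attack(model_confidence, record, threshold=0.9):
    """Toy black-box membership inference.

    Overfit models often return higher confidence on records they were
    trained on. An attacker with only query access can guess membership
    by thresholding the model's confidence on a candidate record.
    `model_confidence` is any callable: record -> top-class probability.
    """
    return model_confidence(record) >= threshold

# Hypothetical stand-in for a deployed model's query API: we pretend the
# model is overconfident on its (illustrative) training set.
TRAINING_SET = {(1.0, 2.0), (3.0, 4.0)}

def fake_model_confidence(record):
    return 0.97 if record in TRAINING_SET else 0.62

assert confidence_threshold_attack(fake_model_confidence, (1.0, 2.0))       # member: flagged
assert not confidence_threshold_attack(fake_model_confidence, (9.0, 9.0))   # non-member: not flagged
```

Defenses such as differentially private training reduce the confidence gap between members and non-members, which is exactly what tools like ML Privacy Meter measure.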

As always, we would love to hear your feedback on this issue. You can email us at [email protected].

Sara Jordan, Policy Counsel, FPF


Preserving Privacy in Artificial Intelligence and Machine Learning: Theory and Practice

 

Perfectly Privacy-Preserving AI: What is it and how do we achieve it?

P. THAINE

Perfect preservation of privacy in artificial intelligence applications will entail significant efforts across the full lifecycle of AI product development, deployment, and decommissioning, focusing on the privacy protections implemented by both data creators and model creators. Preservation of privacy would entail focusing on:

1. Training Data Privacy: The guarantee that a malicious actor will not be able to reverse-engineer the training data.

2. Input Privacy: The guarantee that a user’s input data cannot be observed by other parties, including the model creator.

3. Output Privacy: The guarantee that the output of a model is not visible by anyone except for the user whose data is being inferred upon.

4. Model Privacy: The guarantee that the model cannot be stolen by a malicious party.

A combination of the tools available may represent the best, albeit still theoretical, paths toward perfectly privacy-preserving AI.

Authors’ Abstract

Many AI applications need to process huge amounts of sensitive information for model training, evaluation, and real-world integration. These tasks include facial recognition, speaker recognition, text processing, and genomic data analysis. Unfortunately, one of the following two scenarios occur when training models to perform the aforementioned tasks: either models end up being trained on sensitive user information, making them vulnerable to malicious actors, or their evaluations are not representative of their abilities since the scope of the test set is limited. In some cases, the models never get created in the first place. There are a number of approaches that can be integrated into AI algorithms in order to maintain various levels of privacy. Namely, differential privacy, secure multi-party computation, homomorphic encryption, federated learning, secure enclaves, and automatic data de-identification. We will briefly explain each of these methods and describe the scenarios in which they would be most appropriate. Recently, several of these methods have been applied to machine learning models. We will cover some of the most interesting examples of privacy-preserving ML, including the integration of differential privacy with neural networks to avoid unwanted inferences from being made of a network’s training data. Finally, we will discuss how the privacy-preserving machine learning approaches that have been proposed so far would need to be combined in order to achieve perfectly privacy-preserving machine learning.

“Perfectly Privacy-Preserving AI” by Patricia Thaine from Towards Data Science, January 1, 2020.

Privacy-Preserving Deep Learning

R. SHOKRI, V. SHMATIKOV

Privacy preservation in machine learning depends on the privacy of the training and input data, the privacy of the model itself, and the privacy of the model’s outputs. Researchers have demonstrated an ability to improve privacy preservation in these areas for many forms of machine learning, but doing so for neural network training over sensitive data presents persistent problems. In this (now) classic article, the authors describe “collaborative neural network training” which “protects privacy of the training data, enables participants to control the learning objective and how much to reveal about their individual models, and lets them apply the jointly learned model to their own inputs without revealing the inputs or the outputs”. A collaborative architecture protects privacy by ensuring data is not revealed to a third party, such as an MLaaS provider, a passive adversary, or a malicious attacker, and by ensuring that data owners retain control over their data assets. This is particularly useful in areas where data owners cannot directly share their data with third parties due to privacy or confidentiality concerns (e.g., healthcare).

Authors’ Abstract

Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training. Massive data collection required for deep learning presents obvious privacy issues. Users’ personal, highly sensitive data such as photos and voice recordings is kept indefinitely by the companies that collect it. Users can neither delete it, nor restrict the purposes for which it is used. Furthermore, centrally kept data is subject to legal subpoenas and extra-judicial surveillance. Many data owners—for example, medical institutions that may want to apply deep learning methods to clinical records—are prevented by privacy and confidentiality concerns from sharing the data and thus benefiting from large-scale deep learning. In this paper, we design, implement, and evaluate a practical system that enables multiple parties to jointly learn an accurate neural network model for a given objective without sharing their input datasets. We exploit the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously. Our system lets participants train independently on their own datasets and selectively share small subsets of their models’ key parameters during training. This offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefiting from other participants’ models and thus boosting their learning accuracy beyond what is achievable solely on their own inputs. 
We demonstrate the accuracy of our privacy preserving deep learning on benchmark datasets.

“Privacy-Preserving Deep Learning” by Reza Shokri, Vitaly Shmatikov, October, 2015.

 

Chiron: Privacy-preserving Machine Learning as a Service

T. HUNT, C. SONG, R. SHOKRI, V. SHMATIKOV, E. WITCHEL

Machine learning is a complex process that is difficult for every company that might benefit from it to replicate well. To fill this need, some companies now provide machine learning development and testing as a service, similar to analytics as a service. Machine learning as a service (MLaaS) is democratizing access to the powerful analytic insights of machine learning techniques. Increasing access to machine learning techniques corresponds to an increased risk that companies’ models, part of their intellectual property and corporate private goods, might be leaked to users of MLaaS. Use of MLaaS also increases the risk to the privacy of individuals whose data is introduced into ML models, since service platforms could be compromised by internal or external attacks. This paper proposes a system that builds on Software Guard Extensions (SGX) enclaves, which limit an untrusted platform’s access to code or data, and on Ryoan, distributed sandboxes that separate programs from one another to prevent unintentional transfer or contamination. Ryoan sandboxes confine code, allowing it to define and train a model while ensuring that the model does not leak data to untrusted parties, yielding an MLaaS platform that protects both those providing ML services and those seeking services and supplying data.

Authors’ Abstract

Major cloud operators offer machine learning (ML) as a service, enabling customers who have the data but not ML expertise or infrastructure to train predictive models on this data. Existing ML-as-a-service platforms require users to reveal all training data to the service operator. We design, implement, and evaluate Chiron, a system for privacy-preserving machine learning as a service. First, Chiron conceals the training data from the service operator. Second, in keeping with how many existing ML-as-a-service platforms work, Chiron reveals neither the training algorithm nor the model structure to the user, providing only black-box access to the trained model. Chiron is implemented using SGX enclaves, but SGX alone does not achieve the dual goals of data privacy and model confidentiality. Chiron runs the standard ML training toolchain (including the popular Theano framework and C compiler) in an enclave, but the untrusted model-creation code from the service operator is further confined in a Ryoan sandbox to prevent it from leaking the training data outside the enclave. To support distributed training, Chiron executes multiple concurrent enclaves that exchange model parameters via a parameter server. We evaluate Chiron on popular deep learning models, focusing on benchmark image classification tasks such as CIFAR and ImageNet, and show that its training performance and accuracy of the resulting models are practical for common uses of ML-as-a-service.

“Chiron: Privacy-preserving Machine Learning as a Service” by Tyler Hunt, Congzheng Song, Reza Shokri, Vitaly Shmatikov, and Emmett Witchel, March 15, 2018.

 

Privacy Preserving Machine Learning: Applications to Health Care Information

 

Privacy-Preserving Deep Learning for the Detection of Protected Health Information in Real-World Data: Comparative Evaluation

S. FESTAG, C. SPRECKELSEN

A common concern about privacy in machine learning systems is that the massive amounts of data involved represent a quantitatively definable risk over and above the risk to privacy when smaller datasets are used. However, as researchers using small datasets of sensitive information attest, the risk may lie not in how much data is used but in how small or disparate pieces of data are aggregated for use. Researchers studying collaborative learning models propose methods for securing data against leakage when it is gathered from multiple sources. Forms of collaborative learning proposed to improve the integration of sensitive private information in neural network settings include “round robin techniques” and “privacy-preserving distributed selective stochastic gradient descent (DSSGD)”. DSSGD protects private information through both local (on-device or on-site) training and individual restrictions on the amount of information shared back to a central model server.
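A minimal sketch of the “selective” step in DSSGD-style training: each participant updates its parameters locally on private data, then shares only the largest fraction of the changes with the parameter server. This illustrates the general idea, not the implementation evaluated in the paper; the parameter values and sharing fraction are illustrative:

```python
def local_sgd_step(params, grads, lr=0.1):
    # Ordinary gradient step computed locally on the participant's private data.
    return [p - lr * g for p, g in zip(params, grads)]

def select_updates(old_params, new_params, share_fraction=0.1):
    """Selective sharing: report only the largest parameter changes.

    DSSGD limits leakage by letting each participant upload just a small,
    self-chosen fraction of its model updates to the central parameter
    server, rather than its full model or data.
    """
    deltas = [(i, new - old) for i, (old, new) in enumerate(zip(old_params, new_params))]
    deltas.sort(key=lambda t: abs(t[1]), reverse=True)
    k = max(1, int(len(deltas) * share_fraction))
    return dict(deltas[:k])  # sparse update: parameter index -> change

params = [0.5, -1.2, 0.3, 0.8]
grads = [0.05, -2.0, 0.01, 0.4]
new_params = local_sgd_step(params, grads)
shared = select_updates(params, new_params, share_fraction=0.25)
# Only the single largest update (parameter index 1) leaves the device.
```

The private data and the other three parameter updates never leave the participant, which is the source of the privacy benefit relative to centralized training.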

Authors’ Abstract

Background: Collaborative privacy-preserving training methods allow for the integration of locally stored private data sets into machine learning approaches while ensuring confidentiality and nondisclosure. Objective: In this work we assess the performance of a state-of-the-art neural network approach for the detection of protected health information in texts trained in a collaborative privacy-preserving way. Methods: The training adopts distributed selective stochastic gradient descent (ie, it works by exchanging local learning results achieved on private data sets). Five networks were trained on separated real-world clinical data sets by using the privacy-protecting protocol. In total, the data sets contain 1304 real longitudinal patient records for 296 patients. Results: These networks reached a mean F1 value of 0.955. The gold standard centralized training that is based on the union of all sets and does not take data security into consideration reaches a final value of 0.962. Conclusions: Using real-world clinical data, our study shows that detection of protected health information can be secured by collaborative privacy-preserving training. In general, the approach shows the feasibility of deep learning on distributed and confidential clinical data while ensuring data protection.

“Privacy-Preserving Deep Learning for the Detection of Protected Health Information in Real-World Data: Comparative Evaluation” by Sven Festag and Cord Spreckelsen, May 5, 2020.

 

An Improved Method for Sharing Medical Images for Privacy Preserving Machine Learning using Multiparty Computation and Steganography

R. VIGNESH, R. VISHNU, S.M. RAJ, M.B. AKSHAY, D.G. NAIR, J.R. NAIR

Preserving the security of data that cannot be easily shared due to privacy and confidentiality concerns presents opportunities for creativity in transfer mechanisms. Sharing digital images, which due to their rich, pixel-based nature cannot easily be encrypted and decrypted without loss, represents a particular challenge and an opportunity for creativity. One way images can be shared more securely is through secret sharing protocols, which combine small pieces of secret information until a threshold of secret bits is compiled, at which point a secret, such as a unique user key, is revealed and the full information rendered. Rich image data can also be used to transfer low-dimensionality text data as an embedded component of the image. Steganography, the insertion of small secret bits of data into an image without changing the perception of the image, is one way to transfer information for use in secret sharing protocols. Steganography can allow multiple machine learning uses of a single image in a privacy-preserving way.
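A minimal Python sketch of least-significant-bit (LSB) embedding over a flat list of 8-bit pixel values, the simplest form of image steganography (illustrative only; the paper combines steganographic embedding with a secret sharing scheme):

```python
def embed_bits(pixels, message_bits):
    """Hide bits in the least significant bit of each pixel value.

    Changing only the LSB alters each 0-255 pixel value by at most 1,
    which is visually imperceptible. The carrier must have at least as
    many pixels as there are message bits.
    """
    if len(message_bits) > len(pixels):
        raise ValueError("message too long for carrier image")
    out = list(pixels)
    for i, bit in enumerate(message_bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract_bits(pixels, n_bits):
    # Recover the hidden bits from the first n_bits pixel LSBs.
    return [p & 1 for p in pixels[:n_bits]]

pixels = [120, 37, 255, 0, 88, 201]          # illustrative carrier pixels
secret = [1, 0, 1, 1]                         # illustrative secret bits
stego = embed_bits(pixels, secret)
assert extract_bits(stego, 4) == secret
assert all(abs(a - b) <= 1 for a, b in zip(pixels, stego))  # imperceptible change
```

In a secret-sharing context, each party’s image would carry only its own share of the key, so no single image reveals the secret.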

Authors’ Abstract

Digital data privacy is one of the main concerns in today’s world. When everything is digitized, there is a threat of private data being misused. Privacy-preserving machine learning is becoming a top research area. For machines to learn, massive data is needed and when it comes to sensitive data, privacy issues arise. With this paper, we combine secure multiparty computation and steganography helping machine learning researchers to make use of a huge volume of medical images with hospitals without compromising patients’ privacy. This also has application in digital image authentication. Steganography is one way of securing digital image data by secretly embedding the data in the image without creating visually perceptible changes. Secret sharing schemes have gained popularity in the last few years and research has been done on numerous aspects.

“An Improved Method for Sharing Medical Images for Privacy Preserving Machine Learning using Multiparty Computation and Steganography” by R. Vignesh, R. Vishnu, Sreenu M. Raj, M.B. Akshay, Divya G. Nair, and Jyothisha R. Nair, 2019.

 

Model Centered Privacy Protections

 

Dynamic Backdoor Attacks Against Machine Learning Models

A. SALEM, R. WEN, M. BACKES, S. MA, Y. ZHANG

Machine learning systems are vulnerable not only to conventional attacks, such as model theft, but also to backdoor attacks, in which malicious functions are introduced into the models themselves and express undesirable behavior when appropriately triggered. Some model backdoors use “static” triggers that can be detected by defense techniques, but the authors of this paper propose three forms of dynamic backdoor attacks. Dynamic backdoor attacks raise specific privacy concerns because they allow adversaries access to both centralized and decentralized systems. Once given access, a backdoor attack with a dynamic trigger can cause a model to misclassify any input. As a consequence, users may inadvertently rely on, and continue to train, compromised machine learning models. Likewise, backdoored machine learning algorithms on users’ systems or devices may report users’ information without the application of differential privacy techniques, compromising user privacy and personal information.
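To make the static/dynamic distinction concrete, a toy Python sketch that stamps a trigger patch onto an image (a 2D list of pixel values) at a fixed position versus a random one. This is illustrative only; the paper’s BaN and c-BaN techniques go further and generate the trigger patterns themselves with a generative network:

```python
import random

def apply_trigger(image, patch, top, left):
    """Overwrite a small pixel patch onto a copy of the image at (top, left)."""
    out = [row[:] for row in image]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            out[top + r][left + c] = value
    return out

def static_trigger(image, patch):
    # Static backdoor: fixed pattern at a fixed location, which defenses
    # can detect by searching for one recurring trigger.
    return apply_trigger(image, patch, 0, 0)

def dynamic_trigger(image, patch, rng=random):
    # Dynamic backdoor: the same patch, but its location varies per input,
    # defeating defenses that assume a single fixed trigger position.
    top = rng.randrange(len(image) - len(patch) + 1)
    left = rng.randrange(len(image[0]) - len(patch[0]) + 1)
    return apply_trigger(image, patch, top, left)

image = [[0] * 6 for _ in range(6)]   # illustrative 6x6 blank image
patch = [[255, 255], [255, 255]]      # illustrative 2x2 white trigger
poisoned = dynamic_trigger(image, patch)
```

During training, an attacker would pair such triggered inputs with a target label so the model learns to misclassify any input carrying the trigger.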

Authors’ Abstract

Machine learning (ML) has made tremendous progress during the past decade and is being adopted in various critical real-world applications. However, recent research has shown that ML models are vulnerable to multiple security and privacy attacks. In particular, backdoor attacks against ML models that have recently raised a lot of awareness. A successful backdoor attack can cause severe consequences, such as allowing an adversary to bypass critical authentication systems. Current backdooring techniques rely on adding static triggers (with fixed patterns and locations) on ML model inputs. In this paper, we propose the first class of dynamic backdooring techniques: Random Backdoor, Backdoor Generating Network (BaN), and conditional Backdoor Generating Network (c-BaN). Triggers generated by our techniques can have random patterns and locations, which reduce the efficacy of the current backdoor detection mechanisms. In particular, BaN and c-BaN are the first two schemes that algorithmically generate triggers, which rely on a novel generative network. Moreover, c-BaN is the first conditional backdooring technique, that given a target label, it can generate a target-specific trigger. Both BaN and c-BaN are essentially a general framework which renders the adversary the flexibility for further customizing backdoor attacks. We extensively evaluate our techniques on three benchmark datasets: MNIST, CelebA, and CIFAR-10. Our techniques achieve almost perfect attack performance on backdoored data with a negligible utility loss. We further show that our techniques can bypass current state-of-the-art defense mechanisms against backdoor attacks, including Neural Cleanse, ABS, and STRIP.

“Dynamic Backdoor Attacks Against Machine Learning Models” by Ahmed Salem, Rui Wen, Michael Backes, Shiqing Ma, and Yang Zhang, March 7, 2020.

 

Mind Your Weights: A Large-Scale Study on Insufficient Machine Learning Model Protection in Mobile Apps

Z. SUN, R. SUN, L. LU

To protect privacy when using machine learning, many researchers and developers focus on securing the data of individuals whose interactions with the Internet of Things, mobile phones, and other data-gathering devices power much of machine learning. But truly securing privacy in machine learning systems also means securing the models themselves. Securing models protects the privacy and security of companies’ machine learning assets and protects users, who could face a higher risk of exposure from inversion attacks by those who maliciously or surreptitiously gain access to models. This paper studies the methods that companies do and do not use to protect machine learning in mobile apps. Importantly, the researchers also quantify the risk if models are stolen, finding that the cost of a stolen model can run into the millions of dollars.

Authors’ Abstract

On-device machine learning (ML) is quickly gaining popularity among mobile apps. It allows offline model inference while preserving user privacy. However, ML models, considered as core intellectual properties of model owners, are now stored on billions of untrusted devices and subject to potential thefts. Leaked models can cause both severe financial loss and security consequences. This paper presents the first empirical study of ML model protection on mobile devices. Our study aims to answer three open questions with quantitative evidence: How widely is model protection used in apps? How robust are existing model protection techniques? How much can (stolen) models cost? To that end, we built a simple app analysis pipeline and analyzed 46,753 popular apps collected from the US and Chinese app markets. We identified 1,468 ML apps spanning all popular app categories. We found that, alarmingly, 41% of ML apps do not protect their models at all, which can be trivially stolen from app packages. Even for those apps that use model protection or encryption, we were able to extract the models from 66% of them via unsophisticated dynamic analysis techniques. The extracted models are mostly commercial products and used for face recognition, liveness detection, ID/bank card recognition, and malware detection. We quantitatively estimated the potential financial impact of a leaked model, which can amount to millions of dollars for different stakeholders. Our study reveals that on-device models are currently at high risk of being leaked; attackers are highly motivated to steal such models. Drawn from our large-scale study, we report our insights into this emerging security problem and discuss the technical challenges, hoping to inspire future research on robust and practical model protection for mobile devices.

“Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps” by Zhichuang Sun, Ruimin Sun, and Long Lu, February 18, 2020.

 


Conclusion

Privacy and security in a machine-learning-enabled world involve protecting both data and models. As the papers reviewed here show, this will require new ways of thinking about privacy in the development and deployment of machine learning models. Whether in research on healthcare applications or mobile apps, researchers are pointing to these new ways of thinking and new techniques to improve privacy in machine learning.

California Privacy Legislation: A Timeline of Key Events

Authors: Katelyn Ringrose (Christopher Wolf Diversity Law Fellow) and Jeremy Greenberg (Policy Counsel) 

——-

Today, the California Attorney General will begin enforcing the California Consumer Privacy Act (CCPA). The California AG’s office may bring enforcement actions and seek penalties for violations of core provisions of the CCPA. The AG’s request for expedited review of regulations that supplement the CCPA is pending before the California Office of Administrative Law (OAL); the AG regulations provide additional detail regarding particular CCPA provisions.

The CCPA, which came into effect January 2020, is the first non-sectoral privacy law passed in the United States that contains broad consumer rights to access, delete, and opt out of the sale of their data. The CCPA has sparked major compliance efforts in the United States and globally, and the legislation was a long time in the making—beginning as a 2016 ballot initiative. 

In commemoration of today’s landmark enforcement date, we look back at the events that brought us to this point, and potential future ramifications. Below is a timeline of events regarding California privacy legislation from 2016 – 2020. This timeline examines the inception of the CCPA, the various amendments and lawsuits that have shaped its scope and enforcement provisions, and the current status of the California Privacy Rights Act (CPRA), or “CCPA 2.0,” recently certified for the 2020 ballot. If passed, CPRA would become effective in 2023. 

Download this Timeline

What’s next? If voters in California vote for the CPRA ballot initiative in the general election on November 3, 2020, the CPRA would become effective on January 1, 2023. The proposed law contains broader substantive protections than the CCPA, including data minimization and purpose limitation obligations, a stronger opt-out of behavioral advertising, and restrictions regarding the use of sensitive data. It would also establish a new Privacy Protection Agency in California to create additional regulations and to enforce the law. 

FPF Resources on CCPA

Additional Reading

Acknowledgements to Stacey Gray, Senior Counsel (US Legislation and Policymaker Education), and Polly Sanderson, Policy Counsel.

Did we miss any key events? Let us know at [email protected].

Commoditization of Data is the Problem, Not the Solution – Why Placing a Price Tag on Personal Information May Harm Rather Than Protect Consumer Privacy

This guest post is by Lokke Moerel, a Professor of Global ICT Law at Tilburg University and Senior of Counsel at Morrison & Foerster in Berlin, and Christine Lyon, partner at Morrison & Foerster in Palo Alto, California. To learn more about FPF in Europe, please visit https://fpf.org/eu.

By Lokke Moerel and Christine Lyon[1]

Friend and foe agree that our society is undergoing a digital revolution that is transforming society as we know it. In addition to economic and social progress, every technological revolution also brings disruption and friction.[2] The new digital technologies (and, in particular, artificial intelligence (AI)) are fueled by huge volumes of data, leading to the common saying that "data is the new oil." These data-driven technologies transform existing business models and present new privacy issues and ethical dilemmas.[3] Social resistance to the excesses of the new data economy is becoming increasingly visible and is leading to calls for new legislation.[4]

Commentators argue that a relatively small number of companies are disproportionately profiting from consumers' data, and that the economic gap continues to grow between technology companies and the consumers whose data drive those companies' profits.[5] Consumers are also becoming more aware that free online services come at a cost to their privacy; the modern adage is that consumers are not the recipients of free online services but are actually the product itself.[6]

U.S. legislators are responding with prescriptive notice and choice requirements intended to serve a dual purpose: providing consumers with greater control over the use of their personal information while enabling them to profit from that use.

An illustrative example is California Governor Gavin Newsom’s proposal that consumers should “share the wealth” that technology companies generate from their data, potentially in the form of a “data dividend” to be paid to Californians for the use of their data.[7] California’s Consumer Privacy Act (CCPA) also combines the right of consumers to opt out of the sale of their data with a requirement that any financial incentive offered by companies to consumers for the sale of their personal information should be reasonably related to the value of the consumer’s data.[8]

These are not isolated examples. The academic community is also proposing alternative ways to address wealth inequality. Illustrative here is Lanier and Weyl’s proposal for the creation of data unions that would negotiate payment terms for user-generated content and personal information supplied by their users, which we will discuss in further detail below.

Though these attempts to protect, empower, and compensate consumers are commendable, the proposals to achieve these goals are actually counterproductive. Here, the remedy is worse than the ailment.

To illustrate the underlying issue, let’s take the example of misleading advertising and unfair trade practices. If an advertisement is misleading or a trade practice unfair, it is intuitively understood that companies should not be able to remedy this situation by obtaining consent for such practice from the consumer. In the same vein, if companies generate large revenues with their misleading and unfair practices, the solution is not to ensure consumers get their share of the illicitly obtained revenues. If anything would provide an incentive to continue misleading and unfair practices, this would be it.

As always with data protection in the digital environment, the issues are far less straightforward than their offline equivalents and therefore more difficult to understand and address. History shows that whenever a new technology is introduced, society needs time to adjust. As a consequence, the data economy is still driven by the possibilities of technology rather than by social and legal norms.[9] This inevitably leads to social unrest and calls for new rules, such as the call by Microsoft's CEO, Satya Nadella, for the U.S., China, and Europe to come together and establish a global privacy standard based on the EU General Data Protection Regulation (GDPR).[10]

From "privacy is dead" to "privacy is the future": the point here is that not only are technical developments moving fast, but social standards and customer expectations are evolving as well.[11]

To begin to understand how our social norms should be translated to the new digital reality, we need to take the time to understand the underlying rationales of the existing rules and translate them to that reality. Our main point here is that the two concepts of consumer control and wealth distribution are separate but intertwined. The current proposals seek to empower consumers to take control of their data, but they also treat privacy protection as a right that can be traded or sold. These purposes are equally worthy, but they cannot be combined: they need to be regulated separately and in different ways. Adopting a commercial trade approach to privacy protection will ultimately undermine rather than protect consumer privacy. To complicate matters further, experience with the consent-based model for privacy protection in other countries (and especially under the GDPR) shows that the consent-based model is flawed and fails to achieve privacy protection in the first place. We will first discuss why consent is not a panacea for privacy protection.

 

Why Should We Be Skeptical of Consent as a Solution for Consumer Privacy?

On the surface, consent may appear to be the best option for privacy protection because it allows consumers to choose how they will allow companies to use their personal information. Consent tended to be the default approach under the EU’s Data Protection Directive, and the GDPR still lists consent first among the potential grounds for processing of personal data.[12] Over time, however, confidence in consent as a tool for privacy protection has waned.

Before the GDPR, many believed that the lack of material privacy compliance was mostly due to a lack of enforcement under the Directive, and that all would be well once the European supervisory authorities had higher fining and broader enforcement powers. However, now that these powers have been granted under the GDPR, not much has changed, and privacy violations still feature in newspaper headlines.

By now the realization is setting in that non-compliance with privacy laws may also stem from a fundamental flaw in consent-based data protection. These laws are based on the assumption that as long as people are informed about which data are collected, by whom, and for which purposes, they can make an informed decision. The laws seek to ensure people's autonomy by providing choices. In a world driven by AI, however, we can no longer fully understand what is happening to our data. The underlying logic of data-processing operations and the purposes for which they are used have become so complex that they can only be described by means of intricate privacy policies that are simply not comprehensible to the average citizen. It is an illusion to suppose that by better informing individuals about which data are processed and for which purposes, we can enable them to make more rational choices and better exercise their rights. In a world of too many choices, the autonomy of the individual is reduced rather than increased. We cannot phrase it better than Cass Sunstein in his book, The Ethics of Influence (2016):

[A]utonomy does not require choices everywhere; it does not justify an insistence on active choosing in all contexts. (…) People should be allowed to devote their attention to the questions that, in their view, deserve attention. If people have to make choices everywhere, their autonomy is reduced, if only because they cannot focus on those activities that seem to them most worthy of their time.[13]

More fundamental is the point that a regulatory system that relies on the concept of free choice to protect people against consequences of AI is undermined by the very technology this system aims to protect us against. If AI knows us better than we do ourselves, it can manipulate us, and strengthening the information and consent requirements will not help.

Yuval Harari explains it well:

What then, will happen once we realize that customers and voters never make free choices, and once we have the technology to calculate, design or outsmart their feelings? If the whole universe is pegged to the human experience, what will happen once the human experience becomes just another designable product, no different in essence from any other item in the supermarket?[14]

The reality is that organizations find inscrutable ways of meeting information and consent requirements that discourage individuals from specifying their true preferences and often make them feel forced to click "OK" to obtain access to services.[15] The commercial interest in collecting as much data as possible is so large that, in practice, every available trick is often used to entice website visitors and app users to opt in (or to make it difficult for them to opt out). The design thereby exploits the predictably irrational behavior of people so that they make choices that are not necessarily in their best interests.[16] A very simple example is that consumers are more likely to click on a blue button than a gray button, even if the blue one is the least favorable option. Tellingly, Google once tested 41 shades of blue to measure user response.[17] Even established companies deliberately make it difficult for consumers to express their actual choice, and they seem to have little awareness of doing anything wrong. By comparison, if you deliberately misled someone in the offline world, everyone would immediately feel that this was unacceptable behavior.[18] Part of the explanation is that the digital newcomers have deliberately and systematically pushed the limits of their digital services in order to get their users accustomed to certain processing practices.[19] Although many of these privacy practices are now under investigation by privacy and antitrust authorities around the world,[20] these practices have obscured the view of what is or is not an ethical use of data.

Consent-based data protection laws have resulted in what has been coined mechanical proceduralism,[21] whereby organizations go through the mechanics of notice and consent without any reflection on whether the relevant use of data is legitimate in the first place. In other words, the current preoccupation with what is legal is distracting us from asking what is legitimate to do with data. We see this reflected in the EU's highest court having to decide whether a pre-ticked box constitutes consent (surprise: it does not) and in the EDPB feeling compelled to update its earlier guidance by spelling out whether cookie walls constitute "freely given" consent (surprise: they do not).[22]

Privacy legislation needs to regain its role of determining what is and is not permissible. Instead of a legal system based on consent, we need to re-think the social contract for our digital society, by having the difficult discussion about where the red lines for data use should be rather than passing the responsibility for a fair digital society to individuals to make choices that they cannot oversee.[23]

 

The U.S. System: Notice and Choice (as Opposed to Notice and Consent)

In the United States, companies routinely require consumers to consent to the processing of their data, such as by clicking a box stating that they agree to the company's privacy policy, although there is generally no consent requirement under U.S. law.[24] This may reflect an attempt to hedge the risk of consumers challenging the privacy terms as an 'unfair trade practice'.[25] The argument is that the consumer made an informed decision to accept the privacy terms as part of the transaction, and that the consumer was free to reject the company's offering and choose another. In reality, of course, consumers have little actual choice, particularly where the competing options are limited and offer similar privacy terms. In economic terms, we have an imperfect market in which companies do not compete on privacy, given their aligned interest in acquiring as much personal information about consumers as possible.[26] This leads to a race to the bottom in terms of privacy protection.[27]

An interesting parallel here is that the EDPB recently rejected the argument that consumers would have freedom of choice in these cases:[28]

The EDPB considers that consent cannot be considered as freely given if a controller argues that a choice exists between its service that includes consenting to the use of personal data for additional purposes on the one hand, and an equivalent service offered by a different controller on the other hand. In such a case, the freedom of choice would be made dependent on what other market players do and whether an individual data subject would find the other controller’s services genuinely equivalent. It would furthermore imply an obligation for controllers to monitor market developments to ensure the continued validity of consent for their data processing activities, as a competitor may alter its service at a later stage. Hence, using this argument means a consent relying on an alternative option offered by a third party fails to comply with the GDPR, meaning that a service provider cannot prevent data subjects from accessing a service on the basis that they do not consent.

By now, U.S. privacy advocates are also urging the public and private sectors to move away from consent as a privacy tool. For example, Lanier and Weyl argued that privacy concepts of consent "aren't meaningful when the uses of data have become highly technical, obscure, unpredictable, and psychologically manipulative."[29] In a similar vein, Burt argued that consent cannot be expected to play a meaningful role, "[b]ecause the threat of unintended inferences reduces our ability to understand the value of our data, our expectations about our privacy—and therefore what we can meaningfully consent to—are becoming less consequential."[30]

Moving away from consent- and choice-based privacy models is only part of the equation, however. In many cases, commentators have even greater concerns about the economic ramifications of large-scale data processing and whether consumers will share in the wealth generated by their data.

 

Disentangling Economic Objectives from Privacy Objectives

Beyond its role as a privacy concept, consent can also be an economic tool: a means of giving consumers leverage to extract value from companies for the use of their data. The privacy objectives and the economic objectives may be complementary, even to the point that it may not be easy to distinguish between them. We need to untangle these objectives, however, because they may yield different results.

Where the goal is predominantly economic in nature, the conversation tends to shift away from privacy toward economic inequality and fair compensation. We will discuss the relevant proposals in more detail below, but note that all of them require putting a 'price tag' on personal information.

“It is obscene to suppose that this [privacy] harm can be reduced to the obvious fact that users receive no fee for the raw material they supply. That critique is a feat of misdirection that would use a pricing mechanism to institutionalize and therefore legitimate the extraction of human behavior for manufacturing and sale.” – Zuboff, p. 94.

  1. No Established Valuation Method

Although personal information is already bought and sold among companies such as data brokers, there is not yet an established method for calculating its value.[31] A single method is unlikely to work in all circumstances. For example, the value of data to a company will depend on the relevant use, which may well differ from company to company. The value of data elements also often depends on the combination of data elements available: data analytics applied to mundane data may yield valuable inferences that are sensitive for the consumer. How much value should be placed on the individual data elements, as compared with the insights the company may create by combining these data elements, or even by combining them across all customers?[32]

The value of data to a company may, moreover, have little correlation with the privacy risks to the consumer. The cost to consumers may depend not only on the sensitivity of the use of their data but also on the potential impact if their data are lost. For example, information about a consumer's personal proclivities may be worth only a limited dollar amount to a company, yet the consumer may have been unwilling to sell that data to the company for that amount (or, potentially, for any amount). When information is lost, the personal harm or embarrassment to the individual may be much greater than the value to the company. The impact of losing consumers' data will also often depend on the combination of data elements. For instance, an email address is not in itself sensitive, but in combination with a password it becomes highly sensitive, as people often use the same email/password combination to access different websites.

  2. Different Approaches to Valuation

One approach might be to leave it to the consumer and company to negotiate the value of the consumer’s data to that company, but this would be susceptible to all of the problems discussed above, such as information asymmetries and unequal bargaining power. It may also make privacy a luxury good for the affluent, who would feel less economic pressure to sell their personal information, thus resulting in less privacy protection for consumers who are less economically secure.[33]

Another approach, suggested by Lanier and Weyl, would require companies to pay consumers for using their data, with the payment terms negotiated by new entities, similar to labor unions, that would engage in collective bargaining with companies over data rights.[34] However, this proposal would also require consumers to start paying for services that today are provided free of charge in exchange for their data, such as email, social media, and cloud-based services. Thus, a consumer may end up ahead or behind financially, depending on the cost of the services the consumer chooses to use and the negotiated value of the consumer's data.

A third approach may involve the "data dividend" concept proposed by Governor Newsom. As the concept has not yet been clearly defined, some commentators suggest that it involves individualized payments directly to consumers, while others suggest that payments would be made into a government fund from which fixed payments would be distributed to consumers, similar to the Alaska pipeline fund that sought to distribute some of the wealth generated from Alaska's oil resources to its residents. Given that data has been called the "new oil," the idea of a data dividend modeled on the Alaska pipeline payments may seem apt, although the analogy quickly breaks down given the greater difficulty of calculating the value of data.[35] Moreover, commentators have rightly noted that the data dividend paid to an individual is likely to be mere "peanuts," given the vast numbers of consumers whose information is being used.[36]

Whatever valuation and payment model (if any) might be adopted, it risks devaluing privacy protection. The data dividend concept and the CCPA's approach to financial incentives both suggest that the value of a consumer's personal information is measured by its value to the company.[37] As noted above, this value may have little correlation with the privacy risks to the consumer. Though it is commendable that these proposals seek to provide some measure of compensation to consumers, it is important to avoid conflating economic and privacy considerations, and to avoid a situation where consumers trade away their data or privacy rights.[38] Although societies may certainly decide to require some degree of compensation to consumers as a wealth-redistribution measure, it will be important to present this as an economic tool and not as a privacy measure.

 

Closing Thoughts

As the late Giovanni Buttarelli forewarned in his final vision statement, "Notions of 'data ownership' and legitimization of a market for data risks a further commoditization of the self and atomization of society…. The right to human dignity demands limits to the degree to which an individual can be scanned, monitored and monetized—irrespective of any claims to putative 'consent.'"[39]

There are many reasons why societies may seek to distribute a portion of the wealth generated from personal information to the consumers who are the source and subject of this personal information. This does not lessen the need for privacy laws to protect this personal information, however. By distinguishing clearly between economic objectives and privacy objectives, and moving away from consent-based models that fall short of both objectives, we can best protect consumers and their data, while still enabling companies to unlock the benefits of AI and machine learning for industry, society, and consumers.

[1]Lokke Moerel is a Professor of Global ICT Law at Tilburg University and Senior of Counsel at Morrison & Foerster in Berlin. Christine Lyon is partner at Morrison & Foerster in Palo Alto, California.

[2]E. Brynjolfsson & A. McAfee, Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant New Technologies, London: W.W. Norton & Company 2014, which gives a good overview of the friction and disruption that arose from the industrial revolution, how society ultimately responded and regulated its negative excesses, and the friction and disruption caused by the digital revolution. A less accessible but very instructive book on the risks of digitization and big tech for society is S. Zuboff, The Age of Surveillance Capitalism, New York: Public Affairs 2019 (hereinafter, "Zuboff 2019").

[3]An exploration of these new issues, as well as proposals on how to regulate the new reality from a data protection perspective, can be found in L. Moerel, Big Data Protection: How to Make the Draft EU Regulation on Data Protection Future Proof (oration Tilburg), Tilburg: Tilburg University 2014 (hereinafter, "Moerel 2014"), pp. 9-13, and L. Moerel & C. Prins, Privacy for the Homo Digitalis: Proposal for a New Regulatory Framework for Data Protection in the Light of Big Data and the Internet of Things (2016), [ssrn.com/abstract=2784123] (hereinafter, "Moerel & Prins 2016"). On ethical design issues, see J. Van den Hoven, S. Miller & T. Pegge (eds.), Designing in Ethics, Cambridge: CUP 2017 (hereinafter, "Van den Hoven, Miller & Pegge 2017"), p. 5.

[4]L. Vaas, "FTC renews call for single federal privacy law," Naked Security by Sophos (May 10, 2019), https://nakedsecurity.sophos.com/2019/05/10/ftc-renews-call-for-single-federal-privacy-law/.

[5]Jaron Lanier and E. Glen Weyl, "A Blueprint for a Better Digital Society," Harvard Business Review (Sept. 26, 2018), https://hbr.org/2018/09/a-blueprint-for-a-better-digital-society.

[6]Zuboff 2019, p. 94, refers to this now commonly cited adage but nuances it, indicating that consumers are not the product but rather "the objects from which raw materials are extracted and expropriated for Google's prediction factories. Predictions about our behavior are Google's products, and they are sold to its actual customers but not to us."

[7]Angel Au-Yeung, "California Wants to Copy Alaska and Pay People a 'Data Dividend.' Is It Realistic?" Forbes (Feb. 14, 2019), https://www.forbes.com/sites/angelauyeung/2019/02/14/california-wants-to-copy-alaska-and-pay-people-a-data-dividend–is-it-realistic/#30486ee6222c.

[8]Cal. Civ. Code § 1798.125(b)(1) (“A business may offer financial incentives, including payments to consumers as compensation for the collection of personal information, the sale of personal information, or the deletion of personal information. A business may also offer a different price, rate, level, or quality of goods or services to the consumer if that price or difference is directly related to the value provided to the business by the consumer’s data”). The California Attorney General’s final proposed CCPA regulations, issued on June 1, 2020 (Final Proposed CCPA Regulations), expand on this obligation by providing that a business must be able to show that the financial incentive or price or service difference is reasonably related to the value of the consumer’s data. (Final Proposed CCPA Regulations at 20 CCR § 999.307(b).)  The draft regulations also require the business to use and document a reasonable and good faith method for calculating the value of the consumer’s data. Id. 

[9]Moerel 2014, p. 21.

[10]Isobel Asher Hamilton, "Microsoft CEO Satya Nadella made a global call for countries to come together to create new GDPR-style data privacy laws," Business Insider (Jan. 24, 2019), available at https://www.businessinsider.com/satya-nadella-on-gdpr-2019-1.

[11]L. Moerel, Reflections on the Impact of the Digital Revolution on Corporate Governance of Listed Companies, first published in Dutch by Uitgeverij Paris in 2019 and written at the request of the Dutch Corporate Law Association for its annual conference, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3519872, at para. 4.

[12]GDPR Art. 6(1): “Processing shall be lawful only if and to the extent that at least one of the following applies:

(a) the data subject has given consent to the processing of his or her personal data for one or more specific purposes….”

[13]Cass Sunstein, The Ethics of Influence, Cambridge University Press 2016 (hereinafter: Sunstein 2016), p. 65.

[14]Yuval Noah Harari, Homo Deus: A History of Tomorrow, Harper 2017 (hereinafter: Harari 2017), p. 277.

[15]This is separate from the issue that arises where companies require a consumer to consent to the use of their data for commercial purposes as a condition of receiving goods or services (so-called tracking walls and cookie walls). It also may arise where a consumer is required to provide a bundled consent covering multiple data processing activities, without the ability to choose whether to consent to a particular activity within that bundle. In May 2020, the European Data Protection Board (EDPB) updated its guidance on the requirements for consent under the GDPR, now specifically stating that consent is not considered freely given in the case of cookie walls; see EDPB Guidelines 05/2020 on consent under Regulation 2016/679, Version 1.0, adopted on May 4, 2020, available at https://edpb.europa.eu/sites/edpb/files/files/file1/edpb_guidelines_202005_consent_en.pdf (EDPB Guidelines on Consent 2020).

[16]This is the field of behavioral economics. See D. Ariely, Predictably Irrational, London: HarperCollins Publishers 2009 (hereinafter, "Ariely 2009"), at Introduction. For a description of techniques reportedly used by large tech companies, see the report from the Norwegian Consumer Council, Deceived by Design: How tech companies use dark patterns to discourage our privacy rights (June 27, 2018), available at https://fil.forbrukerradet.no/wp-content/uploads/2018/06/2018-06-27-deceived-by-design-final.pdf (hereinafter, "Norwegian Consumer Council 2018"). The Dutch Authority for Consumers & Markets (ACM) has announced that the abuse of this kind of predictably irrational consumer behavior must cease and that companies have a duty of care to design the choice architecture in a way that is fair and good for the consumer. Authority for Consumers & Markets, Taking advantage of predictable consumer behavior online should stop (Sept. 2018), available at https://fil.forbrukerradet.no/wp-content/uploads/2018/06/2018-06-27-deceived-by-design-final.pdf.

[17]See Norwegian Consumer Council 2018, p. 19, referencing L.M. Holson, "Putting a Bolder Face on Google," New York Times (Feb. 28, 2009), www.nytimes.com/2009/03/01/business/01marissa.html.

[18]See Van den Hoven, Miller & Pegge 2017, p. 25, where the ethical dimension of misleading choice architecture is well illustrated by giving an example in which someone with Alzheimer’s is deliberately confused by rearranging his or her system of reminders. For an explanation of a similar phenomenon, see Ariely 2009, Introduction and Chapter 14: “Why Dealing with Cash Makes Us More Honest,” where it is demonstrated that most unfair practices are one step removed from stealing cash. Apparently, it feels less bad to mess around in accounting than to steal real money from someone.

[19]Zuboff 2019 convincingly describes how some apparent failures of judgment that technology companies' management regard as missteps and bugs (for examples, see p. 159) are actually deliberate, systematic actions intended to habituate users to certain practices in order to eventually adapt social norms. For what Zuboff 2019 calls the Dispossession Cycle, see pp. 138-166.

[20]Zuboff 2019 deals extensively with the fascinating question of how it is possible that technology companies got away with these practices for so long. See pp. 100-101.

[21]Moerel & Prins 2016, para. 3.

[22]EDPB Guidelines on Consent, p. 10.

[23]Lokke Moerel, IAPP The GDPR at Two: Expert Perspectives, “EU data protection laws are flawed — they undermine the very autonomy of the individuals they set out to protect”, 26 May 2020, https://iapp.org/resources/article/gdpr-at-two-expert-perspectives/.

[24]U.S. privacy laws require consent only in limited circumstances (e.g., the Children’s Online Privacy Protection Act, Fair Credit Reporting Act, and Health Insurance Portability and Accountability Act), and those laws typically would require a more specific form of consent in any event.

[26]For a discussion of why, from an economic perspective, information asymmetries and transaction costs lead to market failures that require legal intervention, see Frederik J. Zuiderveen Borgesius, "Consent to Behavioural Targeting in European Law – What Are the Policy Implications of Insights From Behavioural Economics," Amsterdam Law School Legal Studies Research Paper No. 2013-43, Institute for Information Law Research Paper No. 2013-02 (hereinafter: Borgesius 2013), pp. 28 and 37, SSRN-id2300969.pdf.

[28]EDPB Guidelines on Consent, p. 10.

[29]Lanier and Weyl, "A Blueprint for a Better Digital Society," Harvard Business Review (Sept. 26, 2018).

[30]Andrew Burt, “Privacy and Cybersecurity Are Converging. Here’s Why That Matters for People and for Companies,” Harvard Business Review(Jan. 3, 2019), https://hbr.org/2019/01/privacy-and-cybersecurity-are-converging-heres-why-that-matters-for-people-and-for-companies.

[31]See, e.g., Adam Thimmsech, "Transacting in Data: Tax, Privacy, and the New Economy," 94 Denv. L. Rev. 146 (2016) (hereinafter, "Thimmsech"), pp. 174-177 (identifying a number of obstacles to placing a valuation on personal information and noting that "[u]nless and until a market price develops for personal data or for the digital products that are the tools of data collection, it may be impossible to set their value"). See also Dante Disparte and Daniel Wagner, "Do You Know What Your Company's Data Is Worth?" Harvard Business Review (Sept. 16, 2016) (explaining the importance of being able to accurately quantify the enterprise value of data (EvD) but observing that "[d]efinitions for what constitutes EvD, and methodologies to calculate its value, remain in their infancy").

[32]Thimmesch at 176: “To start, each individual datum is largely worthless to an aggregator. It is the network effects that result in significant gains to the aggregator when enough data are collected. Further complicating matters is the fact that the ultimate value of personal data to an aggregator includes the value generated by that aggregator through the use of its algorithms or other data-management tools. The monetized value of those data is not the value of the raw data, and isolating the value of the raw data may be impossible.”

[33]Moerel & Prins 2016, para. 2.3.2. See also Evgeny Morozov (2013), “To Save Everything, Click Here: The Folly of Technological Solutionism,” PublicAffairs, who warns that for pay-as-you-live insurance the choice will not be a fully free one for some people, since those on a limited budget may not be able to afford privacy-friendly insurance. After all, it is bound to be more expensive.

[34]Lanier and Weyl, “A Blueprint for a Better Digital Society,” Harvard Business Review (Sept. 26, 2018) (“For data dignity to work, we need an additional layer of organizations of intermediate size to bridge the gap. We call these organizations ‘mediators of individual data,’ or MIDs. A MID is a group of volunteers with its own rules that represents its members in a wide range of ways. It will negotiate data royalties or wages, to bring the power of collective bargaining to the people who are the sources of valuable data….”). Lanier extends this theory more explicitly to personal information in his New York Times video essay at https://www.nytimes.com/interactive/2019/09/23/opinion/data-privacy-jaron-lanier.html. See also Imanol Arrieta Ibarra, Leonard Goff, Diego Jiménez Hernández, Jaron Lanier, and E. Glen Weyl, “Should We Treat Data as Labor? Moving Beyond ‘Free,’” American Economic Association Papers & Proceedings, Vol. 1, No. 1 (May 2018) at https://www.aeaweb.org/articles?id=10.1257/pandp.20181003, at p. 4 (suggesting that data unions could also exert power through the equivalent of labor strikes: “[D]ata laborers could organize a ‘data labor union’ that would collectively bargain with [large technology companies]. While no individual user has much bargaining power, a union that filters platform access to user data could credibly call a powerful strike. Such a union could be an access gateway, making a strike easy to enforce, and on a social network, where users would be pressured by friends not to break a strike, this might be particularly effective.”).

[35]See, e.g., Marco della Cava, “Calif. tech law would compensate for data,” USA Today (Mar. 11, 2019) (“[U]nlike the Alaska Permanent Fund, which in the ’80s started doling out $1,000-and-up checks to residents who were sharing in the state’s easily tallied oil wealth, a California data dividend would have to apply a concrete value to largely intangible and often anonymized digital information. There also is concern that such a dividend would establish a pay-for-privacy construct that would be biased against the poor, or spawn a tech-tax to cover the dividend that might push some tech companies out of the state.”).

[36]Steven Hill, “Opinion: Newsom’s California Data Dividend Idea is a Dead End,” East Bay Times (Mar. 7, 2019) (“While Newsom has yet to release details…the money each individual would receive amounts to peanuts. Each of Twitter’s 321 million users would receive about $2.83 [if the company proportionally distributed its revenue to users]; a Reddit user about 30 cents. And paying those amounts to users would leave these companies with zero revenue or profits. So in reality, users would receive far less. Online discount coupons for McDonald’s would be more lucrative.”).

[37]Cal. Civ. Code § 1798.125(a)(2) (“Nothing in this subdivision prohibits a business from charging a consumer a different price or rate, or from providing a different level or quality of goods or services to the consumer, if that difference is reasonably related to the value provided to the business by the consumer’s data.”). The CCPA originally provided that the difference must be “directly related to the value provided to the consumer by the consumer’s data,” but it was later amended to require the difference to be “directly related to the value provided to the business by the consumer’s data.” (Emphases added.) The CCPA does not prescribe how a business should make this calculation. The Final Proposed CCPA Regulations would require businesses to use one or more of the following calculation methods, or “any other practical and reliable method of calculation used in good-faith” (Final Proposed CCPA Regulations, 20 CCR § 999.307(b)):

[38]See in a similar vein the German Data Ethics Commission, Standards for the Use of Personal Data, Standard 6:6, where the Commission argues that data should not be referred to as a “counter-performance” provided in exchange for a service, even though the term sums up the issue in a nutshell and has helped to raise awareness among the general public. https://www.bmjv.de/SharedDocs/Downloads/DE/Themen/Fokusthemen/Gutachten_DEK_EN.pdf?__blob=publicationFile&v=2.

[39]International Association of Privacy Professionals, Privacy 2030 for Europe: A New Vision for Europe at p. 19, https://iapp.org/media/pdf/resource_center/giovanni_manifesto.pdf.