FPF Issues Award for Research Data Stewardship to Stanford Medicine & Empatica, Google & Its Academic Partners
WASHINGTON, DC (June 29, 2021) – The second-annual FPF Award for Research Data Stewardship honors two teams of researchers and corporate partners for their commitment to privacy and ethical uses of data in their efforts to research aspects of the COVID-19 pandemic. One team is a collaboration between Stanford Medicine researchers led by Tejaswini Mishra, PhD, Professor Michael Snyder, PhD, and medical wearable and digital biomarker company Empatica. The other team is a collaboration between Google’s COVID-19 Mobility Reports and COVID-19 Aggregated Mobility Research Dataset projects, and researchers from multiple universities in the United States and around the globe.
“Researchers rely on data to find solutions to the challenges facing our society, but data must only be shared and used in a way that protects individual rights,” said Jules Polonetsky, CEO of the Future of Privacy Forum. “The teams at Stanford Medicine, Empatica, and Google employed a variety of techniques in their research to ensure data was used ethically – including developing strong criteria for potential partners, aggregating and anonymizing participant data, discarding data sets at risk of being re-identified, and conducting extensive ethics and privacy reviews.”
The FPF Award for Research Data Stewardship was established with the support of the Alfred P. Sloan Foundation, a not-for-profit grantmaking institution that supports high-quality, impartial scientific research and institutions.
The FPF Award for Research Data Stewardship recognizes excellence in the privacy-protective stewardship of corporate data that is shared with academic researchers. The award highlights companies and academics who demonstrate novel best practices and approaches to sharing corporate data in order to advance scientific knowledge.
Stanford Medicine and Empatica Partnership
Smartwatches and other wearable devices that continuously measure biometric data can provide “digital vital signs” for the user. Dr. Mishra’s team, consisting of Dr. Michael Snyder, Erika Mahealani Hunting, Alessandra Celli, Arshdeep Chauhan, and Jessi Wanyi Li from the Stanford University School of Medicine’s Department of Genetics, received anonymized data from Empatica’s E4 wristband, including data on participants’ skin temperature, heart rate, and electrodermal activity. This data was collected by the Stanford Medicine team to study whether it could be used to detect COVID-19 infections prior to the onset of symptoms. To ensure that this data sharing project minimized potential privacy risks, both Empatica and the Stanford Medicine team took a number of steps, including:
Establishing limits on the sharing and use of personal health information.
Using a researcher-friendly device, Empatica’s E4, that prevents the collection of geolocation data, IP addresses, or International Mobile Equipment Identity (IMEI) identifiers.
Using QR codes to link participants to specific wearable devices to ensure that participant names and study record IDs would not be shared.
“A large part of our job is to embed research and its results into products that will improve people’s lives,” said Matteo Lai, CEO of Empatica. “Patients are always at the center of this endeavor, and so naturally are their needs: privacy, a great experience, a sense of safety, and high quality are all part of our responsibility. We are honored that this approach and care is recognized as something to strive for.”
The research project is ongoing.
Google’s Community Mobility Information
Google has also been recognized with the second-annual FPF Award for Research Data Stewardship for its work to produce, aggregate, anonymize, and share data on community movement during the COVID-19 pandemic. In response to requests from public health officials, Google created Community Mobility Reports (CMRs) to provide aggregated, anonymized insights into how community mobility has changed in response to policies aimed at combating COVID-19. To ensure that personal data, including an individual’s location, movement, or contacts, cannot be derived from the metrics, the data included in Google’s CMRs goes through a robust anonymization process that employs differential privacy techniques while providing researchers and public health authorities with valuable insights to help inform official decision-making.
Google is also being recognized for a related project, the Aggregated Mobility Research Dataset. In addition to the COVID-19 CMR data, which were made publicly available online, this dataset was shared with specific qualified researchers – those with proven track records in studying epidemiology, public health, or infectious disease – for the sole purpose of studying the effects of COVID-19, under contractual commitments to use the data ethically while maintaining privacy. Google was also able to share more detailed mobility data with these researchers, while keeping strong mathematical privacy protections in place. Examples of research that utilized Google’s Aggregated Mobility Research Dataset include:
Hierarchical organization of urban mobility and its connection with city livability
Examining COVID-19 forecasting using spatio-temporal graph neural networks
“As the COVID-19 crisis emerged, Google moved to support public health officials and researchers with resources to help manage the spread,” said Dr. Karen DeSalvo, Chief Health Officer, Google Health. “We heard from the public health community that mobility data could help provide them an understanding of whether people were social distancing to interrupt the spread. Given the sensitivity of mobility data, we needed to deliver this information in a privacy preserving way, and we’re honored to be recognized by FPF for our approach.”
Google ensured the protection of this shared data in both projects by:
Anonymizing the Mobility Reports through differential privacy, which intentionally adds random noise to metrics in a manner that maintains both users’ privacy and the accuracy of the data (a simple illustration of this noise-addition step follows this list).
Organizing information by trips taken to different types of locations, rather than by granular geographic areas, to protect community privacy.
Requiring that Google review all publications using the Aggregated Mobility Research Dataset to ensure the researchers describe the dataset and its limitations correctly.
Developing strict privacy protocols, agreements, and partner criteria for the Aggregated Mobility Research Dataset.
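To make the noise-addition step concrete, here is a minimal sketch of Laplace-mechanism differential privacy applied to an aggregate visit count. The function name, epsilon value, sensitivity, and counts are illustrative assumptions for this example, not Google’s actual parameters or implementation.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float = 0.5, sensitivity: float = 1.0) -> float:
    """Return a differentially private version of an aggregate count.

    Laplace noise with scale = sensitivity / epsilon is added, so that the
    presence or absence of any one person changes the published value's
    distribution only slightly (the core differential privacy guarantee).
    """
    scale = sensitivity / epsilon
    return true_count + np.random.laplace(loc=0.0, scale=scale)

# Hypothetical example: a daily count of visits to parks in one region.
raw_visits = 1240  # assumed aggregate, not real data
print(f"Published (noisy) count: {dp_count(raw_visits):.0f}")
```

In this kind of scheme, the scale of the noise (and therefore the strength of the privacy guarantee) is governed by the chosen epsilon; smaller values add more noise and yield stronger privacy at some cost to accuracy.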
Google’s privacy-driven approach was illustrated by the company’s direct collaboration with Prof. Gregory Wellenius, Boston University School of Public Health’s Department of Environmental Health, Dr. Thomas Tsai, Brigham and Women’s Hospital Department of Surgery and Harvard T.H. Chan School of Public Health’s Department of Health Policy and Management, and Dr. Ashish Jha, Dean of Brown University’s School of Public Health. The researchers evaluated the impacts of specific state-level policies on mobility and subsequent COVID-19 case trajectories using anonymized and aggregated mobility data from Google users who had opted in to share their data for research. The shared data resulted in an academic paper to be published in Nature Communications. The project found that state-level emergency declarations resulted in a 9.9% reduction in time spent away from places of residence, with the implementation of social distancing policies resulting in an additional 24.5% reduction in mobility the following week, and shelter-in-place mandates yielding a further 29% reduction in mobility. Notably, these decreases in mobility were associated with significant reductions in reported COVID-19 cases two to four weeks later.
FPF Partners with Penn State and University of Michigan Researchers on Searchable Database of Privacy-Related Documents
FPF is collaborating with a team of researchers to build a searchable database of privacy policies and other privacy-related documents. The PrivaSeer project, led by researchers from Penn State and the University of Michigan, has received a $1.2 million grant from the National Science Foundation (NSF) to ease the process of collecting and utilizing privacy documents and privacy-related data.
FPF Director for Global Privacy Dr. Gabriela Zanfir-Fortuna will serve as a co-principal investigator, a first for FPF on an NSF-funded project. Dr. Zanfir-Fortuna’s expertise includes work on global privacy developments and European data protection law and policy, with a focus on de-identification, AI, mobility, ad tech, and education.
The other co-principal investigators on this project are Dr. Shomir Wilson and Dr. Lee Giles of Penn State University and Dr. Florian Schaub of the University of Michigan.
PrivaSeer will function as a searchable database that allows researchers to perform a host of tasks like collecting, exploring, and evaluating privacy documents such as privacy policies, terms of service agreements, cookie policies, privacy bills and laws, and regulatory guidelines.
The search engine employs natural language processing (NLP), a branch of artificial intelligence that combines linguistics and computer science to analyze large quantities of text.
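As a rough illustration of how a corpus of privacy documents can be made searchable, the sketch below indexes a few hypothetical policy excerpts with TF-IDF and ranks them against a query. The documents and query are invented for the example; PrivaSeer’s actual NLP pipeline is not described here and is certainly more sophisticated.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: short excerpts standing in for crawled privacy documents.
documents = [
    "We share your personal data with third-party advertising partners.",
    "Cookies are used to remember your preferences and analyze site traffic.",
    "Students' education records are disclosed only with parental consent.",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(documents)

def search(query: str, top_k: int = 2):
    """Rank documents by cosine similarity to the query's TF-IDF vector."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [(documents[i], round(float(scores[i]), 3)) for i in ranked]

print(search("third party data sharing"))
```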
“One of the reasons to have a privacy policy search engine is that you can get an idea about how different companies treat their user privacy currently and over time. This can also inform users on how they may want to react to those companies.”
C. Lee Giles, the David Reese Professor of Information Sciences and Technology, Penn State & PrivaSeer Project co-principal investigator
The search engine will provide researchers insights on privacy policy trends and enable researchers to easily and securely access relevant privacy documentation online. Previous research on privacy policies has encountered issues surrounding access to suitable privacy data. PrivaSeer will help researchers navigate this problem and allow for large-scale interpretation of privacy data.
View Penn State University’s Institute for Computational and Data Sciences’ statement here.
Manipulative Design: Defining Areas of Focus for Consumer Privacy
In consumer privacy, the phrase “dark patterns” is everywhere. Emerging from a wide range of technical and academic literature, it now appears in at least two US privacy laws: the California Privacy Rights Act and the Colorado Privacy Act (which, if signed by the Governor, will come into effect in 2025).
Under both laws, companies will be prohibited from using “dark patterns,” or “user interface[s] designed or manipulated with the substantial effect of subverting or impairing user autonomy, decision‐making, or choice,” to obtain user consent in certain situations–for example, for the collection of sensitive data.
When organizations give individuals choices, some forms of manipulation have long been barred by consumer protection laws, with the Federal Trade Commission and state Attorneys General prohibiting companies from deceiving or coercing consumers into taking actions they did not intend or striking bargains they did not want. But consumer protection law does not typically prohibit organizations from persuading consumers to make a particular choice. And it is often unclear where the lines fall between cajoling, persuading, pressuring, nagging, annoying, or bullying consumers. The California and Colorado laws seek to do more than merely bar deceptive practices; they prohibit design that “subverts or impairs user autonomy.”
What does it mean to subvert user autonomy, if a design does not already run afoul of traditional consumer protections law? Just as in the physical world, the design of digital platforms and services always influences behavior — what to pay attention to, what to read and in what order, how much time to spend, what to buy, and so on. To paraphrase Harry Brignull (credited with coining the term), not everything “annoying” can be a dark pattern. Some examples of dark patterns are both clear and harmful, such as a design that tricks users into making recurring payments, or a service that offers a “free trial” and then makes it difficult or impossible to cancel. In other cases, the presence of “nudging” may be clear, but harms may be less clear, such as in beta-testing what color shades are most effective at encouraging sales. Still others fall in a legal grey area: for example, is it ever appropriate for a company to repeatedly “nag” users to make a choice that benefits the company, with little or no accompanying benefit to the user?
In Fall 2021, Future of Privacy Forum will host a series of workshops with technical, academic, and legal experts to help define clear areas of focus for consumer privacy, and guidance for policymakers and legislators. These workshops will feature experts on manipulative design in at least three contexts of consumer privacy: (1) Youth & Education; (2) Online Advertising and US Law; and (3) GDPR and European Law.
As lawmakers address this issue, we identify at least four distinct areas of concern:
Designs that cause concrete physical or financial harms to individuals. In some cases, design choices are implicated in concrete physical or financial harms. This might include, for example, a design that tricks users into making recurring payments, or makes unsubscribing from a free trial or other paid service difficult or impossible, leading to unwanted charges.
Designs that impact individual autonomy or dignity (but do not necessarily cause concrete physical or financial harm). In many cases, we observe concerns over autonomy and dignity, even where the use of data would not necessarily cause harm. For the same reasons that there is wide agreement that so-called subliminal messaging in advertising is wrong (as well as illegal), there is a growing awareness that disrespect for user autonomy in consumer privacy is objectionable on its face. As a result, in cases where the law requires consent, such as in the European Union for placement of information onto a user’s device, the law ought to provide a remedy for individuals who have been subject to a violation of that consent.
Designs that persuade, nag, or strongly push users towards a particular outcome, even where it may be possible for users to decline. In many cases, the design of a digital platform or service clearly pushes users towards a particular outcome, even if it is possible (if burdensome) for users to make a different choice. In such cases, we observe a wide spectrum of tactics that may be evaluated differently depending on the viewer and the context. Repeated requests may be considered “nagging” or “persuasion”; one person’s “clever marketing,” taken too far, becomes another person’s “guilt-shaming” or “confirm-shaming.” Ultimately, our preference for defaults (“opt in” versus “opt out”), and within those defaults, our level of tolerance for “nudging,” may be driven by the social benefits or values attached to the choice itself.
Designs that exploit biases, vulnerabilities, or heuristics in ways that implicate broader societal harms or values. Finally, we observe that the collection and use of personal information does not always solely impact individual decision-making. Often, the design of online platforms can influence groups in ways that impact societal values, such as the values of privacy, avoidance of “tech addiction,” free speech, the availability of data from or about marginalized groups, or the proliferation of unfair price discrimination or other market manipulation. Understanding how design choices may influence society, even if individuals are minimally impacted, may require examining the issues differently.
This week at the first edition of the annual Dublin Privacy Symposium, FPF will join other experts to discuss principles for transparency and trust. The design of user interfaces for digital products and services pervades modern life and directly impacts the choices people make with respect to sharing their personal information.
ITPI Event Recap – The EU Data Strategy and the Draft Data Governance Act
On May 19, 2021, the Israel Tech Policy Institute (ITPI), an affiliate of the Future of Privacy Forum (FPF), together with Tel Aviv University’s Stewart & Judy Colton Law and Innovation Program, hosted an online event on the European Union’s (EU) Data Strategy and the Draft Data Governance Act (DGA).
The draft DGA is one of the proposed legislative measures for implementing the European Commission’s 2020 European Strategy for Data (EU Data Strategy), whose declared goal is to give the EU a “competitive advantage” by enabling it to capitalise on its vast quantity of public and private sector-controlled data. The DGA would establish a framework for re-using more data held by the public sector, for regulating and increasing trust in data intermediaries and similar providers of data sharing services, and for data altruism (i.e., data voluntarily made available by individuals or companies for the “general interest”).
While speakers also addressed other proposals tabled by the European Commission (EC) for regulating players in the data economy – such as the Digital Services Act (DSA) and the Digital Markets Act (DMA) – most of the discussion revolved around the DGA’s expected impact and underlying policy drivers.
Both Prof. Assaf Hamdani, from Tel Aviv University’s Law Faculty, and Limor Shmerling Magazanik, Managing Director at ITPI, moderated the event, which featured speakers from the EC, the Massachusetts Institute of Technology (MIT), Covington & Burling LLP, Mastercard and the Israeli Government’s ICT Authority.
Maria Rosaria Coduti, Policy Officer at the EC’s Directorate-General for Communications Networks, Content and Technology (DG CNECT), started by outlining the factors that drove the EC to put forward the EU Data Strategy. As the EC acknowledges the potential of data usage for the benefit of society, it intends to harness that potential by bolstering the functioning of the single market for data, with due regard to privacy, data protection and competition rules.
To attain that goal, several questions needed to be addressed to increase trust in the exchange of such data. Those included a lack of clarity on the legal and technical requirements applicable to the re-use of data and the sharing of data by public bodies, as well as the creation of European-based data storage solutions. On the other hand, the EC also identified the need to further empower data subjects through voluntary data sharing, as a complement to their data portability right under the General Data Protection Regulation (GDPR).
According to the EC official, the EU Data Strategy rests on 4 main pillars: 1) a cross-sectoral governance framework for boosting data access and use; 2) significant investments in European federated cloud infrastructures and interoperability; 3) empowering individuals and SMEs in the EU with digital skills and data literacy; 4) the creation of common European Data Spaces in crucial sectors and public interest domains, through data governance and practical arrangements.
The DGA itself, which intends to create trust in and democratise the data sharing ecosystem, also focuses on four main aspects: 1) the re-use of sensitive public sector data, by addressing “obstacles” to data sharing, complementing the EU’s Open Data Directive – together with an upcoming Implementing Act on High-Value Datasets under Article 14 of that Directive – and building on Member-States’ access regimes; 2) business-to-business (B2B) and consumer-to-business (C2B) data sharing through dedicated, neutral and duly notified service providers (“data intermediaries”); 3) data altruism, enabling individuals to share their personal data for the common good with registered non-profit dedicated organisations, notably through signing a standard to-be-developed European data altruism broad consent form; 4) the European Data Innovation Board, which shall be an expert group with EC secretariat, focused on technical standardisation and on harmonising practices around data re-use, data intermediaries and data altruism.
Specifically on the topic of data intermediaries, and replying to questions from the audience, Maria Rosaria Coduti mentioned the importance of ensuring they remain neutral. This entails that intermediaries should not be allowed to process the data they are entrusted with for their own purposes. In this respect, Recital 22 of the draft DGA excludes cloud service providers, and entities which aggregate, enrich or transform the data for subsequent sale (e.g., data brokers) from its scope of application.
It should be noted that the third Compromise Text released by the Portuguese Presidency of the Council of the EU opens the possibility for intermediaries to use the data for service improvement, as well as to offer “specific services to improve the usability of the data and ancillary services that facilitate the sharing of data, such as storage, aggregation, curation, pseudonymisation and anonymisation.”
Coduti underlined the importance of preventing any conflicts of interest for intermediaries. Data sharing services must thus be rendered through a legal entity separate from the other activities of the intermediary, notably when the latter is a commercial company. This would, in principle, mean that legal entities acting as intermediaries under the DGA would not be covered by the proposed DSA as hosting service providers, nor as “gatekeepers” under the DMA proposal. Ultimately, the EC wishes intermediaries to flourish, by becoming a trusted player and a valuable tool for businesses and individuals in the data economy.
The upcoming Data Act: facilitating B2B and B2G data sharing
Lastly, the EC official briefly addressed the upcoming EU Data Act. In parallel with the revision of the 25-year-old Database Directive, this piece of legislation will focus on business-to-business (B2B) and business-to-government (B2G) data sharing. Its aim will be to maximise the use of data for innovation in different sectors and for evidence-based policymaking, without harming the interests of companies that invest in data generation (e.g., companies that produce smart devices and sensors).
While the Data Act will not address the issue of property rights over data, it will seek to remove contractual and technical obstacles to data sharing and usage in industrial ecosystems (e.g., in the mobility, health, energy and agricultural spaces). Coduti stressed that the discussion around data ownership is complex, due to the proliferation of data obtained from IoT devices and the emergence of edge computing, as well as the necessary balance between keeping datasets useful and safeguarding data subjects’ rights through anonymisation.
On the latter point, Rachel Ran, Data Policy Manager at the Israeli Government ICT Authority, echoed Coduti’s concerns, stating that data cannot be universally open. According to the Israeli official, there is a tradeoff between data utility and individual privacy that has to be accepted, but questions remain about the level of involvement that governments should have in determining this balance.
An increasingly complex digital regulatory framework in the EU
Henriette Tielemans, IAPP Senior Westin Research Fellow, offered a comprehensive overview of the EC’s data-related legislative proposals which were tabled in the last six months, other than the DGA. The trilogue work currently underway in the EU institutions on the proposed ePrivacy Regulation was also mentioned.
In the context of its Strategy for Artificial Intelligence (AI), the EC has very recently published a proposal for a Regulation laying down harmonised rules on AI. Tielemans saw this proposal as important and groundbreaking, suggesting that the EC is looking to set standards beyond EU borders as well, as it did with the GDPR. She stressed that the proposal takes a cautious, risk-based approach.
Tielemans highlighted the proposal’s dedicated provision (Article 5) on banned AI practices, arguing that the most contentious among them is real-time remote biometric identification for law enforcement purposes in public spaces. However, she noted that the paragraph contains wide exceptions to the prohibition, allowing law enforcement authorities to use facial recognition in that context, subject to conditions. Tielemans predicted that the provision will be a “hot potato” during the Regulation’s negotiations, notably within the Council of the EU.
Furthermore, Tielemans stressed that “high-risk AI systems”, which are the major focus of the proposal, are not defined thereunder, as the EU intends to ensure the Regulation is future-proof. However, Annex III sets out a provisional list of high-risk AI systems, such as systems used for educational and vocational training (e.g., to determine who will be admitted to a given school). Annex II is more complex, due to the interaction with other EU acts: if a product is subject to a third-party conformity assessment (under other EU laws) and the provider would like to integrate an AI component into it, that component would be considered high-risk AI. Tielemans also noted that, once a system is qualified as high-risk, providers take on a number of obligations concerning training models and record keeping, among others.
On the DSA, Tielemans pointed out that the proposal is geared towards providers of online intermediary services. It provides rules on the liability of providers for third-party content and on how they should conduct content moderation. In principle, she stressed, such providers shall not be liable for information which is conveyed or stored through their services, although they are burdened with takedown – but no general monitoring – obligations. The proposal distinguishes between different types of providers, with their respective obligations matching their role and their degree of importance in the online ecosystem. Hosting service providers have fewer obligations than online platforms, and online platforms fewer than very large ones: a sort of obligation “cascade”.
Lastly, Tielemans concisely mentioned the DMA and the revised Directive on Security of Network and Information Systems (NIS 2 Directive) as other noteworthy EC initiatives, identifying the former as a competition law toolbox and an “outlier” in the EU Data Strategy. She also pinpointed the latter’s broader scope, increased oversight and heavier penalties as important advances in the EU’s cybersecurity framework.
Reconciling “overarching” with “sectoral” regulation on data sharing
Helena Koning, Assistant General Counsel, Global Privacy Compliance Assurance & Europe Data Protection Officer at Mastercard, provided some thoughts about the draft DGA’s potential impact on the financial industry.
She started by outlining the actors involved in the data sharing ecosystem. These include: (i) individuals, who demand data protection and responsible use of data; (ii) businesses, that wish to innovate through data usage and insights drawn from data, notably by personalising their products and services; (iii) policymakers, who increasingly regulate data usage; and (iv) regulators with bolstered enforcement powers in this space.
Then, Koning stressed that companies in the financial sector are currently subject to a significant regulatory overlap when it comes to data collection and sharing, with the ePrivacy Directive applying to terminal equipment information, the GDPR applying to personal data in general and the Second Payment Services Directive (PSD2) covering payment data sharing. While there is already guidance by the European Data Protection Board (EDPB) on the interplay between the GDPR and PSD2, Tielemans added that lawmakers tend to regulate in silos, adopting overlapping and sometimes conflicting definitions and obligations. This results in financial sector players being pushed to wear very different hats under each framework (e.g., as payment service providers and data controllers). In this regard, Tielemans said that the EC should put further effort into ensuring consistency between EU acts before proposing new legislation.
Koning showed concern that instruments such as the DGA and the Data Act will add to this regulatory complexity and that SMEs and citizens will have a hard time complying with and understanding the new laws. On this point, she addressed the fact that the DGA and PSD2 have diverging models for fostering data-based innovation: as an illustration, while PSD2 mandates banks to share customer data with fintechs, free of charge and upon the customer’s contractual consent, the DGA centres around voluntary data sharing, for which public bodies may charge fees and data subjects are called to give GDPR-aligned consent.
Furthermore, Koning expressed doubts about the immediate benefit that data holders and data subjects would get from sharing their data with intermediaries, often in exchange for service fees.
Alternatives to data sharing and focus on data insights
Dr. Thomas Hardjono, from MIT Connection Science & Engineering, conducts research on using data to better understand and solve societal issues, including the spread of diseases and inequality. Hardjono started by commending the direction taken by the EC with the DGA, stating that his group at MIT had been studying issues relating to the commoditization of personal data since the publication of a 2011 World Economic Forum report. In Hardjono’s view, public data is a societal asset that should be treated as carefully and comprehensively as personal data.
On that point, Rachel Ran mentioned that governments should seek to encourage data sharing through data governance and to centre their policies around the needs of data subjects. She added that data products – like Application Programming Interfaces (APIs) – should be human-centered. Data should be seen as a product, but not a commodity, especially when it comes to sharing government data.
Ran continued by describing one of the Israeli ICT Authority’s major projects: creating standard APIs for G2G and G2B data sharing. But there are significant challenges to such a task, including: (i) unstructured and fragmented data; (ii) duplicated records and gaps; and (iii) inconsistent data formats and definitions. These ultimately lead to suboptimal decision-making by government bodies, which are not properly informed by accurate and up-to-date data.
On data sharing services, Hardjono stated that data intermediaries regulated under the DGA may face specific hurdles, notably concerning the intelligibility of data conveyed to data users. There are questions about whether the draft DGA’s prohibition on intermediaries aggregating and structuring data could prevent them from developing services that are attractive to potential data users. Koning added that a number of data sharing collaborations are already in place and that new EU regulation should facilitate rather than prevent them.
On that topic, Hardjono mentioned communities would be more interested in accessing insights and statistics about their citizens’ activity (e.g., on transportation, infrastructure usage and spending patterns), rather than large sets of raw data. On the other hand, aggregated data could be publicly shared with the wider society.
As a solution, Hardjono proposed developing and making available Open Algorithms, allowing data users (e.g., a municipality) to access specific datasets of their interest and to directly ask questions to and obtain insights from data holders about such datasets, through APIs. This would also avoid moving the data around, by keeping it with data holders.
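A minimal sketch of this “send the question to the data” pattern appears below. The record fields, query names, and minimum group size are hypothetical; they only illustrate how a data holder could expose vetted queries over an API and return aggregate insights while the raw data stays in place, and are not drawn from the Open Algorithms project itself.

```python
from statistics import mean

# Hypothetical raw records kept by the data holder (never shared directly).
_trip_records = [
    {"district": "A", "mode": "bus", "minutes": 34},
    {"district": "A", "mode": "car", "minutes": 21},
    {"district": "B", "mode": "bus", "minutes": 47},
]

# Registry of vetted "open algorithms": only these queries may run on the data.
APPROVED_QUERIES = {
    "avg_trip_minutes_by_district": lambda records, district: mean(
        r["minutes"] for r in records if r["district"] == district
    ),
}

MIN_RECORDS = 2  # refuse to answer over groups too small to aggregate safely

def answer(query_name: str, **params):
    """Run an approved query in place and return only the aggregate insight."""
    if query_name not in APPROVED_QUERIES:
        raise PermissionError("Query not vetted by the data holder")
    subset = [r for r in _trip_records if r["district"] == params.get("district")]
    if len(subset) < MIN_RECORDS:
        raise ValueError("Group too small to report")
    return APPROVED_QUERIES[query_name](_trip_records, **params)

# A municipality (data user) asks a question; the raw records stay with the holder.
print(answer("avg_trip_minutes_by_district", district="A"))
```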
Then another question arises, according to Hardjono: given the commercial value of data insights, there should be business incentives, possibly via fair remuneration, to gather, structure and analyze the data. In that context, Hardjono stressed that clarifying the intermediaries’ business model is crucial and should be addressed by the DGA. He also suggested that a joint remuneration model, shared between the public sector and data users, could be devised. Moreover, this leads to novel doubts about data ownership, notably about who owns the insights (the holder or the intermediary?) and on what title: could they be considered as the provider’s intellectual property?
Upon the observation from Prof. Assaf Hamdani that some cities are now requiring or incentivising companies and citizens to share data through administrative procedures and contracts, Hardjono regretted that the DGA did not devote enough attention to so-called data cooperatives. While Article 9(1)(c) of the DGA does offer a description of the services that data cooperatives should offer to data subjects and SMEs (including assisting them in negotiating data processing terms), there is an extensive academic discussion in the US about other roles these cooperatives could play in defending citizens’ interests, which could feed into the DGA debates. On the issue of data cooperatives, Ran held that such cooperatives should address data subjects’ needs and share data for a specified purpose, praising the DGA model in that regard.
Lastly, Hardjono highlighted the fact that certain datasets may have implicit bias and that algorithms used to analyze such data may thus be implicitly biased. Therefore, he held that ensuring algorithmic auditing and fairness is key to achieving good societal results from the usage of the large volumes of data at the relevant players’ disposal.
Ran added that, besides being trustworthy, data should also be discoverable, interoperable (with common technical standards) and self-describing, to facilitate its sharing and ensure its usefulness.
As federal lawmakers consider proposals for a federal baseline privacy law in the United States, one of the most complex challenges is federal preemption, or the extent to which a federal law should nullify the state laws on the books and the emerging laws addressing the collection and use of personal information.
Many recognize the benefits to businesses and consumers of establishing uniform national standards for the collection, transfer, and sale of commercial personal information, if those standards are strong and flexible enough to meet new challenges that arise. Such standards will require, to at least an extent, replacing individual state efforts. At the same time, however, there are hundreds of state privacy laws on the books. Many of these laws have a uniquely local character, such as laws governing student records, medical information, and library records. Preemption only becomes more complicated as additional states join recent leaders such as Virginia, California, and Colorado, to pass omnibus data privacy laws that apply to data collected across borders from websites, apps, and other digital services.
What can we learn from how existing federal privacy laws have addressed preemption? As a starting point, FPF staff have surveyed twelve (12) federal sectoral privacy laws passed between 1968 and 2003 and examined the extent to which they preempt similar state privacy laws. A comprehensive consumer privacy law would almost certainly preserve most of these sectoral laws and their state counterparts. They provide useful insight into how Congress has addressed federal preemption in the past.
In surveying these 12 federal privacy laws, we observe a few notable features, and offer some thoughts (below) on what factors have influenced Congressional decisions about preemption:
Preemption is Not Binary. Federal preemption is not an “all or nothing,” or even a “floor or ceiling” feature of US laws. All federal laws in the United States preempt directly conflicting state and local laws, under the U.S. Constitution’s supremacy clause (Art. VI.2). Beyond direct conflicts, however, it is entirely up to Congress to decide the extent to which state laws will be permitted to complement the many different aspects of a federal framework. Thus, some laws preempt regulations over particular subject matters (FCRA); some preempt certain procedural standards while allowing local prohibitions or requirements for local conduct (TCPA); some establish federal minimum standards, while explicitly allowing a conflicting state law to supersede federal law in narrow circumstances (FERPA); some prohibit only “inconsistent” legal liability (COPPA); while still others establish fully preemptive, detailed and prescriptive regulations in which the federal government dominates a field (Cable Act). See the Discussion Draft for analysis of each law. There are many compliance factors that likely influence the case-by-case decisions Congress has made, which we discuss below.
Preemption of Definitions. At least one law (FCRA) establishes a preemptive national definition of a key term (“firm offer of credit or insurance”). FCRA provides that even in cases where a state law goes beyond federal requirements, the state is bound to use the federal definition of a key term, even in the interpretation of the state law provisions.
Agency Involvement. Several federal privacy laws explicitly authorize a relevant governing federal agency to make decisions regarding preemption of state laws, or to respond to petitions for clarification on whether a state law is preempted. For example, the HIPAA Privacy Rule contains detailed requirements for petitioners to request that a state law be expressly preserved from preemption by the Secretary of Health and Human Services. Similarly, the FCC has received numerous petitions over the years to clarify whether state telemarketing laws are preempted by TCPA. In other cases, an agency has weighed in less formally, such as when the FTC argued in an amicus brief that COPPA does not preempt state protections for teenagers. Based on this precedent, it is clear that a relevant federal agency can play a key role in assisting with challenging preemption decisions.
Factors Influencing Preemption Decisions
Given the case-by-case variability described above and in the Discussion Draft, what determines when and how Congress has chosen to preempt state and local regulations that overlap or supplement federal privacy laws?
Congress is a political body, and politics surely play a role. But our analysis suggests that Congress pursues an overall goal of balancing individual rights with practical business compliance. We suggest that Congress pursues those goals by weighing several factors aside from political considerations. These likely include, for example: (1) the existence of national consensus on harmful business practices (versus expected regional variation in what is considered harmful); (2) the comprehensiveness or prescriptive nature of the law; (3) the national versus localized nature of business practices; and (4) the localized nature of data (which is sometimes, but not always, related to the identifiability of data).
For example, a key difference between the federal commercial emailing law, CAN-SPAM (very preemptive), and the federal commercial telemarketing law, the Telephone Consumer Protection Act (not preemptive except with respect to certain inter-state standards) is the relative ease with which the personal data being regulated can be localized, or have its geographic location readily inferred. Email addresses, despite being personal information, give no indication of the owner’s location, while residential phone numbers were straightforward to relate to a particular state when the law was drafted in 1991.
Thus, while differing state telemarketing laws can present compliance costs for marketing companies operating across state lines, such laws do not create impractical barriers to compliance. In addition, telemarketing represents an issue on which there may be much more regional variation than national consensus on appropriate local business practices: for example, some states ban political calls, some ban calls during certain times of day, and some maintain additional do-not-call registries (such as Texas’s do-not-call registry for businesses, to allow them to avoid commercial calls from electricity providers).
As a contrasting example, the Fair Credit Reporting Act (largely preemptive) represented, in 1970, a strong national consensus on appropriate business practices applicable primarily to three dominant credit bureaus in the United States, all operating effectively nationwide. At the same time, credit reports are involved in relatively localized business practices and involve identifiable information from which location can usually be inferred (e.g., from home addresses). As a result, business compliance with different state standards may not have been impossible, but was perhaps, ultimately, not desirable due to the comprehensive and prescriptive nature of the law and the relative national consensus on appropriate norms for credit bureaus.
These factors are just some of the myriad considerations that we suggest may influence preemption decisions for a federal privacy law, if the goal is to balance consumer privacy interests against concerns about practical business compliance. Further research might include, for example, a review of Congressional histories, or learning from other, non-privacy federal laws. We welcome feedback on the Discussion Draft.
Among other changes, India’s Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021 (the Rules) have:
recast the conditions to obtain ‘safe harbour’ from liability for online intermediaries, and
unveiled an extensive regulatory regime for a newly defined category of online ‘publishers’, which includes digital news media and Over-The-Top (OTT) services.
The majority of these provisions were unanticipated, resulting in a raft of petitions filed in High Courts across the country challenging the validity of various aspects of the Rules, including their constitutionality. On 25 May 2021, the three-month compliance period for some new requirements for significant social media intermediaries (so designated by the Rules) expired, with many intermediaries not in compliance, opening them up to liability under the Information Technology Act as well as wider civil and criminal laws. This has reignited debates about the impact of the Rules on business continuity and liability, citizens’ access to online services, privacy and security.
Following on FPF’s previous blog highlighting some aspects of these Rules, this article presents an overview of the Rules before deep-diving into critical issues regarding their interpretation and application in India. It concludes by taking stock of some of the emerging effects of these new regulations, which have major implications for millions of Indian users, as well as digital services providers serving the Indian market.
1. Brief overview of the Rules: Two new regimes for ‘intermediaries’ and ‘publishers’
The new Rules create two regimes for two different categories of entities: ‘intermediaries’ and ‘publishers’. Intermediaries have been the subject of prior regulations – the Information Technology (Intermediaries guidelines) Rules, 2011 (the 2011 Rules), now superseded by these Rules. However, the category of “publishers” and related regime created by these Rules did not previously exist.
The Rules begin with commencement provisions and definitions in Part I. Part II of the Rules applies to intermediaries (as defined in the Information Technology Act 2000 (IT Act)) who transmit electronic records on behalf of others, including online intermediary platforms (like YouTube, WhatsApp and Facebook). The rules in this part primarily flesh out the protections offered in Section 79 of the IT Act, which gives passive intermediaries the benefit of a ‘safe harbour’ from liability for objectionable information shared by third parties using their services — somewhat akin to protections under section 230 of the US Communications Decency Act. To claim this protection from liability, intermediaries need to undertake certain ‘due diligence’ measures, including informing users of the types of content that cannot be shared, and following content take-down procedures (for which safeguards evolved over time through important case law). The new Rules supersede the 2011 Rules and also significantly expand on them, introducing new provisions and additional due diligence requirements that are detailed further in this blog.
Part III of the Rules applies to a new, previously non-existent category of entities designated as ‘publishers’. This category is further classified into the subcategories of ‘publishers of news and current affairs content’ and ‘publishers of online curated content’. Part III then sets up extensive requirements for publishers to adhere to specific codes of ethics, onerous content take-down requirements and a three-tier grievance process, with appeals lying to an Executive Inter-Departmental Committee of Central Government bureaucrats.
Finally, the Rules contain two provisions that apply to all entities (i.e. intermediaries and publishers) relating to content-blocking orders. They lay out a new process by which Central Government officials can issue directions to intermediaries and publishers to delete, modify or block content, either following a grievance process (Rule 15) or through “emergency” blocking orders which may be passed ex parte. These Rules stem from powers to issue directions to intermediaries to block public access to any information through any computer resource (Section 69A of the IT Act). Interestingly, these provisions have been introduced separately from the existing rules for blocking purposes, the Information Technology (Procedure and Safeguards for Blocking for Access of Information by Public) Rules, 2009.
2. Key issues for intermediaries under the Rules
2.1 A new class of ‘social media intermediaries‘
The term ‘intermediary’ is a broadly defined term in the IT Act covering a range of entities involved in the transmission of electronic records. The Rules introduce two new sub-categories, being:
“social media intermediary” defined (in Rule 2(w)) as one who “primarily or solely enables online interaction between two or more users and allows them” to exchange information; and
“significant social media intermediary” (SSMI) comprising social media intermediaries with more than five million registered users in India (following this Government notification of the threshold).
Given that a popular messaging app like WhatsApp has over 400 million users in India, the threshold appears to be fairly conservative. The Government may order any intermediary to comply with the same obligations as SSMIs (under Rule 6) if its services are adjudged to pose a risk of harm to national security, the sovereignty and integrity of India, India’s foreign relations or public order.
SSMIs have to follow substantially more onerous “additional due diligence” requirements to claim the intermediary safe harbour (including mandatory traceability of message originators and proactive automated screening, as discussed below). These new requirements raise privacy and data security concerns: they extend beyond the traditional ideas of platform “due diligence”, potentially expose the content of private communications, and in doing so create new privacy risks for users in India.
Extensive new requirements are set out in the new Rule 4 for SSMIs.
In-country employees: SSMIs must appoint in-country employees as (1) Chief Compliance Officer, (2) a nodal contact person for 24×7 coordination with law enforcement agencies and (3) a Resident Grievance Officer specifically responsible for overseeing the internal grievance redress mechanism. Monthly reporting of complaints management is also mandated.
Traceability requirements for SSMIs providing messaging services: Among the most controversial requirements is Rule 4(2), which requires SSMIs providing messaging services to enable the identification of the “first originator” of information on their platforms as required by Government or court orders. This tracing and identification of users is considered incompatible with the end-to-end encryption technology employed by messaging applications like WhatsApp and Signal. In its legal challenge to this Rule, WhatsApp has noted that end-to-end encrypted platforms would need to be re-engineered to identify all users, since there is no way to predict which user will be the subject of an order seeking first originator information.
Provisions to mandate modifications to the technical design of encrypted platforms to enable traceability seem to go beyond merely requiring intermediary due diligence. Instead they appear to draw on separate Government powers relating to interception and decryption of information (under Section 69 of the IT Act). In addition, separate stand-alone rules laying out procedures and safeguards for such interception and decryption orders already exist in the Information Technology (Procedure and Safeguards for Interception, Monitoring and Decryption of Information) Rules, 2009. Rule 4(2) even acknowledges these provisions – raising the question of whether these Rules (relating to intermediaries and their safe harbours) can be used to expand the scope of section 69 or rules thereunder.
Proceedings initiated by WhatsApp LLC in the Delhi High Court, and by Free and Open Source Software (FOSS) developer Praveen Arimbrathodiyil in the Kerala High Court, have both challenged the legality and validity of Rule 4(2) on grounds including that it is ultra vires and goes beyond the scope of its parent statutory provisions (ss. 79 and 69A) and the intent of the IT Act itself. Substantively, the provision is also challenged on the basis that it would violate users’ fundamental rights, including the right to privacy and the right to free speech and expression, due to the chilling effect that the stripping back of encryption will have.
Automated content screening: Rule 4(4) mandates that SSMIs must employ technology-based measures, including automated tools, to proactively identify information depicting (i) rape, child sexual abuse or conduct, or (ii) any information previously removed following a Government or court order. The latter category is very expansive and allows content take-downs for a broad range of reasons, ranging from defamatory or pornographic content, to IP infringements, to content threatening national security or public order (as set out in Rule 3(1)(d)).
Though the objective of the provision is laudable (i.e. to limit the circulation of violent or previously removed content), the move towards proactive automated monitoring has raised serious concerns regarding censorship on social media platforms. Rule 4(4) appears to acknowledge the deep tensions that this requirement creates with privacy and free speech, through provisions that require these screening measures to be proportionate to users’ free speech and privacy interests, to be subject to human oversight, and to include reviews of automated tools to assess fairness, accuracy, propensity for bias or discrimination, and impact on privacy and security. However, given the vagueness of this wording compared with the trade-off of losing intermediary immunity, scholars and commentators are noting the obvious potential for ‘over-compliance’ and excessive screening out of content. Many (including the petitioner in the Praveen Arimbrathodiyil matter) have also noted that automated filters are not sophisticated enough to differentiate between violent unlawful images and legitimate journalistic material. The concern is that such measures could create a large-scale screening out of ‘valid’ speech and expression, with serious consequences for constitutional rights to free speech and expression, which also protect ‘the rights of individuals to listen, read and receive the said speech’ (Tata Press Ltd v. Mahanagar Telephone Nigam Ltd, (1995) 5 SCC 139).
Tighter timelines for grievance redress, content take-down and information sharing with law enforcement: Rule 3 includes enhanced requirements to serve privacy policies and user agreements outlining the terms of use, including annual reminders of these terms and any modifications, and of the intermediary’s right to terminate the user’s access for using the service in contravention of these terms. The Rule also enhances grievance redress processes for intermediaries, mandating that the complaints system acknowledge complaints within 24 hours and dispose of them within 15 days. For certain categories of complaints (where a person complains of inappropriate images or impersonations of them being circulated), removal of access to the material is mandated within 24 hours based on a prima facie assessment.
Such requirements appear to be aimed at creating more user-friendly networks of intermediaries. However, the imposition of a single set of requirements is especially onerous for smaller or volunteer-run intermediary platforms which may not have income streams or staff to provide for such a mechanism. Indeed, the petition in the Praveen Arimbrathodiyil matter has challenged certain of these requirements as being a threat to the future of the volunteer-led Free and Open Source Software (FOSS) movement in India, by placing similar requirements on small FOSS initiatives as on large proprietary Big Tech intermediaries.
Other obligations that stipulate turn-around times for intermediaries include (i) a requirement to remove or disable access to content within 36 hours of receipt of a Government or court order relating to unlawful information on the intermediary’s computer resources (under Rule 3(1)(d)) and (ii) a requirement to provide information within 72 hours of receiving an order from an authorised Government agency undertaking investigative activity (under Rule 3(1)(j)).
Similar to the concerns with automated screening, there are concerns that the new grievance process could lead to private entities becoming the arbiters of appropriate content/free speech — a position that was specifically reversed in a seminal 2015 Supreme Court decision that clarified that a Government or Court order was needed for content take-downs.
3. Key issues for the new ‘publishers’ subject to the Rules, including OTT players
3.1 New Codes of Ethics and three-tier redress and oversight system for digital news media and OTT players
Digital news media and OTT players have been designated as ‘publishers of news and current affairs content’ and ‘publishers of online curated content’ respectively in Part III of the Rules. Each category has then been subjected to separate Codes of Ethics. In the case of digital news media, the codes applicable to newspapers and cable television have been applied. For OTT players, the Appendix sets out principles regarding content that can be created and display classifications. To enforce these codes and to address grievances from the public about their content, publishers are now mandated to set up a grievance system, which will be the first tier of a three-tier “appellate” system culminating in an oversight mechanism by the Central Government with extensive powers of sanction.
Some of the key issues emerging from these Rules in Part III and the challenges to them are highlighted below.
3.2 Lack of legal authority and competence to create these Rules
There has been substantial debate on the lack of clarity regarding the legal authority of the Ministry of Electronics & Information Technology (MeitY) under the IT Act. These concerns arise at various levels.
Authority and competence to regulate ‘publishers’ of original content is unclear: The definition of ‘intermediary’ in the IT Act does not extend to the types of entities defined as publishers. The Rules themselves acknowledge that ‘publishers’ are a new category of regulated entity created by the Rules, as opposed to a sub-category of intermediaries. Further, the commencement provisions of the Rules confirm that they are passed under statutory provisions in the IT Act related to intermediary regulation. It is a well-established principle that subordinate rules cannot go beyond the object and scope of parent statutory provisions (Ajoy Kumar Banerjee v Union of India (1984) 3 SCC 127). Consequently, the authority of MeitY to regulate entities that create original content – like online news sources and OTT platforms – remains unclear at best.
Ability to extend substantive provisions in other statutes through the Rules: The Rules apply two codes of conduct to digital publishers of news and current affairs content, namely (i) the Norms of Journalistic Conduct of the Press Council of India under the Press Council Act, 1978; and (ii) the Programme Code under section 5 of the Cable Television Networks (Regulation) Act, 1995. Many, including petitioners in the LiveLaw matter, have noted that the power to make Rules under the IT Act’s s 87 cannot be used to extend or expand requirements under other statutes and their subordinate rules. To bring digital news media or OTT players into the existing regulatory regimes for the press and television broadcasting, amendments to those regimes will be required, led by the Ministry of Information and Broadcasting.
Validity of three-tier ‘quasi-judicial’ adjudicatory mechanism, with final appeal to a Committee of solely executive functionaries: Rules 11–14 create a three-tier grievance and oversight system that can be used by any person with a grievance against content published by any publisher. If a grievance is not satisfactorily dealt with by the publisher entity (Level I) within 15 days, it is escalated to the self-regulatory body of which the publisher is a member (Level II), which must also provide a decision to the complainant within 15 days. If the complainant remains unsatisfied, they may appeal to the Oversight Mechanism (Level III). This can be appreciated as an attempt to create feedback loops that can minimise the spread of misleading or incendiary media, disinformation and the like through a more effective grievance mechanism. The structure and design of the three-tier system have, however, raised specific concerns.
First, there is a concern that Levels I and II result in the privatisation of adjudications relating to the free speech and expression of creative content producers, which would otherwise be litigated in Courts and Tribunals. As noted by many (including the LiveLaw petition at page 33), this could have the effect of overturning the judicial precedent in Shreya Singhal v. Union of India ((2013) 12 S.C.C. 73), which specifically read down s 79 of the IT Act to avoid a situation where private entities were the arbiters determining the legitimacy of takedown orders. Second, despite referring to “self-regulation”, this system is subject to executive oversight (unlike the existing models for offline newspapers and broadcasting).
The Inter-Departmental Committee is composed entirely of Central Government bureaucrats. It may review complaints escalated through the three-tier system or referred directly by the Ministry, following which it can deploy a range of sanctions, from warnings, to mandating apologies, to deleting, modifying or blocking content. This raises the question of whether the Committee meets the legal requirements for an administrative body undertaking a ‘quasi-judicial’ function, especially one that may adjudicate on matters of rights relating to free speech and privacy. Finally, while the objective of creating some standards and codes for such content creators may be laudable, it is unclear whether such an extensive oversight mechanism with powers of sanction over online publishers can be validly created under the rubric of intermediary liability provisions.
4. New powers to delete, modify or block information for public access
As described at the start of this blog, the Rules add new powers for the deletion, modification and blocking of content from intermediaries and publishers. While section 69A of the IT Act (and the Rules thereunder) does include blocking powers for the Government, these exist only vis-à-vis intermediaries. Rule 15 extends this power to ‘publishers’. It also provides a new avenue for issuing such orders to intermediaries, outside of the existing rules for blocking information under the Information Technology (Procedure and Safeguards for Blocking for Access of Information by Public) Rules, 2009.
Graver concerns arise from Rule 16, which allows for the passing of emergency orders to block information, including without giving publishers or intermediaries an opportunity to be heard. There is a provision for such an order to be reviewed by the Inter-Departmental Committee within 2 days of its issue.
Both Rules 15 and 16 apply to all entities contemplated in the Rules. Accordingly, they greatly expand executive power and oversight over digital media services in India, including social media, digital news media and OTT on-demand services.
5. Conclusions and future implications
The new Rules in India have opened up deep questions for online intermediaries and providers of digital media services serving the Indian market.
For intermediaries, this creates a difficult and even existential choice: the requirements (especially those relating to traceability and automated screening) appear to set an improbably high bar given the reality of their technical systems. Failure to comply, however, results not only in the loss of the safe harbour from liability but, as seen in new Rule 7, also opens them up to punishment under the IT Act and criminal law in India.
For digital news and OTT players, the consequences of non-compliance and the level of enforcement remain to be understood, especially given the open questions regarding the validity of the legal basis for these Rules. Given the numerous petitions filed against the Rules, there is also substantial uncertainty regarding their future, although they have the full force of law at present.
Overall, it does appear that attempts to create a ‘digital media’ watchdog would be better dealt with in standalone legislation, potentially sponsored by the Ministry of Information and Broadcasting (MIB), which has the traditional remit over such areas. Indeed, the administration of Part III of the Rules has been delegated by MeitY to MIB, pointing to the genuine split in competence between these Ministries.
Finally, the potential overlaps with India’s proposed Personal Data Protection Bill (if passed) also create tensions for the future. It remains to be seen whether the provisions on traceability will survive the test of constitutional validity set out in India’s privacy judgement (Justice K.S. Puttaswamy v. Union of India, (2017) 10 SCC 1). Irrespective of that determination, the Rules appear to have some dissonance with the data retention and data minimisation requirements seen in the last draft of the Personal Data Protection Bill, not to mention other obligations relating to Privacy by Design and data security safeguards. Interestingly, although the Bill was released in December 2019, the definition of ‘social media intermediary’ in an explanatory clause to its section 26(4) closely tracks the definition in Rule 2(w), but departs from it by carving out certain intermediaries. This is already resulting in moves such as Google’s plea of 2 June 2021 in the Delhi High Court asking for protection from being declared a social media intermediary.
These new Rules have surfaced the inherent tensions within digital regulation between the goals of freedom of speech and expression and the right to privacy, and the competing governance objectives of law enforcement (such as limiting the circulation of violent, harmful or criminal content online) and national security. The ultimate legal effect of these Rules will be determined as much by the outcome of the various petitions challenging their validity as by the enforcement challenges raised by casting such a wide net, one that covers millions of users and thousands of entities, all engaged in creating India’s growing digital public sphere.
New FPF Report Highlights Privacy Tech Sector Evolving from Compliance Tools to Platforms for Risk Management and Data Utilization
As we enter the third phase of development of the privacy tech market, purchasers are demanding more integrated solutions, product offerings are more comprehensive, and startup valuations are higher than ever, according to a new report from the Future of Privacy Forum and Privacy Tech Alliance. These factors are leading to companies providing a wider range of services, acting as risk management platforms, and focusing on support of business outcomes.
“The privacy tech sector is at an inflection point, as its offerings have expanded beyond assisting with regulatory compliance,” said FPF CEO Jules Polonetsky. “Increasingly, companies want privacy tech to help businesses maximize the utility of data while managing ethics and data protection compliance.”
According to the report, “Privacy Tech’s Third Generation: A Review of the Emerging Privacy Tech Sector,” regulations are often the biggest driver for buyers’ initial privacy tech purchases. Organizations also are deploying tools to mitigate potential harms from the use of data. However, buyers serving global markets increasingly need privacy tech that offers data availability and control and supports its utility, in addition to regulatory compliance.
The report finds the COVID-19 pandemic has accelerated global marketplace adoption of privacy tech as dependence on digital technologies grows. Privacy is becoming a competitive differentiator in some sectors, and TechCrunch reports that 200+ privacy startups have together raised more than $3.5 billion over hundreds of individual rounds of funding.
“The customers buying privacy-enhancing tech used to be primarily Chief Privacy Officers,” said report lead author Tim Sparapani. “Now it’s also Chief Marketing Officers, Chief Data Scientists, and Strategy Officers who value the insights they can glean from de-identified customer data.”
The report highlights five trends in the privacy enhancing tech market:
Buyers desire “enterprise-wide solutions.”
Buyers favor integrated technologies.
Some vendors are moving to either collaborate and integrate or provide fully integrated solutions themselves.
Data is the enterprise asset.
Jurisdiction impacts a shared vernacular problem.
The report also draws seven implications for competition in the market:
Buyers favor integrated solutions over one-off solutions.
Collaborations, partnerships, cross-selling, and joint ventures between privacy tech vendors are increasing to provide buyers with integrated suites of services and to attract additional market share.
Private equity and private equity-backed companies will continue their “roll-up” strategies of buying niche providers to build a package of companies to provide the integrated solutions buyers favor.
Venture capital will continue funding the privacy tech sector, though not every seller has the same level of success fundraising.
Big companies may acquire strategically valuable, niche players.
Small startups may struggle to gain market traction absent a truly novel or superb solution.
Buyers will face challenges in future-proofing their privacy strategies.
The report makes a series of recommendations, including that the industry define as a priority a common vernacular for privacy tech; set standards for technologies in the “privacy stack” such as differential privacy, homomorphic encryption, and federated learning; and explore the needs of companies for privacy tech based upon their size, sector, and structure. It calls on vendors to recognize the need to provide adequate support to customers to increase uptake and speed time from contract signing to successful integration.
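To give a concrete sense of one building block of this “privacy stack,” the sketch below shows the Laplace mechanism for differential privacy applied to a simple count query. It is a generic, minimal illustration rather than anything drawn from the report; the function name, the epsilon value, and the example count are placeholders.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, rng=None) -> float:
    """Release a counting-query result with epsilon-differential privacy
    via the Laplace mechanism: add noise with scale 1/epsilon, since a
    count has sensitivity 1 (one person changes it by at most 1)."""
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical use: report how many users opted out of targeted ads
# without revealing whether any single individual is in the total.
print(dp_count(true_count=1342, epsilon=0.5))
```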
The Future of Privacy Forum launched the Privacy Tech Alliance (PTA) as a global initiative with a mission to define, enhance and promote the market for privacy technologies. The PTA brings together innovators in privacy tech with customers and key stakeholders.
Members of the PTA Advisory Board, which includes Anonos, BigID, D-ID, Duality, Ethyca, Immuta, OneTrust, Privacy Analytics, Privitar, SAP, Truata, TrustArc, Wirewheel, and ZL Tech, have formed a working group to address impediments to growth identified in the report. The PTA working group will define a common vernacular and typology for privacy tech as a priority project with chief privacy officers and other industry leaders who are members of FPF. Other work will seek to develop common definitions and standards for privacy-enhancing technologies such as differential privacy, homomorphic encryption, and federated learning and identify emerging trends for venture capitalists and other equity investors in this space. Privacy Tech companies can apply to join the PTA by emailing [email protected].
Perspectives on the Privacy Tech Market
Quotes from Members of the Privacy Tech Alliance Advisory Board on the Release of the “Privacy Tech’s Third Generation” Report
“The ‘Privacy Tech Stack’ outlined by the FPF is a great way for organizations to view their obligations and opportunities to assess and reconcile business and privacy objectives. The Schrems II decision by the Court of Justice of the European Union highlights that skipping the second ‘Process’ layer can result in desired ‘Outcomes’ in the third layer (e.g., cloud processing of, or remote access to, cleartext data) being unlawful – despite their global popularity – without adequate risk management controls for decentralized processing.” — Gary LaFever, CEO & General Counsel, Anonos
“As a founding member of this global initiative, we are excited by the conclusions drawn from this foundational report – we’ve seen parallels in our customer base, from needing an enterprise-wide solution to the rich opportunity for collaboration and integration. The privacy tech sector continues to mature as does the imperative for organizations of all sizes to achieve compliance in light of the increasingly complicated data protection landscape.” — Heather Federman, VP Privacy and Policy at BigID
“There is no doubt of the massive importance of the privacy sector, an area which is experiencing huge growth. We couldn’t be more proud to be part of the Privacy Tech Alliance Advisory Board and absolutely support the work they are doing to create alignment in the industry and help it face the current set of challenges. In fact we are now working on a similar initiative in the synthetic media space to ensure that ethical considerations are at the forefront of that industry too.” — Gil Perry, Co-Founder & CEO, D-ID
“We congratulate the Future of Privacy Forum and the Privacy Tech Alliance on the publication of this highly comprehensive study, which analyzes key trends within the rapidly expanding privacy tech sector. Enterprises today are increasingly reliant on privacy tech, not only as a means of ensuring regulatory compliance but also in order to drive business value by facilitating secure collaborations on their valuable and often sensitive data. We are proud to be part of the PTA Advisory Board, and look forward to contributing further to its efforts to educate the market on the importance of privacy-tech, the various tools available and their best utilization, ultimately removing barriers to successful deployments of privacy-tech by enterprises in all industry sectors” — Rina Shainski, Chairwoman, Co-founder, Duality
“Since the birth of the privacy tech sector, we’ve been helping companies find and understand the data they have, compare it against applicable global laws and regulations, and remediate any gaps in compliance. But as the industry continues to evolve, privacy tech also is helping show business value beyond just compliance. Companies are becoming more transparent, differentiating on ethics and ESG, and building businesses that differentiate on trust. The privacy tech industry is growing quickly because we’re able to show value for compliance as well as actionable business insights and valuable business outcomes.” — Kabir Barday, CEO, OneTrust
“Leading organizations realize that to be truly competitive in a rapidly evolving marketplace, they need to have a solid defensive footing. Turnkey privacy technologies enable them to move onto the offense by safely leveraging their data assets rapidly at scale.” — Luk Arbuckle, Chief Methodologist, Privacy Analytics
“We appreciate FPF’s analysis of the privacy tech marketplace and we’re looking forward to further research, analysis, and educational efforts by the Privacy Tech Alliance. Customers and consumers alike will benefit from a shared understanding and common definitions for the elements of the privacy stack.” — Corinna Schulze, Director, EU Government Relations, Global Corporate Affairs, SAP
“The report shines a light on the evolving sophistication of the privacy tech market and the critical need for businesses to harness emerging technologies that can tackle the multitude of operational challenges presented by the big data economy. Businesses are no longer simply turning to privacy tech vendors to overcome complexities with compliance and regulation; they are now mapping out ROI-focused data strategies that view privacy as a key commercial differentiator. In terms of market maturity, the report highlights a need to overcome ambiguities surrounding new privacy tech terminology, as well as discrepancies in the mapping of technical capabilities to actual business needs. Moving forward, the advantage will sit with those who can offer the right blend of technical and legal expertise to provide the privacy stack assurances and safeguards that buyers are seeking – from a risk, deployment and speed-to-value perspective. It’s worth noting that the growing importance of data privacy to businesses sits in direct correlation with the growing importance of data privacy to consumers. Trūata’s Global Consumer State of Mind Report 2021 found that 62% of global consumers would feel more reassured and would be more likely to spend with companies if they were officially certified to a data privacy standard. Therefore, in order to manage big data in a privacy-conscious world, the opportunity lies with responsive businesses that move with agility and understand the return on privacy investment. The shift from manual, restrictive data processes towards hyper automation and privacy-enhancing computation is where the competitive advantage can be gained and long-term consumer loyalty—and trust— can be retained.” — Aoife Sexton, Chief Privacy Officer and Chief of Product Innovation, Trūata
“As early pioneers in this space, we’ve had a unique lens on the evolving challenges organizations have faced in trying to integrate technology solutions to address dynamic, changing privacy issues in their organizations, and we believe the Privacy Technology Stack introduced in this report will drive better organizational decision-making related to how technology can be used to sustainably address the relationships among the data, processes, and outcomes.” — Chris Babel, CEO, TrustArc
“It’s important for companies that use data to do so ethically and in compliance with the law, but those are not the only reasons why the privacy tech sector is booming. In fact, companies with exceptional privacy operations gain a competitive advantage, strengthen customer relationships, and accelerate sales.” — Justin Antonipillai, Founder & CEO, Wirewheel
Colorado Privacy Act Passes Legislature: Growing Inconsistencies Ramp Up Pressure for Federal Privacy Law
Today, the Colorado Senate approved the House version of the Colorado Privacy Act (SB21-190), which the House passed yesterday, June 7. If Governor Jared Polis signs the bill, Colorado will follow Virginia and California as the third U.S. state to establish baseline legal protections for consumer privacy.
“Although the Colorado Privacy Act contains notable advances that build on California and Virginia — in particular, formalizing a global privacy control, and applying to non-profit organizations — there continues to be an urgent need for Congress to set federal standards that create baseline nationwide protections for all.”
Statement by Polly Sanderson, Policy Counsel, Future of Privacy Forum
Colorado’s law features elements of both Virginia and California’s consumer privacy laws, as well as some elements unique to Colorado. The law is the first in the U.S. to apply to non-profit entities in addition to commercial entities. It contains a strong consent standard to process personal data for incompatible secondary uses and to process sensitive data such as health information, race, ethnicity, and other sensitive categories. The bill prohibits controllers from employing so-called “dark patterns” to obtain consent and allows consumers to exercise their opt-out rights via authorized agents. Consumers will be able to express their intent to opt-out of sales and targeted advertising via a universal opt-out mechanism established by the Colorado Attorney General, who is also granted authority to issue opinion letters and interpretive guidance on what constitutes a violation of the Act.
Similar to Virginia’s recently passed Consumer Data Protection Act, Colorado’s law requires controllers to conduct data protection assessments for processing activities that present a heightened risk of harm to a consumer. This, along with FIPPs-inspired data minimization and purpose specification provisions, promotes organizational accountability and moves beyond a notice and consent framework. By excluding de-identified data from the scope of personal data and excluding pseudonymous data from the rights of access, correction, deletion, and portability, the law follows existing standards and incentivizes covered entities to maintain data in less identifiable formats.
As a growing number of states pass their own consumer privacy laws, concerns about interoperability may begin to emerge. For instance, definitional differences regarding what constitutes sensitive data, pseudonymous data, and biometric data may present operational challenges for businesses. Similarly, the scope of access, deletion, and other consumer rights differs between Colorado, Virginia, and California, creating potential implementation challenges. Finally, the research exemptions of each of these laws differ in their flexibility, consent, and oversight requirements.
Media Inquiries: Polly Sanderson, Senior Counsel at [email protected]
Privacy Trends: Four State Bills to Watch that Diverge from California and Washington Models
During 2021, state lawmakers have proposed a range of models to regulate consumer privacy and data protection.
As the first state to pass consumer privacy legislation in 2018, California established a highly influential model with the California Consumer Privacy Act. In the years since, other states have introduced dozens of nearly identical CCPA-like bills. In 2019, the Washington Privacy Act emerged as an alternative model, and large numbers of nearly identical WPA-like bills were introduced in other states throughout 2019–2021. In February 2021, the passage of the Virginia Consumer Data Protection Act cemented the Washington model as an influential alternative framework.
In 2021, however, numerous divergent frameworks have begun to emerge, with the potential to establish strong consumer protections, conflict with other states’ laws, and influence federal privacy law. These proposals diverge from the California and Washington models in key ways, and they are worth examining because they show ongoing cross-pollination, reveal lawmakers’ concerns about the inadequacy of notice-and-choice frameworks, and offer novel approaches for lawmakers and other stakeholders to discuss, debate, and consider.
The California Model
As the first state to enact consumer privacy legislation in 2018, California has a distinct and highly influential model for consumer privacy law. Since the passage of the California Consumer Privacy Act (CCPA), a proliferation of state proposals have adopted a similar framework, scope, and terminology. This reflects a general desire among state legislators to provide their constituents with at least the same privacy rights as those afforded to Californians, but in 2018, many hadn’t yet conceptualized alternative frameworks of their own.
California-style proposals adopt “business-service provider” terminology, focus on consumer-business relationships, and are characterized by their focus on providing consumers with greater transparency and control over their personal data. They feature a bundle of privacy rights, including the right for consumers to “opt-out” of sales (or sharing) of personal data, and require businesses to post “Do Not Sell” links on their website homepages. Often, California-style proposals also include provisions which aim to make it easier for consumers to exercise their opt-out rights, such as authorized agent and universal opt-out provisions.
Though none have passed into law, many state proposals over the past three years have followed the California model, such as Alaska’s failed HB 159 / SB 116, Florida’s failed HB 969 / SB 1734, and New York Governor Cuomo’s failed Data Accountability and Transparency Act, incorporated into budget legislation. Oklahoma’s failed HB 1602 also adopted a similar framework, though it would have required businesses to obtain “opt-in” consent to sell personal data, rather than “opt-out.”
The Washington Model
The Washington Privacy Act (WPA – SB 5062), sponsored by Sen. Reuven Carlyle (D), recently failed for the third consecutive legislative session. However, in February 2021, Virginia passed legislation that follows the general framework of the WPA. Virginia’s Consumer Data Protection Act (VA-CDPA), sponsored by Delegate Cliff Hayes (D) and Sen. David Marsden (D), will become effective on January 1, 2023.
The framework includes (1) processor/controller terminology; (2) heightened privacy protections for sensitive data; (3) individual rights of access, deletion, correction, and portability, plus the right to opt out of sales, targeted advertising, and profiling in furtherance of decisions that produce legal or similarly significant effects; (4) differential treatment of pseudonymous data; (5) data protection impact assessments for high-risk data processing activities; (6) flexibility for research; and (7) enforcement by the Attorney General.
Numerous other state bills adopt this framework, such as the active Colorado SB21-190 and Connecticut SB 893, as well as failed proposals in Utah, Minnesota, and elsewhere. The Colorado and Connecticut proposals are both on the Senate floor calendars in their respective states. Of course, each WPA-type bill contains important differences. For instance, the Colorado and Connecticut proposals both broadly exclude pseudonymous data from all consumer rights, including opt-out rights. The Colorado proposal also features a global/universal opt-out provision for sales and targeted advertising, an opt-out standard for the processing of sensitive data (rather than opt-in), a prescriptive HIPAA de-identification standard (rather than the FTC’s 3-part test), and public research exemptions that do not mandate oversight by an institutional review board (IRB) or a similar oversight body.
Growing Divergence and Cross-Pollination
In the three years since the passage of the CCPA, legislative divergence has increased as more and more states have convened task forces to study consumer privacy issues and held hearings, roundtables, and one-on-ones with diverse experts from academia, the advocacy community, and industry. In other words, the laboratories of democracy have been experimenting, a trend that will likely continue in 2022 and beyond as legislators’ views on consumer privacy become more sophisticated and nuanced.
State bills in 2021, as compared to 2019-2020, are increasingly focused on bolstering notice and choice regimes (including a shift towards more “opt-in” rather than “opt-out” requirements), are borrowing more features from other laws (such as the GDPR’s “legitimate interests” framework), and in some cases experimenting with novel approaches (such as fiduciary duties, or “data streams”).
For example, some state bills would require businesses to provide two-tiered short-form and long-form disclosures, and would authorize a government agency to develop a uniform logo or button to promote individual awareness of the short-form notice. Numerous proposals would generally require opt-in consent for all data processing, would prohibit manipulative user interfaces to obtain consent, and would designate user-enabled privacy controls as a valid mechanism for an individual to communicate their privacy preferences. Some proposals feature additional rights, such as the right not to be subject to “surreptitious surveillance,” the right not to be subject to a solely automated decision, and the right to object to processing.
There is also a trend among proposals towards moving beyond a notice and choice framework, with the aim of moving the burden of privacy management away from individuals. For instance, many include strong purpose specification and data minimization requirements, and some include outright prohibitions on discriminatory data processing. At least one state (NJ A. 3283, discussed below) has taken inspiration from the EU’s General Data Protection Regulation (GDPR) by recognizing “legitimate interests” along with other lawful bases for data processing.
Many proposals are taking novel or unique approaches to privacy legislation. For example, a Texas proposal leans towards conceptualizing personal data as property by enabling an individual to exchange their “data stream” as consideration for a contract with a business. Meanwhile, various proposals contain duties of loyalty, care, and confidentiality. These trust-based duties were first introduced into US legislation in 2018, when Sen. Brian Schatz (D-HI) introduced the Data Care Act (S. 2961). At that time, it wasn’t clear whether trust-based duties would become influential in the US. The fact that they have demonstrates the potential for cross-pollination between federal and state proposals.
Four Notable Models to Watch
Amidst such a large volume of California- and Washington-like bills, it may be easy to miss the handful of states where legislators are taking a different approach to baseline or comprehensive privacy legislation. Even if they do not pass, these bills are worth examining because they could eventually influence federal privacy law. They also provide insight into some of the most pressing questions facing policymakers: whether (and how) to regulate automated decision-making, including profiling; whether a framework should be based on privacy self-management, relationships of trust, civil rights, or personal data as property; and how personal data should be defined, and whether it should be subcategorized according to sensitivity, identifiability, source (first party, third party, derived), or something else. Answering these questions is not straightforward, and there are many reasonable philosophical positions for stakeholders to take. Close attention to legislative proposals can help to promote nuanced dialogue and debate about the relative merits and drawbacks of different approaches.
Four active bills that are worth watching are (1) the New York Privacy Act (NYPA – S. 6701), (2) the New Jersey Disclosure and Accountability Transparency Act (NJ DaTA – A. 3283), (3) the Massachusetts Information Privacy Act (MIPA – S.46), and (4) Texas’s HB 3741.
New York Privacy Act (NYPA)
The New York Privacy Act (NYPA) (S. 6701), introduced by Sen. Kevin Thomas in May 2021, has several distinctive features, such as an opt-in consent framework, duties of loyalty and care, heightened protections for certain types of consequential automated decision-making, and a data broker registry. The proposal passed out of the Consumer Protection Committee on May 18 and is now on the floor calendar. The legislature adjourns June 10.
Opt-In Consent & Global Consent Mechanism: The requirement for controllers to obtain opt-in consent to process a consumer’s personal data is accompanied by a prohibition on manipulative user interface design choices in order to obtain consent. Controllers would also be required to treat user-enabled privacy controls (e.g., browser plug-ins, device settings, or other mechanisms) that communicate or signal the consumer’s choice not to be subject to targeted advertising or the sale of their personal data as a denial of consent.
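The bill does not prescribe a particular signal or implementation, but as a hedged illustration, the minimal sketch below assumes a controller treats the Global Privacy Control request header (Sec-GPC: 1) as such a user-enabled control; the Flask route and function names are hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def universal_opt_out_signaled(req) -> bool:
    """Treat a user-enabled privacy control (here, the Global Privacy
    Control 'Sec-GPC: 1' request header) as a denial of consent to
    targeted advertising and the sale of personal data."""
    return req.headers.get("Sec-GPC", "").strip() == "1"

@app.route("/consent/status")
def consent_status():
    if universal_opt_out_signaled(request):
        # An NYPA-style rule would require honoring the signal as a
        # denial of consent for these purposes.
        return jsonify({"targeted_advertising": False, "sale_of_data": False})
    # Absent a signal, affirmative opt-in consent would still be
    # required before processing for these purposes.
    return jsonify({"targeted_advertising": None, "sale_of_data": None})
```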
Duty of Loyalty & Care: The “duty of loyalty” would require a controller to obtain consent to process data in ways that are reasonably foreseeable to be against a consumer’s physical, financial, psychological, or reputational interests. The proposal also contains a “duty of care,” requiring controllers to conduct and document annual, nonpublic risk assessments for all processing of personal data. Of note, the “duty of loyalty” is distinct from the “fiduciary duty” contained in an earlier 2019 version of the NYPA (S. 5642).
Automated Decision-Making: Annual public impact assessments would be required for a controller or processor “engaged in” consequential automated decisions (such as those concerning financial services, housing, and employment). Whenever a decision involving solely automated processing “results in” a denial of financial or lending services, housing, public accommodation, insurance, health care services, or access to basic necessities such as food and water, the controller must: (i) disclose that the decision was made by a solely automated process; (ii) provide an avenue for the affected consumer to appeal the decision, including by allowing the consumer to (a) express their point of view, (b) contest the decision, and (c) obtain meaningful human review; and (iii) explain how to appeal the decision.
Data Broker Registry: The NYPA would establish a data broker registry and prohibit controllers from sharing personal data with data brokers that fail to register.
New Jersey Disclosure and Accountability Transparency Act (NJ DaTA)
The New Jersey Disclosure and Accountability Transparency Act (NJ DaTA – A. 3283) introduced by Assemblyman Andrew Zwicker (D) was heard before the Assembly Science, Innovation and Technology Committee on March 15, 2021. The legislature will remain in session through 2021. The framework includes six lawful bases for data processing, affirmative data processing duties, the right for an individual to object to processing, and heightened requirements surrounding automated decision-making.
Six Lawful Grounds for Processing: The framework creates six lawful grounds for processing, namely: the controller’s “legitimate interests”; affirmative consent; contractual necessity; protection of the vital interests of a person; compliance with a legal obligation; and necessity for the performance of a task.
Data Processing Duties: NJ DaTA would require: (1) all data collection to be for a specified, explicit, and legitimate purpose; (2) data to be processed lawfully, fairly, and transparently; (3) data collection and processing to be adequate, relevant, and limited to what is necessary; (4) data to be accurate and kept up to date; (5) data to be kept in a form that permits identification of consumers for no longer than is necessary for the processing purposes; and (6) data to be processed securely. Data protection impact assessments would also be required prior to processing personal data.
Right to Object to Processing: In addition to the rights of access, correction, deletion, and portability, the proposal would grant individuals the right to object to the processing of personal data. For a controller to continue to process personal data in this circumstance, the controller must demonstrate “compelling legitimate grounds” for processing which override the interests, rights, and freedoms of the consumer. The proposal would also grant individuals a right to object to processing of personal data for the purpose of direct marketing, including profiling. When a consumer objects to processing for this purpose, the controller must stop processing.
Automated Decision-Making: The proposal would create the right for a consumer not to be subject to a decision based on solely automated decision making, including profiling, which produces legal effects concerning the consumer. NJ DaTA would require controllers to provide specific notice to consumers at the time of collection of personal data regarding the existence of automated decision making, including profiling, meaningful information concerning the logic involved, and significance and potential consequences for the consumer.
New Data Protection Office: NJ DaTA would establish an Office of Data Protection and Responsible Uses in the Division of Consumer Affairs to oversee compliance with the Act.
Massachusetts Information Privacy Act (MIPA)
The Massachusetts Information Privacy Act (MIPA – S.46) was introduced by Sen. Cynthia Stone Creem (D) in March 2021. The legislature will remain in session through 2021. MIPA is based on a framework of notice and consent, with additional trust-based obligations for covered entities. Heightened protections apply to biometric and location data, and “surreptitious surveillance” is prohibited.
Two-Tiered Notice: MIPA would require two-tiered privacy notice, including a long-form and a short-form privacy policy to be made available at the point or prior to the point of sale of a product or service. MIPA would also authorize a government agency to develop a uniform logo or button to promote individual awareness of the short-form notice.
Opt-In Consent: Opt-in consent would be annually required to collect or process personal data.
Prohibition of Surreptitious Surveillance: MIPA would prohibit “surreptitious surveillance,” meaning that a covered entity would be required to obtain opt-in consent every 180 days in order to “activate the microphone, camera, or any other sensor” on individuals’ connected devices.
Heightened Protections for Biometric and Location Data: MIPA would require covered entities to obtain specific opt-in consent annually to collect and process location or biometric data, and additional consent to disclose it to third parties. Covered entities would also be required to establish a retention schedule and guidelines for permanently destroying biometric or location data.
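MIPA does not specify how these renewal windows would be implemented; as a minimal sketch under that caveat, the snippet below checks whether a stored opt-in consent record is still fresh before a sensor is activated or biometric/location data is processed. The window constants and the sample consent record are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

MIC_CAMERA_WINDOW = timedelta(days=180)          # sensor-activation consent
BIOMETRIC_LOCATION_WINDOW = timedelta(days=365)  # annual re-consent

def consent_is_fresh(granted_at: datetime, window: timedelta) -> bool:
    """Return True only if opt-in consent was granted within the required
    window; otherwise the sensor stays off and processing stops until the
    individual re-consents."""
    return datetime.now(timezone.utc) - granted_at < window

# Hypothetical stored consent record for one user and one sensor.
last_microphone_consent = datetime(2021, 1, 4, tzinfo=timezone.utc)
if not consent_is_fresh(last_microphone_consent, MIC_CAMERA_WINDOW):
    print("Re-prompt for opt-in consent before activating the microphone.")
```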
Data Processing Duties: MIPA contains strict purpose limitations, and duties of loyalty, care, and confidentiality. It would also require covered entities to process personal data and use automated decision systems “discreetly and honestly,” to be “protective” of personal data, “loyal” to individuals, and “honest” about processing risks. The duty of loyalty would require covered entities to not use personal data or information derived from personal data in ways that: (1) benefit themselves to the detriment of an individual, (2) result in reasonably foreseeable and material physical or financial harm to an individual, or (3) would be unexpected and highly offensive to a reasonable individual.
Discrimination Prohibition: MIPA would prohibit a covered entity from engaging in acts or practices that directly result in discrimination against or otherwise make an opportunity, or a public accommodation, unavailable on the basis of an individual’s or group’s actual or perceived belonging to a protected class. This includes a prohibition on targeting advertisements on the basis of actual or perceived belonging to a protected class.
Texas HB 3741
HB 3741, introduced by Rep. Capriglione (R), was referred to the Business & Industry Committee on Mar. 22. Texas’s legislative session is scheduled to end May 31, 2021. The proposal has numerous unique features: it would enable a consumer to provide their “data stream” as consideration under a contract, it would impose different restrictions on three defined subcategories of personal data, and it would require opt-in consent for geolocation tracking. In addition, businesses would be required to maintain accurate personal data.
Data Streams: The proposal would enable an individual to provide their “data stream” as consideration under contract. “Data stream” is defined as “the continuous transmission of an individual’s personal identifying information through online activity or with a device connected to the Internet that can be used by the business to provide for the monetization of the information, customer relationship management, or continuous identification of an individual for commercial purposes.”
Three Categories of Personal Data: HB 3741 divides personal data into three subcategories of “personally identifiable information.”
Category 1 information includes personal data “that an individual may use in a personal, civic, or business setting,” including an SSN, a driver’s license number, a passport number, unique biometric information, physical or mental health information, private communications, etc.
Category 2 information includes personal data that may present a “privacy risk” to an individual, including members of a constitutionally protected class. It includes information such as racial or ethnic origin, religion, age, precise geolocation, or physical or mental impairment. “Privacy risk” is defined broadly. Businesses would be prohibited from selling, transferring, or communicating category 2 information to a third party.
Category 3 information includes time of birth and political party or association. Businesses would be prohibited from collecting or processing category 3 information.
Opt-In Consent for Geolocation Tracking: Consent would be required to perform geolocation tracking of an individual, and to sell geolocation information. Individuals would also have the rights of access, correction, deletion, and portability.
Duty to Maintain Accurate Information: Businesses would be required to maintain accurate information, and to protect and properly secure personal data.
South Korea: The First Case Where the Personal Information Protection Act was Applied to an AI System
As AI regulation is being considered in the European Union, privacy commissioners and data protection authorities around the world are starting to apply existing comprehensive data protection laws to AI systems and the way they process personal information. On April 28th, the South Korean Personal Information Protection Commission (PIPC) imposed sanctions and a fine of KRW 103.3 million (USD 92,900) on ScatterLab, Inc., developer of the chatbot “Iruda,” for eight violations of the Personal Information Protection Act (PIPA). This is the first time the PIPC has sanctioned an AI technology company for indiscriminate personal information processing.
“Iruda” caused considerable controversy in South Korea in early January after complaints that the chatbot used vulgar and discriminatory racist, homophobic, and ableist language in conversations with users. The chatbot, which assumed the persona of a 20-year-old college student named “Iruda” (Lee Luda), attracted more than 750,000 users on Facebook Messenger less than a month after release. The media reports prompted the PIPC to launch an official investigation on January 12th, soliciting input from industry, law, academia, and civil society groups on personal information processing and on legal and technical perspectives on AI development and services.
PIPC’s investigation found that ScatterLab used KakaoTalk messages (KakaoTalk is a popular South Korean messaging app) collected by its apps “Text At” and “Science of Love” between February 2020 and January 2021 to develop and operate its AI chatbot “Iruda.” Around 9.4 billion KakaoTalk messages from 600,000 users were employed to train the algorithms behind the “Iruda” AI model, without any effort by ScatterLab to delete or encrypt users’ personal information, including their names, mobile phone numbers, and addresses. Additionally, 100 million KakaoTalk messages from women in their twenties were added to the response database, with “Iruda” programmed to select and respond with one of these messages.
With regard to ScatterLab’s use of users’ KakaoTalk messages to develop and operate “Iruda,” the PIPC found that including a “New Service Development” clause in the log-in terms of the apps “Text At” and “Science of Love” did not amount to users’ “explicit consent.” The description of “New Service Development” was determined to be insufficient for users to anticipate that their KakaoTalk messages would be used to develop and operate “Iruda.” The PIPC therefore determined that ScatterLab processed users’ personal information beyond the purpose of collection.
In addition, ScatterLab posted its AI models on the code-sharing and collaboration platform GitHub from October 2019 to January 2021; these included 1,431 KakaoTalk messages revealing 22 names (excluding last names), 34 locations (excluding districts and neighborhoods), gender, and relationships (friends or romantic partners) of users. This was found to violate PIPA Article 28-2(2), which states: “A personal information controller shall not include information that may be used to identify a certain individual when providing pseudonymized information to a third party.”
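Purely as an illustration of the kind of de-identification the PIPC found missing (and not a description of ScatterLab’s actual pipeline), the sketch below strips direct identifiers from a chat message before it is added to a shared corpus. The phone-number pattern, placeholder tokens, and sample message are assumptions, and a production system would need far more robust, locale-aware de-identification than simple string matching.

```python
import re

# Crude pattern for South Korean mobile numbers; a real pipeline would rely
# on locale-aware named-entity recognition and human review, not one regex.
PHONE_RE = re.compile(r"\b01[016789]-?\d{3,4}-?\d{4}\b")

def redact_message(text: str, known_identifiers: list[str]) -> str:
    """Strip direct identifiers from a message before it is added to a
    shared training corpus or response database."""
    text = PHONE_RE.sub("<PHONE>", text)
    for identifier in known_identifiers:
        text = text.replace(identifier, "<REDACTED>")
    return text

sample = "It's Jisu, call me at 010-1234-5678 when you get home."
print(redact_message(sample, known_identifiers=["Jisu"]))
# -> "It's <REDACTED>, call me at <PHONE> when you get home."
```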
ScatterLab also faced accusations of collecting personal information of over 200,000 children under the age of 14 without parental consent in the development and operation of its app services, “Text At,” “Science of Love,” and “Iruda,” as its services did not require age verification prior to subscribing.
PIPC Chairman Jong-in Yoon highlighted the complexity of the case at hand and the reasons why extensive public consultation took place as part of the proceedings: “Even the experts did not agree so there was more intense debate than ever before and the ‘Iruda’ case was decided after very careful review.” He explained, “This case is meaningful in that it has made clear that companies are prohibited from indiscriminately using personal information collected for specific services without clearly informing and obtaining explicit consent from data subjects.” Chairman Yoon added, “We hope that the results of this case will guide AI technology companies in setting the right direction for the processing of personal information and provide an opportunity for companies to strengthen their self management and supervision.”
PIPC plans to be active in supporting compliant AI Systems
PIPC also stated that it seeks to help AI technology companies improve their privacy capabilities by providing AI developers and operators with a “Self-Checklist for Personal Information Protection of AI Services” and by offering on-site consulting support. PIPC plans to actively support AI technology companies in developing AI- and data-based industries while protecting people’s personal information.
ScatterLab responded to the decision, “We feel a heavy sense of social responsibility as an AI tech company regarding the necessity to engage in proper personal information processing in the course of developing related technologies and services,” and stated that, “Upon the PIPC’s decision, we will not only actively implement the corrective actions put forth by the PIPC but also work to comply with the law and industry guidelines related to personal information processing.”