Future of Privacy Forum Launches the FPF Center for Artificial Intelligence

The FPF Center for Artificial Intelligence will serve as a catalyst for AI policy and compliance leadership globally, advancing responsible data and AI practices for public and private stakeholders

Today, the Future of Privacy Forum (FPF) launched the FPF Center for Artificial Intelligence, established to better serve policymakers, companies, non-profit organizations, civil society, and academics as they navigate the challenges of AI policy and governance. The Center will expand FPF’s long-standing AI work, introduce large-scale novel research projects, and serve as a source for trusted, nuanced, nonpartisan, and practical expertise. 

The Center’s work will be international in scope as AI deployment accelerates around the world. Cities, states, countries, and international bodies are already grappling with implementing laws and policies to manage the risks. “Data, privacy, and AI are intrinsically interconnected issues that we have been working on at FPF for more than 15 years, and we remain dedicated to collaborating across the public and private sectors to promote their ethical, responsible, and human-centered use,” said Jules Polonetsky, FPF’s Chief Executive Officer. “But we have reached a tipping point in the development of the technology that will affect future generations for decades to come. At FPF, the word Forum is a core part of our identity. We are a trusted convener positioned to build bridges between stakeholders globally, and we will continue to do so under the new Center for AI, which will sit within FPF.”

The Center will help the organization’s 220+ members navigate AI through the development of best practices, research, legislative tracking, thought leadership, and public-facing resources. It will be a trusted evidence-based source of information for policymakers, and it will collaborate with academia and civil society to amplify relevant research and resources. 

“Although AI is not new, we have reached an unprecedented moment in the development of the technology that marks a true inflection point. The complexity, speed and scale of data processing that we are seeing in AI systems can be used to improve people’s lives and spur a potential leapfrogging of societal development, but with that increased capability comes associated risks to individuals and to institutions,” said Anne J. Flanagan, Vice President for Artificial Intelligence at FPF. “The FPF Center for AI will act as a collaborative force for shared knowledge between stakeholders to support the responsible development of AI, including its fair, safe, and equitable use.”

The Center will officially launch at FPF’s inaugural summit, DC Privacy Forum: AI Forward. The in-person, public-facing summit will feature high-profile representatives from the public and private sectors in the world of privacy, data, and AI. 

FPF’s new Center for Artificial Intelligence will be supported by a Leadership Council of experts from around the globe. The Council will consist of members from industry, academia, civil society, and current and former policymakers. 

See the full list of founding FPF Center for AI Leadership Council members here.

I am excited about the launch of the Future of Privacy Forum’s new Center for Artificial Intelligence and honored to be part of its leadership council. This announcement builds on many years of partnership and collaboration between Workday and FPF to develop privacy best practices and advance responsible AI, which has already generated meaningful outcomes, including last year’s launch of best practices to foster trust in this technology in the workplace.  I look forward to working alongside fellow members of the Council to support the Center’s mission to build trust in AI and am hopeful that together we can map a path forward to fully harness the power of this technology to unlock human potential.

Barbara Cosgrove, Vice President, Chief Privacy Officer, Workday

I’m honored to be a founding member of the Leadership Council of the Future of Privacy Forum’s new Center for Artificial Intelligence. AI’s impact transcends borders, and I’m excited to collaborate with a diverse group of experts around the world to inform companies, civil society, policymakers, and academics as they navigate the challenges and opportunities of AI governance, policy, and existing data protection regulations.

Dr. Gianclaudio Malgieri, Associate Professor of Law & Technology at eLaw, University of Leiden

“As we enter this era of AI, we must require the right balance between allowing innovation to flourish and keeping enterprises accountable for the technologies they create and put on the market. IBM believes it will be crucial that organizations such as the Future of Privacy Forum help advance responsible data and AI policies, and we are proud to join others in industry and academia as part of the Leadership Council.”

Learn more about the FPF Center for AI here.

About Future of Privacy Forum (FPF)

The Future of Privacy Forum (FPF) is a global non-profit organization that brings together academics, civil society, government officials, and industry to evaluate the societal, policy, and legal implications of data use, identify the risks, and develop appropriate protections. 

FPF believes technology and data can benefit society and improve lives if the right laws, policies, and rules are in place. FPF has offices in Washington D.C., Brussels, Singapore, and Tel Aviv. Learn more at fpf.org.

FPF Develops Checklist & Guide to Help Schools Vet AI Tools for Legal Compliance

FPF’s Youth and Education team has developed a checklist and accompanying policy brief to help schools vet generative AI tools for compliance with student privacy laws. Vetting Generative AI Tools for Use in Schools is a crucial resource as the use of generative AI tools continues to increase in educational settings. It is critical for school leaders to understand how existing federal and state student privacy laws, such as the Family Educational Rights and Privacy Act (FERPA), apply to the complexities of machine learning systems in order to protect student privacy. With these resources, FPF aims to provide much-needed clarity and guidance to educational institutions grappling with these issues.

Click here to access the checklist and policy brief.

“AI technology holds immense promise in enhancing educational experiences for students, but it must be implemented responsibly and ethically,” said David Sallay, the Director for Youth & Education Privacy at the Future of Privacy Forum. “With our new checklist, we aim to empower educators and administrators with the knowledge and tools necessary to make informed decisions when selecting generative AI tools for classroom use while safeguarding student privacy.”

The checklist, designed specifically for K-12 schools, outlines key considerations for incorporating generative AI into a school or district’s edtech vetting process. 

These include: 

By prioritizing these steps, educational institutions can promote transparency and protect student privacy while maximizing the benefits of technology-driven learning experiences for students. 

The in-depth policy brief outlines the relevant laws and policies a school should consider, the unique compliance considerations of generative AI tools (including data collection, transparency and explainability, product improvement, and high-risk decision-making), and their most likely use cases (student, teacher, and institution-focused).

The brief also encourages schools and districts to update their existing edtech vetting policies to address the unique considerations of AI technologies (or to create a comprehensive policy if one does not already exist) instead of creating a separate vetting process for AI. It also highlights the role that state legislatures can play in ensuring the efficiency of school edtech vetting and oversight and calls on vendors to be proactively transparent with schools about their use of AI.


Check out the LinkedIn Live with CEO Jules Polonetsky and Youth & Education Director David Sallay about the Checklist and Policy Brief.

To read more of the Future of Privacy Forum’s youth and student privacy resources, visit www.StudentPrivacyCompass.org

FPF Releases “The Playbook: Data Sharing for Research” Report and Infographic

Today, the Future of Privacy Forum (FPF) published “The Playbook: Data Sharing for Research,” a report on best practices for instituting research data-sharing programs between corporations and research institutions. FPF also developed a summary of recommendations from the full report.

Facilitating data sharing for research purposes between corporate data holders and academia can unlock new scientific insights and drive progress in public health, education, social science, and a myriad of other fields for the betterment of broader society. Academic researchers use this data to consider consumer, commercial, and scientific questions at a scale they cannot reach using conventional research data-gathering techniques alone. This data has also helped researchers answer questions on topics ranging from bias in targeted advertising and the influence of misinformation on election outcomes to early diagnosis of diseases through data collected by fitness and health apps.

The playbook addresses vital steps for data management, sharing, and program execution between companies and researchers. Creating a data-sharing ecosystem that positively advances scientific research requires a better understanding of the established risks, opportunities to address challenges, and the diverse stakeholders involved in data-sharing decisions. This report aims to encourage safe, responsible data-sharing between industries and researchers.

“Corporate data sharing connects companies with research institutions, by extension increasing the quantity and quality of research for social good,” said Shea Swauger, Senior Researcher for Data Sharing and Ethics. “This Playbook showcases the importance, and advantages, of having appropriate protocols in place to create safe and simple data sharing processes.”

In addition to the Playbook, FPF created a companion infographic summarizing the benefits, challenges, and opportunities of data sharing for research outlined in the larger report.


As a longtime advocate for facilitating the privacy-protective sharing of data by industry to the research community, FPF is proud to have created this set of best practices for researchers, institutions, policymakers, and data-holding companies. In addition to the Playbook, the Future of Privacy Forum has also opened nominations for its annual Award for Research Data Stewardship.

“Our goal with these initiatives is to celebrate the successful research partnerships transforming how corporations and researchers interact with each other,” Swauger said. “Hopefully, we can continue to engage more audiences and encourage others to model their own programs with solid privacy safeguards.”

Shea Swauger, Senior Researcher for Data Sharing and Ethics, Future of Privacy Forum

Established by FPF in 2020 with support from The Alfred P. Sloan Foundation, the Award for Research Data Stewardship recognizes excellence in the privacy-protective stewardship of corporate data shared with academic researchers. The call for nominations is open and closes on Tuesday, January 17, 2023. To submit a nomination, visit the FPF site.

FPF has also launched a newly formed Ethics and Data in Research Working Group. The group receives late-breaking analyses of emerging US legislation affecting research and data, meets to discuss the ethical and technological challenges of conducting research, and collaborates to create best practices to protect privacy, decrease risk, and increase data sharing for research, partnerships, and infrastructure. Learn more and join here.

FPF Testifies Before House Energy and Commerce Subcommittee, Supporting Congress’s Efforts on the “American Data Privacy and Protection Act” 

This week, FPF’s Senior Policy Counsel Bertram Lee testified before the U.S. House Energy and Commerce Subcommittee on Consumer Protection and Commerce at its hearing, “Protecting America’s Consumers: Bipartisan Legislation to Strengthen Data Privacy and Security,” regarding the bipartisan, bicameral privacy discussion draft bill, the “American Data Privacy and Protection Act” (ADPPA). FPF has a history of supporting the passage of a comprehensive federal consumer privacy law, which would provide businesses and consumers alike with the benefit of clear national standards and protections.

Lee’s testimony opened by applauding the Committee for its efforts towards comprehensive federal privacy legislation and emphasized that the “time is now” for its passage. As written, the ADPPA would address gaps in the sectoral approach to consumer privacy, establish strong national civil rights protections, and create new rights and safeguards for the protection of sensitive personal information. 

“The ADPPA is more comprehensive in scope, inclusive of civil rights protections, and provides individuals with more varied enforcement mechanisms in comparison to some states’ current privacy regimes,” Lee said in his testimony. “It also includes corporate accountability mechanisms, such as requiring privacy designations, data security offices, and executive certifications showing compliance, which are missing from current state laws. Notably, the ADPPA also requires ‘short-form’ privacy notices to inform consumers of how their data will be used by companies and of their rights — a provision that is not found in any state law.” 

Lee’s testimony also provided four recommendations to strengthen the bill, which include: 

Many of the recommendations would ensure that the legislation gives individuals meaningful privacy rights and places clear obligations on businesses and other organizations that collect, use and share personal data. The legislation would expand civil rights protections for individuals and communities harmed by algorithmic discrimination as well as require algorithmic assessments and evaluations to better understand how these technologies can impact communities. 

The submitted testimony and a video of the hearing can be found on the House Committee on Energy & Commerce site.

Reading the Signs: the Political Agreement on the New Transatlantic Data Privacy Framework

The President of the United States, Joe Biden, and the President of the European Commission, Ursula von der Leyen, announced last Friday, in Brussels, a political agreement on a new Transatlantic framework to replace the Privacy Shield. 

This is a significant elevation of the topic within Transatlantic affairs compared to the 2016 announcement of a new deal to replace the Safe Harbor framework. Back then, it was Commission Vice-President Andrus Ansip and Commissioner Vera Jourova who announced, at the beginning of February 2016, that a deal had been reached. 

The draft adequacy decision was only published a month after the announcement, and the adequacy decision itself was adopted six months later, in July 2016. Therefore, it should not be at all surprising if another six months (or more!) pass before the adequacy decision for the new Framework produces legal effects and can actually support transfers from the EU to the US, especially since the US side still has to issue at least one Executive Order to provide for the agreed-upon new safeguards.

This means that transfers of personal data from the EU to the US may still be blocked in the coming months – possibly without a lawful alternative to continue them – as a consequence of Data Protection Authorities (DPAs) enforcing Chapter V of the General Data Protection Regulation in light of the Schrems II judgment of the Court of Justice of the EU, whether as part of the 101 noyb complaints submitted in August 2020, which are slowly starting to be resolved, or as part of other individual complaints and court cases. 

After the agreement “in principle” was announced at the highest possible political level, EU Justice Commissioner Didier Reynders doubled down on the point that this agreement is reached “on the principles” for a new framework, rather than on the details of it. Later on he also gave credit to Commerce Secretary Gina Raimondo and US Attorney General Merrick Garland for their hands-on involvement in working towards this agreement. 

In fact, “in principle” became the leitmotif of the announcement. The first EU Data Protection Authority to react was the European Data Protection Supervisor, who wrote that he “welcomes, in principle” the announcement of a new EU-US transfers deal: “The details of the new agreement remain to be seen. However, EDPS stresses that a new framework for transatlantic data flows must be sustainable in light of requirements identified by the Court of Justice of the EU.”

Of note, there is no catchy name for the new transfers agreement, which was referred to as the “Trans-Atlantic Data Privacy Framework”. Nonetheless, FPF’s CEO Jules Polonetsky submits the “TA DA!” Agreement, and he has my vote. For his full statement on the political agreement being reached, see our release here.

Some details of the “principles” agreed on were published hours after the announcement, both by the White House and by the European Commission. Below are a couple of things that caught my attention from the two brief Factsheets.

The US has committed to “implement new safeguards” to ensure that SIGINT activities are “necessary and proportionate” (an EU law legal measure – see Article 52 of the EU Charter on how the exercise of fundamental rights can be limited) in the pursuit of defined national security objectives. Therefore, the new agreement is expected to address the lack of safeguards for government access to personal data as specifically outlined by the CJEU in the Schrems II judgment.

The US also committed to creating a “new mechanism for the EU individuals to seek redress if they believe they are unlawfully targeted by signals intelligence activities”. This new mechanism was characterized by the White House as having “independent and binding authority”. Per the White House, this redress mechanism includes “a new multi-layer redress mechanism that includes an independent Data Protection Review Court that would consist of individuals chosen from outside the US Government who would have full authority to adjudicate claims and direct remedial measures as needed”. The EU Commission mentioned in its own Factsheet that this would be a “two-tier redress system”. 

Importantly, the White House mentioned in the Factsheet that oversight of intelligence activities will also be boosted – “intelligence agencies will adopt procedures to ensure effective oversight of new privacy and civil liberties standards”. Oversight and redress are different issues and are both equally important – for details, see this piece by Christopher Docksey. However, they tend to be thought of as being one and the same. Being addressed separately in this announcement is significant.

One of the remarkable things about the White House announcement is that it includes several EU law-specific concepts: “necessary and proportionate”, “privacy, data protection” mentioned separately, “legal basis” for data flows. In another nod to the European approach to data protection, the entire issue of ensuring safeguards for data flows is framed as more than a trade or commerce issue – with references to a “shared commitment to privacy, data protection, the rule of law, and our collective security as well as our mutual recognition of the importance of trans-Atlantic data flows to our respective citizens, economies, and societies”.

Last, but not least, Europeans have always framed their concerns related to surveillance and data protection as being fundamental rights concerns. The US also gives a nod to this approach, by referring a couple of times to “privacy and civil liberties” safeguards (adding thus the “civil liberties” dimension) that will be “strengthened”. All of these are positive signs for a “rapprochement” of the two legal systems and are certainly an improvement to the “commerce” focused approach of the past on the US side. 

It should also be noted that the new framework will continue to be a self-certification scheme managed by the US Department of Commerce.

What does all of this mean in practice? As the White House details, this means that the Biden Administration will have to adopt (at least) an Executive Order (EO) that includes all these commitments and on the basis of which the European Commission will draft an adequacy decision.

Thus, there are great expectations in sight following the White House and European Commission Factsheets, and the entire privacy and data protection community is waiting to see further details.

In the meantime, I’ll leave you with an observation made by my colleague Amie Stepanovich, VP for US Policy at FPF, who highlighted that Section 702 of the Foreign Intelligence Surveillance Act (FISA) is set to expire on December 31, 2023. This presents Congress with an opportunity to act, building on the extensive work done by the US Government in the context of the Transatlantic data transfers debate.

Privacy Best Practices for Rideshare Drivers Using Dashcams

FPF & Uber Publish Guide Highlighting Privacy Best Practices for Drivers who Record Video and Audio on Rideshare Journeys

FPF and Uber have created a guide for US-based rideshare drivers who install “dashcams” – video cameras mounted on a vehicle’s dashboard or windshield. Many drivers install dashcams to improve safety, security, and accountability; the cameras can capture crashes or other safety-related incidents outside and inside cars. Dashcam footage can be helpful to drivers, passengers, insurance companies, and others when adjudicating legal claims. At the same time, dashcams can pose substantial privacy risks if appropriate safeguards are not in place to limit the collection, use, and disclosure of personal data. 

Dashcams typically record video outside a vehicle. Many dashcams also record in-vehicle audio and some record in-vehicle video. Regardless of the particular device used, ride-hail drivers who use dashcams must comply with applicable audio and video recording laws.

The guide explains relevant laws and provides practical tips to help drivers be transparent, limit data use and sharing, retain video and audio only for defined purposes, and use strict security controls. The guide highlights ways that drivers can employ physical signs, in-app notices, and other means to ensure passengers are informed about dashcam use and can make meaningful choices about whether to travel in a dashcam-equipped vehicle. Drivers seeking advice concerning specific legal obligations or incidents should consult legal counsel.

Privacy best practices for dashcams include: 

  1. Give individuals notice that they are being recorded
    • Place recording notices inside and on the vehicle.
    • Mount the dashcam in a visible location.
    • Consider, in some situations, giving an oral notification that recording is taking place.
    • Determine whether the ride sharing service provides recording notifications in the app, and utilize those in-app notices.
  2. Only record audio and video for defined, reasonable purposes
    • Only keep recordings for as long as needed for the original purpose.
    • Inform passengers as to why video and/or audio is being recorded.
  3. Limit sharing and use of recorded footage
    • Only share video and audio with third parties for relevant reasons that align with the original reason for recording.
    • Thoroughly review the rideshare service’s privacy policy and community guidelines if using an app-based rideshare service, and be aware that many rideshare companies maintain policies against widely disseminating recordings.
  4. Safeguard and encrypt recordings and delete unused footage
    • Identify dashcam vendors that provide the highest privacy and security safeguards.
    • Carefully read the terms and conditions when buying dashcams to understand the data flows.

Uber will make these best practices available to drivers in its app and on its website. 

Many ride-hail drivers use dashcams in their cars, and the guidance and best practices published today provide practical guidance to help drivers implement privacy protections. But driver guidance is only one aspect of ensuring individuals’ privacy and security when traveling. Dashcam manufacturers must implement privacy-protective practices by default and provide easy-to-use privacy options. At the same time, ride-hail platforms must provide drivers with the appropriate tools to notify riders, and carmakers must safeguard drivers’ and passengers’ data collected by OEM devices.

In addition, dashcams are only one example of increasingly sophisticated sensors appearing in passenger vehicles as part of driver monitoring systems and related technologies. Further work is needed to apply comprehensive privacy safeguards to emerging technologies across the connected vehicle sector, from carmakers and rideshare services to mobility services providers and platforms. Comprehensive federal privacy legislation would be a good start. And in the absence of Congressional action, FPF is doing further work to identify key privacy risks and mitigation strategies for the broader class of driver monitoring systems that raise questions about technologies beyond the scope of this dashcam guide.

12th Annual Privacy Papers for Policymakers Awardees Explore the Nature of Privacy Rights & Harms

The winners of the 12th annual Future of Privacy Forum (FPF) Privacy Papers for Policymakers Award ask big questions about what should be the foundational elements of data privacy and protection and who will make key decisions about the application of privacy rights. Their scholarship will inform policy discussions around the world about privacy harms, corporate responsibilities, oversight of algorithms, and biometric data, among other topics.

“Policymakers and regulators in many countries are working to advance data protection laws, often seeking in particular to combat discrimination and unfairness,” said FPF CEO Jules Polonetsky. “FPF is proud to highlight independent researchers tackling big questions about how individuals and society relate to technology and data.”

This year’s papers also explore smartphone platforms as privacy regulators, the concept of data loyalty, and global privacy regulation. The award recognizes leading privacy scholarship that is relevant to policymakers in the U.S. Congress, at U.S. federal agencies, and among international data protection authorities. The winning papers will be presented at a virtual event on February 10, 2022. 

The winners of the 2022 Privacy Papers for Policymakers Award are:

From the record number of papers nominated this year, these six papers were selected by a diverse team of academics, advocates, and industry privacy professionals from FPF’s Advisory Board. The winning papers were chosen for research and proposed solutions that are relevant to policymakers and regulators in the U.S. and abroad.

In addition to the winning papers, FPF has selected two papers for Honorable Mention: Verification Dilemmas and the Promise of Zero-Knowledge Proofs by Kenneth Bamberger, University of California, Berkeley – School of Law; Ran Canetti, Boston University, Department of Computer Science, Boston University, Faculty of Computing and Data Science, Boston University, Center for Reliable Information Systems and Cybersecurity; Shafi Goldwasser, University of California, Berkeley – Simons Institute for the Theory of Computing; Rebecca Wexler, University of California, Berkeley – School of Law; and Evan Zimmerman, University of California, Berkeley – School of Law; and A Taxonomy of Police Technology’s Racial Inequity Problems by Laura Moy, Georgetown University Law Center.

FPF also selected a paper for the Student Paper Award, A Fait Accompli? An Empirical Study into the Absence of Consent to Third Party Tracking in Android Apps by Konrad Kollnig and Reuben Binns, University of Oxford; Pierre Dewitte, KU Leuven; Max van Kleek, Ge Wang, Daniel Omeiza, Helena Webb, and Nigel Shadbolt, University of Oxford. The Student Paper Award Honorable Mention was awarded to Yeji Kim, University of California, Berkeley – School of Law, for her paper, Virtual Reality Data and Its Privacy Regulatory Challenges: A Call to Move Beyond Text-Based Informed Consent.

The winning authors will join FPF staff to present their work at a virtual event with policymakers from around the world, academics, and industry privacy professionals. The event will be held on February 10, 2022, from 1:00 – 3:00 PM EST. The event is free and open to the general public. To register for the event, visit https://bit.ly/3qmJdL2.

Organizations must lead with privacy and ethics when researching and implementing neurotechnology: FPF and IBM Live event and report release

The Future of Privacy Forum (FPF) and the IBM Policy Lab released recommendations for promoting privacy and mitigating risks associated with neurotechnology, specifically brain-computer interfaces (BCIs). The new report provides developers and policymakers with actionable ways this technology can be implemented while protecting the privacy and rights of its users.

“We have a prime opportunity now to implement strong privacy and human rights protections as brain-computer interfaces become more widely used,” said Jeremy Greenberg, Policy Counsel at the Future of Privacy Forum. “Among other uses, these technologies have tremendous potential to treat people with diseases and conditions like epilepsy or paralysis and make it easier for people with disabilities to communicate, but these benefits can only be fully realized if meaningful privacy and ethical safeguards are in place.”

Brain-computer interfaces are computer-based systems that are capable of directly recording, processing, analyzing, or modulating human brain activity. The sensitivity of data that BCIs collect and the capabilities of the technology raise concerns over consent, as well as the transparency, security, and accuracy of the data. The report offers a number of policy and technical solutions to mitigate the risks of BCIs and highlights their positive uses.

“Emerging innovations like neurotechnology hold great promise to transform healthcare, education, transportation, and more, but they need the right guardrails in place to protect individuals’ privacy,” said IBM Chief Privacy Officer Christina Montgomery. “Working together with the Future of Privacy Forum, the IBM Policy Lab is pleased to release a new framework to help policymakers and businesses navigate the future of neurotechnology while safeguarding human rights.”

FPF and IBM have outlined several key policy recommendations to mitigate the privacy risks associated with BCIs, including:

FPF and IBM have also included several technical recommendations for BCI devices, including:

FPF-curated educational resources, policy & regulatory documents, academic papers, thought pieces, and technical analyses regarding brain-computer interfaces are available here.

Read FPF’s four-part series on Brain-Computer Interfaces (BCIs), providing an overview of the technology, use cases, privacy risks, and proposed recommendations for promoting privacy and mitigating risks associated with BCIs.

FPF Launches Asia-Pacific Region Office, Global Data Protection Expert Clarisse Girot Leads Team

The Future of Privacy Forum (FPF) has appointed Clarisse Girot, PhD, LLM, an expert on Asian and European privacy legislation, as Director of its new FPF Asia-Pacific office, based in Singapore. This new office expands FPF’s international reach in Asia and complements FPF’s offices in the U.S., Europe, and Israel, as well as partnerships around the globe.
 
Dr. Clarisse Girot is a privacy professional with over twenty years of experience in the privacy and data protection fields. Since 2017, Clarisse has been leading the Asian Business Law Institute’s (ABLI) Data Privacy Project, focusing on the regulations on cross-border data transfers in 14 Asian jurisdictions. Prior to her time at ABLI, Clarisse served as the Counsellor to the President of the French Data Protection Authority (CNIL) and Chair of the Article 29 Working Party. She previously served as head of CNIL’s Department of European and International Affairs, where she sat on the Article 29 Working Party, the group of EU Data Protection Authorities, and was involved in major international cases in data protection and privacy.
 
“Clarisse is joining FPF at an important time for data protection in the Asia-Pacific region. The two most populous countries in the world, India and China, are introducing general privacy laws, and established data protection jurisdictions, like Singapore, Japan, South Korea, and New Zealand, have recently updated their laws,” said FPF CEO Jules Polonetsky. “Her extensive knowledge of privacy law will provide vital insights for those interested in compliance with regional privacy frameworks and their evolution over time.”
 
FPF Asia-Pacific will focus on several priorities through the end of the year, including hosting an event at this year’s Singapore Data Protection Week. The office will provide expertise on digital data flows and discuss emerging data protection issues in a way that is useful for regulators, policymakers, and legal professionals. Rajah & Tann Singapore LLP is supporting the work of the FPF Asia-Pacific office.
 
“The FPF global team will greatly benefit from the addition of Clarisse. She will advise FPF staff, advisory board members, and the public on the most significant privacy developments in the Asia-Pacific region, including data protection bills and cross-border data flows,” said Gabriela Zanfir-Fortuna, Director for Global Privacy at FPF. “Her past experience in both Asia and Europe gives her a unique ability to confront the most complex issues dealing with cross-border data protection.”
 
As over 140 countries have now enacted a privacy or data protection law, FPF continues to expand its international presence to help data protection experts grapple with the challenges of ensuring responsible uses of data. Following the appointment of Malavika Raghavan as Senior Fellow for India in 2020, the launch of the FPF Asia-Pacific office further expands FPF’s international reach.
 
Dr. Gabriela Zanfir-Fortuna leads FPF’s international efforts and works on global privacy developments and European data protection law and policy. The FPF Europe office is led by Dr. Rob van Eijk, who prior to joining FPF worked at the Dutch Data Protection Authority as Senior Supervision Officer and Technologist for nearly ten years. FPF has created thriving partnerships with leading privacy research organizations in the European Union, such as Dublin City University and the Brussels Privacy Hub of the Vrije Universiteit Brussel (VUB). FPF continues to serve as a leading voice in Europe on issues of international data flows, the ethics of AI, and emerging privacy issues. FPF Europe recently published a report comparing the regulatory strategy for 2021-2022 of 15 Data Protection Authorities to provide insights into the future of enforcement and regulatory action in the EU.
 
Outside of Europe, FPF has launched a variety of projects to advance tech policy leadership and scholarship in regions around the world, including Israel and Latin America. The work of the Israel Tech Policy Institute (ITPI), led by Managing Director Limor Shmerling Magazanik, includes publishing a report on AI Ethics in Government Services and organizing an OECD workshop with the Israeli Ministry of Health on access to health data for research.
 
In Latin America, FPF has partnered with the leading research association Data Privacy Brasil and provided in-depth analysis of Brazil’s LGPD privacy legislation and of various data privacy cases decided by the Brazilian Supreme Court. FPF recently organized a panel during the CPDP LatAm Conference, which explored the state of Latin American data protection laws alongside experts from Uber, the University of Brasilia, and the Interamerican Institute of Human Rights.
 

Read Dr. Girot’s Q&A on the FPF blog. Stay updated: Sign up for FPF Asia-Pacific email alerts.
 

FPF and Leading Health & Equity Organizations Issue Principles for Privacy & Equity in Digital Contact Tracing Technologies

With support from the Robert Wood Johnson Foundation, FPF engaged leaders within the privacy and equity communities to develop actionable guiding principles and a framework to help bolster the responsible implementation of digital contact tracing technologies (DCTT). Today, seven privacy, civil rights, and health equity organizations signed on to these guiding principles for organizations implementing DCTT.

“We learned early in our Privacy and Pandemics initiative that unresolved ethical, legal, social, and equity issues may challenge the responsible implementation of digital contact tracing technologies,” said Jules Polonetsky, CEO of the Future of Privacy Forum. “So we engaged leaders within the civil rights, health equity, and privacy communities to create a set of actionable principles to help guide organizations implementing digital contact tracing that respects individual rights.”

Contact tracing has long been used to monitor the spread of various infectious diseases. In light of COVID-19, governments and companies began deploying digital exposure notification using Bluetooth and geolocation data on mobile devices to boost contact tracing efforts and quickly identify individuals who may have been exposed to the virus. However, as DCTT begins to play an important role in public health, it is important to take necessary steps to ensure equity in access to DCTT and understand the societal risks and tradeoffs that might accompany its implementation today and in the future. Governance efforts that seek to better understand these risks will be better able to bolster public trust in DCTT technologies. 

“LGBT Tech is proud to have participated in the development of the Principles and Framework alongside FPF and other organizations. We are heartened to see that the focus of these principles is on historically underserved and under-resourced communities everywhere, like the LGBTQ+ community. We believe the Principles and Framework will help ensure that the needs and vulnerabilities of these populations are at the forefront during today’s pandemic and future pandemics.”

Carlos Gutierrez, Deputy Director and General Counsel, LGBT Tech

“If we establish practices that protect individual privacy and equity, digital contact tracing technologies could play a pivotal role in tracking infectious diseases,” said Dr. Rachele Hendricks-Sturrup, Research Director at the Duke-Margolis Center for Health Policy. “These principles allow organizations implementing digital contact tracing to take ethical and responsible approaches to how their technology collects, tracks, and shares personal information.”

FPF, together with Dialogue on Diversity, the National Alliance Against Disparities in Patient Health (NADPH), BrightHive, and LGBT Tech, developed the principles, which advise organizations implementing DCTT to commit to the following actions:

  1. Be Transparent About How Data Is Used and Shared.
  2. Apply Strong De-Identification Techniques and Solutions.
  3. Empower Users Through Tiered Opt-in/Opt-out Features and Data Minimization.
  4. Acknowledge and Address Privacy, Security, and Nondiscrimination Protection Gaps.
  5. Create Equitable Access to DCTT.
  6. Acknowledge and Address Implicit Bias Within and Across Public and Private Settings.
  7. Democratize Data for Public Good While Employing Appropriate Privacy Safeguards.
  8. Adopt Privacy-By-Design Standards That Make DCTT Broadly Accessible.

Additional supporters of these principles include the Center for Democracy and Technology and Human Rights First.

To learn more and sign on to the DCTT Principles visit fpf.org/DCTT.

Support for this program was provided by the Robert Wood Johnson Foundation. The views expressed here do not necessarily reflect the views of the Foundation.

Navigating Preemption through the Lens of Existing State Privacy Laws

This post is the second of two posts on federal preemption and enforcement in United States federal privacy legislation. See Preemption in US Privacy Laws (June 14, 2021).

In drafting a federal baseline privacy law in the United States, lawmakers must decide to what extent the law will override state and local privacy laws. In a previous post, we discussed a survey of 12 existing federal privacy laws passed between 1968 and 2003, and the extent to which they preempt similar state laws. 

Another way to approach the same question, however, is to examine the hundreds of existing state privacy laws currently on the books in the United States. Conversations around federal preemption inevitably focus on comprehensive laws like the California Consumer Privacy Act or the Virginia Consumer Data Protection Act — but there are hundreds of other state privacy laws that regulate commercial and government uses of data. 

In reviewing existing state laws, we find that they can be categorized usefully into: laws that complement heavily regulated sectors (such as health and finance); laws of general applicability; common law; laws governing state government activities (such as schools and law enforcement); comprehensive laws; longstanding or narrowly applicable privacy laws; and emerging sectoral laws (such as biometrics or drones regulations). As a resource, we recommend: Robert Ellis Smith, Compilation of State and Federal Privacy Laws (last supplemented in 2018). 

  1. Heavily Regulated Sectoral Silos. Most federal proposals for a comprehensive privacy law would not supersede other existing federal laws that contain privacy requirements for businesses, such as the Health Insurance Portability and Accountability Act (HIPAA) or the Gramm-Leach-Bliley Act (GLBA). As a result, a new privacy law should probably not preempt state sectoral laws that: (1) supplement their federal counterparts and (2) were intentionally not preempted by those federal regimes. In many cases, robust compliance regimes have been built around federal and state parallel requirements, creating entrenched privacy expectations, privacy tools, and compliance practices for organizations (“lock in”).
  2. Laws of General Applicability. All 50 states have laws barring unfair and deceptive commercial and trade practices (UDAP), as well as generally applicable laws against fraud, unconscionable contracts, and other consumer protections. In cases where violations involve the misuse of personal information, such claims could be inadvertently preempted by a national privacy law.
  3. State Common Law. Privacy claims have been evolving in US common law over the last hundred years, and claims vary from state to state. A federal privacy law might preempt (or not preempt) claims brought under theories of negligence, breach of contract, product liability, invasions of privacy, or other “privacy torts.”
  4. State Laws Governing State Government Activities. In general, states retain the right to regulate their own government entities, and a commercial baseline privacy law is unlikely to affect such state privacy laws. These include, for example, state “mini Privacy Acts” applying to state government agencies’ collection of records, state privacy laws applicable to public schools and school districts, and state regulations involving law enforcement — such as government facial recognition bans.
  5. Comprehensive or Non-Sectoral State Laws. Lawmakers considering the extent of federal preemption should take extra care to consider the effect on different aspects of omnibus or comprehensive consumer privacy laws, such as the California Consumer Privacy Act (CCPA), the Colorado Privacy Act, and the Virginia Consumer Data Protection Act. In addition, however, there are a number of other state privacy laws that can be considered “non-sectoral” because they apply broadly to businesses that collect or use personal information. These include, for example, CalOPPA (requiring commercial privacy policies), the California “Shine the Light” law (requiring disclosures from companies that share personal information for direct marketing), data breach notification laws, and data disposal laws.
  6. Longstanding, Narrowly Applicable State Privacy Laws. Many states have relatively long-standing privacy statutes on the books that govern narrow use cases, such as: state laws governing library records, social media password laws, mugshot laws, anti-paparazzi laws, state laws governing audio surveillance between private parties, and laws governing digital assets of decedents. In many cases, such laws could be expressly preserved or incorporated into a federal law. 
  7. Emerging Sectoral and Future-Looking Privacy Laws. New state laws have emerged in recent years in response to novel concerns, including for: biometric data; drones; connected and autonomous vehicles; the Internet of Things; data broker registration; and disclosure of intimate images. This trend is likely to continue, particularly in the absence of a federal law.

Congressional intent is the “ultimate touchstone” of preemption. Lawmakers should consider long-term effects on current and future state laws, including how they will be impacted by a preemption provision, as well as how they might be expressly preserved through a Savings Clause. In order to help build consensus, lawmakers should work with stakeholders and experts in the numerous categories of laws discussed above, to consider how they might be impacted by federal preemption.

ICYMI: Read the first blog in this series, Preemption in US Privacy Laws.

Manipulative Design: Defining Areas of Focus for Consumer Privacy

In consumer privacy, the phrase “dark patterns” is everywhere. Emerging from a wide range of technical and academic literature, it now appears in at least two US privacy laws: the California Privacy Rights Act and the Colorado Privacy Act (which, if signed by the Governor, will come into effect in 2025).

Under both laws, companies will be prohibited from using “dark patterns,” or “user interface[s] designed or manipulated with the substantial effect of subverting or impairing user autonomy, decision‐making, or choice,” to obtain user consent in certain situations–for example, for the collection of sensitive data.

When organizations give individuals choices, some forms of manipulation have long been barred by consumer protection laws, with the Federal Trade Commission and state Attorneys General prohibiting companies from deceiving or coercing consumers into taking actions they did not intend or striking bargains they did not want. But consumer protection law does not typically prohibit organizations from persuading consumers to make a particular choice. And it is often unclear where the lines fall between cajoling, persuading, pressuring, nagging, annoying, or bullying consumers. The California and Colorado laws seek to do more than merely bar deceptive practices; they prohibit design that “subverts or impairs user autonomy.”

What does it mean to subvert user autonomy, if a design does not already run afoul of traditional consumer protection law? Just as in the physical world, the design of digital platforms and services always influences behavior — what to pay attention to, what to read and in what order, how much time to spend, what to buy, and so on. To paraphrase Harry Brignull (credited with coining the term), not everything “annoying” can be a dark pattern. Some examples of dark patterns are both clear and harmful, such as a design that tricks users into making recurring payments, or a service that offers a “free trial” and then makes it difficult or impossible to cancel. In other cases, the presence of “nudging” may be clear, but the harms may be less so, such as beta-testing which color shades are most effective at encouraging sales. Still others fall into a legal grey area: for example, is it ever appropriate for a company to repeatedly “nag” users to make a choice that benefits the company, with little or no accompanying benefit to the user?

In Fall 2021, the Future of Privacy Forum will host a series of workshops with technical, academic, and legal experts to help define clear areas of focus for consumer privacy and guidance for policymakers and legislators. These workshops will feature experts on manipulative design in at least three contexts of consumer privacy: (1) Youth & Education; (2) Online Advertising and US Law; and (3) GDPR and European Law. 

As lawmakers address this issue, we identify at least four distinct areas of concern:

This week at the first edition of the annual Dublin Privacy Symposium, FPF will join other experts to discuss principles for transparency and trust. The design of user interfaces for digital products and services pervades modern life and directly impacts the choices people make with respect to sharing their personal information. 

India’s new Intermediary & Digital Media Rules: Expanding the Boundaries of Executive Power in Digital Regulation


Author: Malavika Raghavan

India’s new rules on intermediary liability and regulation of publishers of digital content have generated significant debate since their release in February 2021. The Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021 (the Rules) have:

The majority of these provisions were unanticipated, resulting in a raft of petitions filed in High Courts across the country challenging the validity of various aspects of the Rules, including their constitutionality. On 25 May 2021, the three-month compliance period for certain new requirements for significant social media intermediaries (so designated by the Rules) expired, with many intermediaries not yet in compliance, opening them up to liability under the Information Technology Act as well as wider civil and criminal laws. This has reignited debates about the impact of the Rules on business continuity and liability, citizens’ access to online services, privacy, and security. 

Following on FPF’s previous blog highlighting some aspects of these Rules, this article presents an overview of the Rules before deep-diving into critical issues regarding their interpretation and application in India. It concludes by taking stock of some of the emerging effects of these new regulations, which have major implications for millions of Indian users, as well as digital services providers serving the Indian market. 

1. Brief overview of the Rules: Two new regimes for ‘intermediaries’ and ‘publishers’ 

The new Rules create two regimes for two different categories of entities: ‘intermediaries’ and ‘publishers’.  Intermediaries have been the subject of prior regulations – the Information Technology (Intermediaries guidelines) Rules, 2011 (the 2011 Rules), now superseded by these Rules. However, the category of “publishers” and related regime created by these Rules did not previously exist. 

The Rules begin with commencement provisions and definitions in Part I. Part II of the Rules applies to intermediaries (as defined in the Information Technology Act 2000 (IT Act)) that transmit electronic records on behalf of others, including online intermediary platforms (like YouTube, WhatsApp, and Facebook). The rules in this part primarily flesh out the protections offered by Section 79 of the IT Act, which gives passive intermediaries the benefit of a ‘safe harbour’ from liability for objectionable information shared by third parties using their services — somewhat akin to protections under Section 230 of the US Communications Decency Act. To claim this protection from liability, intermediaries need to undertake certain ‘due diligence’ measures, including informing users of the types of content that cannot be shared and following content take-down procedures (for which safeguards evolved over time through important case law). The new Rules supersede the 2011 Rules and also significantly expand on them, introducing new provisions and additional due diligence requirements that are detailed further in this blog. 

Part III of the Rules applies to a new, previously non-existent category of entities designated as ‘publishers’. This category is further divided into ‘publishers of news and current affairs content’ and ‘publishers of online curated content’. Part III then sets up extensive requirements for publishers to adhere to specific codes of ethics, onerous content take-down requirements, and a three-tier grievance process with appeals lying with an Executive Inter-Departmental Committee of Central Government bureaucrats. 

Finally, the Rules contain two provisions that apply to all entities (i.e., intermediaries and publishers) relating to content-blocking orders. They lay out a new process by which Central Government officials can issue directions to intermediaries and publishers to delete, modify, or block content, either following a grievance process (Rule 15) or through “emergency” blocking orders, which may be passed ex parte. These provisions stem from powers to issue directions to intermediaries to block public access to any information through any computer resource (Section 69A of the IT Act). Interestingly, they have been introduced separately from the existing rules for blocking, the Information Technology (Procedure and Safeguards for Blocking for Access of Information by Public) Rules, 2009.

2. Key issues for intermediaries under the Rules

2.1 A new class of ‘social media intermediaries’

The term ‘intermediary’ is a broadly defined term in the IT Act covering a range of entities involved in the transmission of electronic records. The Rules introduce two new sub-categories, being:

Given that a popular messaging app like WhatsApp has over 400 million users in India, the threshold appears to be fairly conservative. The Government may order any intermediary to comply with the same obligations as SSMIs (under Rule 6) if its services are adjudged to pose a risk of harm to national security, the sovereignty and integrity of India, India’s foreign relations, or public order.  

SSMIs have to follow substantially more onerous “additional due diligence” requirements to claim the intermediary safe harbour (including mandatory traceability of message originators and proactive automated screening, as discussed below). These new requirements raise privacy and data security concerns: they extend beyond traditional ideas of platform “due diligence”, potentially expose the content of private communications, and in doing so create new privacy risks for users in India.    

2.2 Additional requirements for SSMIs: resident employees, mandated message traceability, automated content screening 

Extensive new requirements are set out in the new Rule 4 for SSMIs. 

Provisions that mandate modifications to the technical design of encrypted platforms to enable traceability seem to go beyond merely requiring intermediary due diligence. Instead, they appear to draw on separate Government powers relating to interception and decryption of information (under Section 69 of the IT Act). In addition, separate stand-alone rules laying out procedures and safeguards for such interception and decryption orders already exist in the Information Technology (Procedure and Safeguards for Interception, Monitoring and Decryption of Information) Rules, 2009. Rule 4(2) even acknowledges these provisions, raising the question of whether these Rules (relating to intermediaries and their safe harbours) can be used to expand the scope of Section 69 or the rules thereunder. 

Proceedings initiated by WhatsApp LLC in the Delhi High Court, and by Free and Open Source Software (FOSS) developer Praveen Arimbrathodiyil in the Kerala High Court, have challenged the legality and validity of Rule 4(2) on grounds including that it is ultra vires, going beyond the scope of its parent statutory provisions (Sections 79 and 69A) and the intent of the IT Act itself. Substantively, the provision is also challenged on the basis that it would violate users’ fundamental rights, including the right to privacy and the right to free speech and expression, due to the chilling effect that stripping back encryption will have.

Though the objective of the provision is laudable (i.e. to limit the circulation of violent or previously removed content), the move towards proactive automated monitoring has raised serious concerns regarding censorship on social media platforms. Rule 4(4) appears to acknowledge the deep tensions this requirement creates with privacy and free speech, as it requires that these screening measures be proportionate to the free speech and privacy interests of users, be subject to human oversight, and that the automated tools be reviewed for accuracy, fairness, propensity for bias or discrimination, and impact on privacy and security. However, given the vagueness of this wording weighed against the trade-off of losing intermediary immunity, scholars and commentators have noted the obvious potential for ‘over-compliance’ and excessive screening out of content. Many (including the petitioner in the Praveen Arimbrathodiyil matter) have also noted that automated filters are not sophisticated enough to differentiate between violent unlawful images and legitimate journalistic material. The concern is that such measures could screen out ‘valid’ speech and expression at scale, with serious consequences for constitutional rights to free speech and expression, which also protect ‘the rights of individuals to listen, read and receive the said speech‘ (Tata Press Ltd v. Mahanagar Telephone Nigam Ltd, (1995) 5 SCC 139).
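
Rule 4(4)’s call for automated tools to be reviewed for accuracy and bias can be illustrated with a small, purely hypothetical sketch. Nothing below comes from the Rules themselves: the content categories, sample data, and the single metric shown (per-category false-positive rate) are invented for illustration, and a real review would rely on audited samples and a much richer set of measures.

```python
# Illustrative only: a minimal evaluation of a hypothetical automated content
# filter of the kind Rule 4(4) contemplates. Data and category names are invented.

from collections import defaultdict

# Each tuple: (content category, filter flagged it?, human reviewer found it unlawful?)
samples = [
    ("journalism", True, False),
    ("journalism", False, False),
    ("journalism", True, False),
    ("satire", True, False),
    ("satire", False, False),
    ("previously_removed", True, True),
    ("previously_removed", True, True),
    ("previously_removed", False, True),
]

def false_positive_rate(rows):
    """Share of lawful items that the filter wrongly flagged."""
    lawful_flags = [flagged for _, flagged, unlawful in rows if not unlawful]
    return sum(lawful_flags) / len(lawful_flags) if lawful_flags else 0.0

by_category = defaultdict(list)
for row in samples:
    by_category[row[0]].append(row)

for category, rows in by_category.items():
    print(f"{category}: false positive rate = {false_positive_rate(rows):.0%}")
```

Even this toy example shows that over-blocking of lawful material is measurable per category, which is the kind of concrete finding a human-oversight review could act on.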

Such requirements appear to be aimed at creating more user-friendly networks of intermediaries. However, the imposition of a single set of requirements is especially onerous for smaller or volunteer-run intermediary platforms, which may not have the income streams or staff to provide such a mechanism. Indeed, the petition in the Praveen Arimbrathodiyil matter has challenged certain of these requirements as a threat to the future of the volunteer-led FOSS movement in India, since they place the same requirements on small FOSS initiatives as on large proprietary Big Tech intermediaries.

Other obligations that stipulate turn-around times for intermediaries include (i) a requirement to remove or disable access to content within 36 hours of receipt of a Government or court order relating to unlawful information on the intermediary’s computer resources (under Rule 3(1)(d)) and (ii) a requirement to provide information within 72 hours of receiving an order from an authorised Government agency undertaking investigative activity (under Rule 3(1)(j)).

Similar to the concerns with automated screening, there are concerns that the new grievance process could lead to private entities becoming the arbiters of appropriate content and free speech, a position that was specifically reversed in a seminal 2015 Supreme Court decision which clarified that a Government or court order was needed for content takedowns.

3. Key issues for the new ‘publishers’ subject to the Rules, including OTT players

3.1 New Codes of Ethics and three-tier redress and oversight system for digital news media and OTT players 

Digital news media and OTT players have been designated as ‘publishers of news and current affairs content’ and ‘publishers of online curated content’ respectively in Part III of the Rules. Each category has then been subjected to a separate Code of Ethics. In the case of digital news media, the Codes applicable to newspapers and cable television have been applied. For OTT players, the Appendix sets out principles regarding the content that can be created and the display classifications that apply. To enforce these codes and to address grievances from the public about their content, publishers are now mandated to set up a grievance system that forms the first tier of a three-tier “appellate” system, culminating in an oversight mechanism run by the Central Government with extensive powers of sanction.

At least five legal challenges have been filed in various High Courts challenging the competence and authority of the Ministry of Electronics & Information Technology (MeitY) to pass the Rules, as well as their validity, namely: (i) in the Kerala High Court, LiveLaw Media Private Limited vs Union of India WP(C) 6272/2021; in the Delhi High Court, three petitions tagged together, being (ii) Foundation for Independent Journalism vs Union of India WP(C) 3125/2021, (iii) Quint Digital Media Limited vs Union of India WP(C) 11097/2021, and (iv) Sanjay Kumar Singh vs Union of India and others WP(C) 3483/2021; and (v) in the Karnataka High Court, Truth Pro Foundation of India vs Union of India and others, W.P. 6491/2021. This is in addition to a fresh petition filed on 10 June 2021, TM Krishna vs Union of India, challenging the entirety of the Rules (both Parts II and III) on the basis that they violate the rights to free speech (Article 19 of the Constitution) and privacy (including under Article 21 of the Constitution), and that they fail the test of arbitrariness (under Article 14), being manifestly arbitrary and falling foul of principles of delegation of powers.

Some of the key issues emerging from these Rules in Part III and the challenges to them are highlighted below. 

3.2 Lack of legal authority and competence to create these Rules

There has been substantial debate on the lack of clarity regarding the legal authority of the Ministry of Electronics & Information Technology (MeitY) under the IT Act. These concerns arise at various levels. 

First, there is a concern that Levels I and II result in a privatisation of adjudications relating to the free speech and expression of creative content producers, which would otherwise be litigated in Courts and Tribunals as matters of free speech. As noted by many (including the LiveLaw petition at page 33), this could have the effect of overturning the judicial precedent in Shreya Singhal v. Union of India ((2013) 12 S.C.C. 73), which specifically read down section 79 of the IT Act to avoid a situation where private entities were the arbiters determining the legitimacy of takedown orders. Second, despite referring to “self-regulation”, this system is subject to executive oversight (unlike the existing models for offline newspapers and broadcasting).

The Inter-Departmental Committee is composed entirely of Central Government bureaucrats. It may review complaints escalated through the three-tier system or referred directly by the Ministry, following which it can deploy a range of sanctions, from warnings to mandated apologies to the deletion, modification or blocking of content. This also raises the question of whether the Committee meets the legal requirements for an administrative body undertaking a ‘quasi-judicial’ function, especially one that may adjudicate on rights relating to free speech and privacy. Finally, while the objective of creating some standards and codes for such content creators may be laudable, it is unclear whether such an extensive oversight mechanism, with powers of sanction over online publishers, can validly be created under the rubric of intermediary liability provisions.

4. New powers to delete, modify or block information for public access 

As described at the start of this blog, the Rules add new powers for the deletion, modification and blocking of content from intermediaries and publishers. While section 69A of the IT Act (and the Rules thereunder) does include blocking powers for the Government, these exist only vis-à-vis intermediaries. Rule 15 also expands this power to ‘publishers’. It further provides a new avenue for issuing such orders to intermediaries, outside of the existing rules for blocking information under the Information Technology (Procedure and Safeguards for Blocking for Access of Information by Public) Rules, 2009.

Graver concerns arise from Rule 16, which allows the passing of emergency orders for blocking information, including without giving publishers or intermediaries an opportunity to be heard. There is a provision for such an order to be reviewed by the Inter-Departmental Committee within two days of its issue.

Both Rule 15 and 16 apply to all entities contemplated in the Rules. Accordingly, they greatly expand executive power and oversight over digital media services in India, including social media, digital news media and OTT on-demand services. 

5. Conclusions and future implications

The new Rules in India have opened up deep questions for online intermediaries and providers of digital media services serving the Indian market. 

For intermediaries, this creates a difficult and even existential choice: the requirements (especially those relating to traceability and automated screening) appear to set an improbably high bar given the reality of their technical systems. However, failure to comply not only results in the loss of the safe harbour from liability but, as seen in the new Rule 7, also opens them up to punishment under the IT Act and criminal law in India.

For digital news and OTT players, the consequences of non-compliance and the level of enforcement remain to be understood, especially given the open questions regarding the validity of the legal basis for creating these Rules. Given the numerous petitions filed against the Rules, there is also substantial uncertainty regarding their future, although the Rules themselves have the full force of law at present.

Overall, it does appear that attempts to create a ‘digital media’ watchdog would be better dealt with in standalone legislation, potentially sponsored by the Ministry of Information and Broadcasting (MIB), which has the traditional remit over such areas. Indeed, the administration of Part III of the Rules has been delegated by MeitY to MIB, pointing to the genuine split in competence between these Ministries.

Finally, the potential overlaps with India’s proposed Personal Data Protection Bill (if passed) also create tensions for the future. It remains to be seen whether the provisions on traceability will survive the test of constitutional validity set out in India’s privacy judgement (Justice K.S. Puttaswamy v. Union of India, (2017) 10 SCC 1). Irrespective of this determination, the Rules appear to have some dissonance with the data retention and data minimisation requirements in the last draft of the Personal Data Protection Bill, not to mention other obligations relating to Privacy by Design and data security safeguards. Interestingly, the definition of ‘social media intermediary’ included in an explanatory clause to section 26(4) of the Bill (released in December 2019) closely tracks the definition in Rule 2(w), but departs from it by carving out certain intermediaries. This is already resulting in moves such as Google’s plea of 2 June 2021 in the Delhi High Court asking for protection from being declared a social media intermediary.

These new Rules have exposed the inherent tensions within digital regulation between the goals of freedom of speech and expression and the right to privacy on the one hand, and the competing governance objectives of law enforcement (such as limiting the circulation of violent, harmful or criminal content online) and national security on the other. The ultimate legal effect of these Rules will be determined as much by the outcome of the various petitions challenging their validity as by the enforcement challenges raised by casting such a wide net, one that covers millions of users and thousands of entities, all engaged in creating India’s growing digital public sphere.

Photo credit: Gerd Altmann from Pixabay

Read more Global Privacy thought leadership:

South Korea: The First Case where the Personal Information Protection Act was Applied to an AI System

China: New Draft Car Privacy and Security Regulation is Open for Public Consultation

A New Era for Japanese Data Protection: 2020 Amendments to the APPI

New FPF Report Highlights Privacy Tech Sector Evolving from Compliance Tools to Platforms for Risk Management and Data Utilization

As we enter the third phase of development of the privacy tech market, purchasers are demanding more integrated solutions, product offerings are more comprehensive, and startup valuations are higher than ever, according to a new report from the Future of Privacy Forum and Privacy Tech Alliance. These factors are leading to companies providing a wider range of services, acting as risk management platforms, and focusing on support of business outcomes.

“The privacy tech sector is at an inflection point, as its offerings have expanded beyond assisting with regulatory compliance,” said FPF CEO Jules Polonetsky. “Increasingly, companies want privacy tech to help businesses maximize the utility of data while managing ethics and data protection compliance.”

According to the report, “Privacy Tech’s Third Generation: A Review of the Emerging Privacy Tech Sector,” regulations are often the biggest driver for buyers’ initial privacy tech purchases. Organizations also are deploying tools to mitigate potential harms from the use of data. However, buyers serving global markets increasingly need privacy tech that offers data availability and control and supports its utility, in addition to regulatory compliance. 

The report finds the COVID-19 pandemic has accelerated global marketplace adoption of privacy tech as dependence on digital technologies grows. Privacy is becoming a competitive differentiator in some sectors, and TechCrunch reports that 200+ privacy startups have together raised more than $3.5 billion over hundreds of individual rounds of funding. 

“The customers buying privacy-enhancing tech used to be primarily Chief Privacy Officers,” said report lead author Tim Sparapani. “Now it’s also Chief Marketing Officers, Chief Data Scientists, and Strategy Officers who value the insights they can glean from de-identified customer data.”

The report highlights five trends in the privacy enhancing tech market:

The report also draws seven implications for competition in the market:

The report makes a series of recommendations, including that the industry define as a priority a common vernacular for privacy tech; set standards for technologies in the “privacy stack” such as differential privacy, homomorphic encryption, and federated learning; and explore the needs of companies for privacy tech based upon their size, sector, and structure. It calls on vendors to recognize the need to provide adequate support to customers to increase uptake and speed time from contract signing to successful integration.
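
For readers less familiar with the “privacy stack” technologies the report names, the following is a minimal, illustrative sketch of one of them, differential privacy, using the classic Laplace mechanism to release a noisy count. The epsilon value and the data are arbitrary choices for the example and are not drawn from the report.

```python
# Minimal illustration of differential privacy (one of the "privacy stack"
# technologies named in the report): releasing a count with Laplace noise.
# Epsilon and the data are arbitrary; this is a sketch, not production code.

import random

def dp_count(values, predicate, epsilon=1.0):
    """Return an epsilon-differentially private count of items matching `predicate`.

    A counting query has sensitivity 1, so Laplace noise with scale 1/epsilon
    suffices. The difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon).
    """
    true_count = sum(1 for v in values if predicate(v))
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 52, 64, 38, 47]
print("Noisy count of users over 40:", round(dp_count(ages, lambda a: a > 40), 2))
```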

The Future of Privacy Forum launched the Privacy Tech Alliance (PTA) as a global initiative with a mission to define, enhance and promote the market for privacy technologies. The PTA brings together innovators in privacy tech with customers and key stakeholders.

Members of the PTA Advisory Board, which includes Anonos, BigID, D-ID, Duality, Ethyca, Immuta, OneTrust, Privacy Analytics, Privitar, SAP, Truata, TrustArc, Wirewheel, and ZL Tech, have formed a working group to address impediments to growth identified in the report. The PTA working group will define a common vernacular and typology for privacy tech as a priority project with chief privacy officers and other industry leaders who are members of FPF. Other work will seek to develop common definitions and standards for privacy-enhancing technologies such as differential privacy, homomorphic encryption, and federated learning and identify emerging trends for venture capitalists and other equity investors in this space. Privacy Tech companies can apply to join the PTA by emailing [email protected].


Perspectives on the Privacy Tech Market

Quotes from Members of the Privacy Tech Alliance Advisory Board on the Release of the “Privacy Tech’s Third Generation” Report


“The ‘Privacy Tech Stack’ outlined by the FPF is a great way for organizations to view their obligations and opportunities to assess and reconcile business and privacy objectives. The Schrems II decision by the Court of Justice of the European Union highlights that skipping the second ‘Process’ layer can result in desired ‘Outcomes’ in the third layer (e.g., cloud processing of, or remote access to, cleartext data) being unlawful – despite their global popularity – without adequate risk management controls for decentralized processing.” — Gary LaFever, CEO & General Counsel, Anonos


“As a founding member of this global initiative, we are excited by the conclusions drawn from this foundational report – we’ve seen parallels in our customer base, from needing an enterprise-wide solution to the rich opportunity for collaboration and integration. The privacy tech sector continues to mature as does the imperative for organizations of all sizes to achieve compliance in light of the increasingly complicated data protection landscape.” — Heather Federman, VP Privacy and Policy at BigID


“There is no doubt of the massive importance of the privacy sector, an area which is experiencing huge growth. We couldn’t be more proud to be part of the Privacy Tech Alliance Advisory Board and absolutely support the work they are doing to create alignment in the industry and help it face the current set of challenges. In fact we are now working on a similar initiative in the synthetic media space to ensure that ethical considerations are at the forefront of that industry too.” — Gil Perry, Co-Founder & CEO, D-ID


“We congratulate the Future of Privacy Forum and the Privacy Tech Alliance on the publication of this highly comprehensive study, which analyzes key trends within the rapidly expanding privacy tech sector. Enterprises today are increasingly reliant on privacy tech, not only as a means of ensuring regulatory compliance but also in order to drive business value by facilitating secure collaborations on their valuable and often sensitive data. We are proud to be part of the PTA Advisory Board, and look forward to contributing further to its efforts to educate the market on the importance of privacy-tech, the various tools available and their best utilization, ultimately removing barriers to successful deployments of privacy-tech by enterprises in all industry sectors” — Rina Shainski, Chairwoman, Co-founder, Duality


“Since the birth of the privacy tech sector, we’ve been helping companies find and understand the data they have, compare it against applicable global laws and regulations, and remediate any gaps in compliance. But as the industry continues to evolve, privacy tech also is helping show business value beyond just compliance. Companies are becoming more transparent, differentiating on ethics and ESG, and building businesses that differentiate on trust. The privacy tech industry is growing quickly because we’re able to show value for compliance as well as actionable business insights and valuable business outcomes.” — Kabir Barday, CEO, OneTrust


“Leading organizations realize that to be truly competitive in a rapidly evolving marketplace, they need to have a solid defensive footing. Turnkey privacy technologies enable them to move onto the offense by safely leveraging their data assets rapidly at scale.” — Luk Arbuckle, Chief Methodologist, Privacy Analytics


“We appreciate FPF’s analysis of the privacy tech marketplace and we’re looking forward to further research, analysis, and educational efforts by the Privacy Tech Alliance. Customers and consumers alike will benefit from a shared understanding and common definitions for the elements of the privacy stack.” — Corinna Schulze, Director, EU Government Relations, Global Corporate Affairs, SAP


“The report shines a light on the evolving sophistication of the privacy tech market and the critical need for businesses to harness emerging technologies that can tackle the multitude of operational challenges presented by the big data economy. Businesses are no longer simply turning to privacy tech vendors to overcome complexities with compliance and regulation; they are now mapping out ROI-focused data strategies that view privacy as a key commercial differentiator. In terms of market maturity, the report highlights a need to overcome ambiguities surrounding new privacy tech terminology, as well as discrepancies in the mapping of technical capabilities to actual business needs. Moving forward, the advantage will sit with those who can offer the right blend of technical and legal expertise to provide the privacy stack assurances and safeguards that buyers are seeking – from a risk, deployment and speed-to-value perspective. It’s worth noting that the growing importance of data privacy to businesses sits in direct correlation with the growing importance of data privacy to consumers. Trūata’s Global Consumer State of Mind Report 2021 found that 62% of global consumers would feel more reassured and would be more likely to spend with companies if they were officially certified to a data privacy standard. Therefore, in order to manage big data in a privacy-conscious world, the opportunity lies with responsive businesses that move with agility and understand the return on privacy investment. The shift from manual, restrictive data processes towards hyper automation and privacy-enhancing computation is where the competitive advantage can be gained and long-term consumer loyalty—and trust— can be retained.” — Aoife Sexton, Chief Privacy Officer and Chief of Product Innovation, Trūata


“As early pioneers in this space, we’ve had a unique lens on the evolving challenges organizations have faced in trying to integrate technology solutions to address dynamic, changing privacy issues in their organizations, and we believe the Privacy Technology Stack introduced in this report will drive better organizational decision-making related to how technology can be used to sustainably address the relationships among the data, processes, and outcomes.” — Chris Babel, CEO, TrustArc


“It’s important for companies that use data to do so ethically and in compliance with the law, but those are not the only reasons why the privacy tech sector is booming. In fact, companies with exceptional privacy operations gain a competitive advantage, strengthen customer relationships, and accelerate sales.” — Justin Antonipillai, Founder & CEO, Wirewheel

The right to be forgotten is not compatible with the Brazilian Constitution. Or is it?

Brazilian Supreme Federal Court

Author: Dr. Luca Belli

Dr. Luca Belli is Professor at FGV Law School, Rio de Janeiro, where he leads the CyberBRICS Project and the Latin American edition of the Computers, Privacy and Data Protection (CPDP) conference. The opinions expressed in his articles are strictly personal. The author can be contacted at [email protected].

The Brazilian Supreme Federal Court, or “STF” in its Brazilian acronym, recently took a landmark decision concerning the right to be forgotten (RTBF), finding that it is incompatible with the Brazilian Constitution. This attracted international attention to Brazil for a topic quite distant from the sadly frequent environmental, health, and political crises.

Readers should be warned that while reading this piece they might experience disappointment, perhaps even frustration, then renewed interest and curiosity, and finally – and hopefully – an increased open-mindedness about a new facet of the RTBF debate and how it is playing out at the constitutional level in Brazil.

This might happen because although the STF relies on the “RTBF” label, the content behind such label is quite different from what one might expect after following the same debate in Europe. From a comparative law perspective, this landmark judgment tellingly shows how similar constitutional rights play out in different legal cultures and may lead to heterogeneous outcomes based on the constitutional frameworks of reference.   

How it started: insolvency seasoned with personal data

As it is well-known, the first global debate on what it means to be “forgotten” in the digital environment arose in Europe, thanks to Mario Costeja Gonzalez, a Spaniard who, paradoxically, will never be forgotten by anyone due to his key role in the construction of the RTBF.

Costeja famously requested to deindex from Google Search information about himself that he considered to be no longer relevant. Indeed, when anyone “googled” his name, the search engine provided as the top results links to articles reporting Costeja’s past insolvency as a debtor. Costeja argued that, despite having been found liable for insolvency, he had already paid his debt to justice and society many years before, and it was therefore unfair that his name would continue to be associated ad aeternum with a mistake he made in the past.

The follow-up is well known in data protection circles. The case reached the Court of Justice of the European Union (CJEU), which, in its landmark Google Spain Judgment (C-131/12), established that search engines are to be considered data controllers and therefore have an obligation to de-index information that is inappropriate, excessive, not relevant, or no longer relevant, when the data subject to whom such data refer so requests. Such an obligation was a consequence of Article 12(b) of Directive 95/46 on the protection of personal data, a pre-GDPR provision that set the basis for the European conception of the RTBF, providing for the “rectification, erasure or blocking of data the processing of which does not comply with the provisions of [the] Directive, in particular because of the incomplete or inaccurate nature of the data.”

The indirect consequence of this historic decision, and the debate it generated, is that we have all come to consider the RTBF in the terms set by the CJEU. However, what is essential to emphasize is that the CJEU approach is only one possible conception and, importantly, it was possible because of the specific characteristics of the EU legal and institutional framework. We have come to think that RTBF means the establishment of a mechanism like the one resulting from the Google Spain case, but this is the result of a particular conception of the RTBF and of how this particular conception should – or could – be implemented.

The fact that the RTBF has been predominantly analyzed and discussed through European lenses does not mean that this is the only possible perspective, nor that this approach is necessarily the best. In fact, the Brazilian conception of the RTBF is remarkably different from a conceptual, constitutional, and institutional standpoint. The main concern of the Brazilian RTBF is not how a data controller might process personal data (this is the part where frustration and disappointment may arise in the reader), but the STF itself leaves the door open to such a possibility (this is the point where renewed interest and curiosity may arise).

The Brazilian conception of the right to be forgotten

Although the RTBF has acquired a fundamental relevance in digital policy circles, it is important to emphasize that, until recently, Brazilian jurisprudence had mainly focused on the juridical need for “forgetting” only in the analogue sphere. Indeed, before the CJEU Google Spain decision, the Brazilian Supreme Court of Justice or “STJ” – the other Brazilian Supreme Court that deals with the interpretation of the Law, differently from the previously mentioned STF, which deals with the interpretation of constitutional matters – had already considered the RTBF as a right not to be remembered, affirmed by the individual vis-à-vis traditional media outlets.

This interpretation first emerged in the “Candelaria massacre” case, a gloomy page of Brazilian history, featuring a multiple homicide perpetrated in 1993 in front of the Candelaria Church, a beautiful colonial Baroque building in Rio de Janeiro’s downtown. The gravity and the particularly picturesque stage of the massacre led Globo TV, a leading Brazilian broadcaster, to feature the massacre in a TV show called Linha Direta. Importantly, the show included in the narration some details about a man suspected of being one of the perpetrators of the massacre but later discharged.

Understandably, the man filed a complaint arguing that the inclusion of his personal information in the TV show was causing him severe emotional distress, while also reviving suspicions against him for a crime of which he had already been discharged many years before. In September 2013, further to Special Appeal No. 1,334,097, the STJ agreed with the plaintiff, establishing the man’s “right not to be remembered against his will, specifically with regard to discrediting facts.” This is how the RTBF was born in Brazil.

Importantly for our present discussion, this interpretation was not born out of digital technology and does not impinge upon the delisting of specific types of information from search engine results. In Brazilian jurisprudence the RTBF has been conceived as a general right to effectively limit the publication of certain information. The man included in the Globo reportage had been discharged many years before; hence he had a right to be “let alone,” as Warren and Brandeis would argue, and not to be remembered for something he had not even committed. The STJ therefore constructed its vision of the RTBF on the basis of Article 5.X of the Brazilian Constitution, which enshrines the fundamental rights to intimacy and preservation of image, two fundamental features of privacy.

Hence, although they utilize the same label, the STJ and the CJEU conceptualize two remarkably different rights when they refer to the RTBF. While both conceptions aim at limiting access to specific types of personal information, the Brazilian conception differs from the EU one on at least three levels.

First, their constitutional foundations. While both conceptions are intimately intertwined with individuals’ informational self-determination, the STJ built the RTBF on the protection of privacy, honour and image, whereas the CJEU built it upon the fundamental right to data protection, which in the EU framework is a standalone fundamental right. Conspicuously, an explicit right to data protection did not exist in the Brazilian constitutional framework at the time of the Candelaria case, and only since 2020 has it been in the process of being recognized.

Secondly, and consequently, the original goal of the Brazilian conception of the RTBF was not to regulate how a controller should process personal data but rather to protect the private sphere of the individual. In this perspective, the goal of the STJ was not – and could not have been – to regulate the deindexation of specific incorrect or outdated information, but rather to regulate the deletion of “discrediting facts” so that the private life, honour and image of an individual would not be illegitimately violated.

Finally, yet extremely importantly, the fact that an institutional framework dedicated to data protection was simply absent in Brazil at the time of the decision did not allow the STJ the same leeway as the CJEU. The EU Justices enjoyed the privilege of delegating the implementation of the RTBF to search engines because such implementation would receive guidance from, and be subject to the review of, a well-consolidated system of European Data Protection Authorities. At the EU level, DPAs are expected to guarantee a harmonious and consistent interpretation and application of data protection law. At the Brazilian level, a DPA was only established in late 2020 and announced its first regulatory agenda only in late January 2021.

This latter point is far from trivial and, in the opinion of this author, an essential preoccupation that might have driven the subsequent RTBF conceptualization of the STJ.

The stress-test

The soundness of the Brazilian definition of the RTBF, however, was going to be tested again by the STJ, in the context of another grim and unfortunate page of Brazilian history, the Aida Curi case. The case originated with the sexual assault and subsequent homicide of the young Aida Curi, in Copacabana, Rio de Janeiro, on the evening of 14 July 1958. At the time the case attracted considerable media attention, not only because of its mysterious circumstances and the young age of the victim, but also because the perpetrators tried to conceal the assault by throwing the body of the victim from the rooftop of a very tall building on Avenida Atlantica, the fancy avenue right in front of the Copacabana beach.

Needless to say, Globo TV considered the case as a perfect story for yet another Linha Direta episode. Aida Curi’s relatives, far from enjoying the TV show, sued the broadcaster for moral damages and demanded the full enjoyment of their RTBF – in the Brazilian conception, of course. According to the plaintiffs, it was indeed not conceivable that, almost 50 years after the murder, Globo TV could publicly broadcast personal information about the victim – and her family – including the victim’s name and address, in addition to unauthorized images, thus bringing back a long-closed and extremely traumatic set of events.

The brothers of Aida Curi sought reparations from Rede Globo, but the STJ decided that the time that had passed was enough to mitigate the effects of anguish and pain on the dignity of Aida Curi’s relatives, while arguing that it was impossible to report the events without mentioning the victim. This decision was appealed by Ms Curi’s family members, who demanded, by means of Extraordinary Appeal No. 1,010,606, that the STF recognize “their right to forget the tragedy.” It is interesting to note that the way the demand is constructed in this Appeal tellingly exemplifies the Brazilian conception of “forgetting” as erasure and prohibition of divulgation.

At this point, the STF identified in the Appeal an issue worth debating “with general repercussion”, a peculiar judicial mechanism that the Court can utilize when it recognizes that a given case has particular relevance and transcendence for the Brazilian legal and judicial system. Indeed, the decision in a case with general repercussion does not bind only the parties but establishes jurisprudence that must be followed by all lower courts.

In February 2021, the STF finally deliberated on the Aida Curi case, establishing that “the idea of a right to be forgotten is incompatible with the Constitution, thus understood as the power to prevent, due to the passage of time, the disclosure of facts or data that are true and lawfully obtained and published in analogue or digital media” and that “any excesses or abuses in the exercise of freedom of expression and information must be analyzed on a case-by-case basis, based on constitutional parameters – especially those relating to the protection of honor, image, privacy and personality in general – and the explicit and specific legal provisions existing in the criminal and civil spheres.”

In other words, what the STF has deemed incompatible with the Federal Constitution is a specific interpretation of the Brazilian version of the RTBF. What is not compatible with the Constitution is to argue that the RTBF allows one to prohibit the publication of true facts, lawfully obtained. At the same time, however, the STF clearly states that it remains possible for any court of law to evaluate, on a case-by-case basis and according to constitutional parameters and existing legal provisions, whether a specific episode allows the use of the RTBF to prohibit the divulgation of information that undermines the dignity, honour, privacy, or other fundamental interests of the individual.

Hence, while explicitly prohibiting the use of the RTBF as a general right to censorship, the STF leaves room for the use of the RTBF to delist specific personal data in an EU-like fashion, while specifying that this must be done with guidance from the Constitution and the law.

What next?

Given the core differences between the Brazilian and EU conception of the RTBF, as highlighted above, it is understandable in the opinion of this author that the STF adopted a less proactive and more conservative approach. This must be especially considered in light of the very recent establishment of a data protection institutional system in Brazil.

It is understandable that the STF might have preferred to de facto delegate to the courts the interpretation of when and how the RTBF can be rightfully invoked, according to constitutional and legal parameters. First, in the Brazilian interpretation, this right fundamentally insists on the protection of privacy – i.e. the private sphere of an individual – and, while data protection concerns are acknowledged, they are not the main ground on which the Brazilian RTBF conception relies.

It is also understandable that, in a country and a region whose recent history is marked by dictatorships, well-hidden atrocities, and opacity, the social need to remember and shed light on what happened outweighs the legitimate individual interest in prohibiting the circulation of truthful and legally obtained information. In the digital sphere, however, the RTBF quintessentially translates into an extension of informational self-determination, which the Brazilian General Data Protection Law, better known as the “LGPD” (Law No. 13.709/2018), enshrines in its Article 2 as one of the “foundations” of data protection in the country, and whose fundamental character was recently recognized by the STF itself.

In this perspective, it is useful to remind the dissenting opinion of Justice Luiz Edson Fachin, in the Aida Curi case, stressing that “although it does not expressly name it, the Constitution of the Republic, in its text, contains the pillars of the right to be forgotten, as it celebrates the dignity of the human person (article 1, III), the right to privacy (article 5, X) and the right to informational self-determination – which was recognized, for example, in the disposal of the precautionary measures of the Direct Unconstitutionality Actions No. 6,387, 6,388, 6,389, 6,390 and 6,393, under the rapporteurship of Justice Rosa Weber (article 5, XII).”

It is the opinion of this author that the Brazilian debate on the RTBF in the digital sphere would be clearer if its dimension as a right to deindexation of search engine results were clearly regulated. It is understandable that the STF did not dare to regulate this, given its interpretation of the RTBF and the very embryonic data protection institutional framework in Brazil. However, given the increasing datafication we are currently witnessing, it would be naïve not to expect further RTBF claims concerning the digital environment and, specifically, the way search engines process personal data.

The fact that the STF has left the door open to applying the RTBF in the case-by-case analysis of individual claims may reassure the reader regarding the primacy of constitutional and legal arguments in such analysis. It may also lead the reader to – very legitimately – wonder whether such a choice is de facto the most efficient and coherent way to deal with the potentially enormous number of claims, given the margin of appreciation and interpretation that each court may have.

An informed debate that clearly lays out the existing options and the most efficient and just ways to implement them, considering the Brazilian context, would be beneficial. This will likely be one of the goals of the upcoming Latin American edition of the Computers, Privacy and Data Protection conference (CPDP LatAm), which will take place in July, entirely online, and will aim at exploring the most pressing privacy and data protection issues for Latin American countries.

Photo Credit: “Brasilia – The Supreme Court” by Christoph Diewald is licensed under CC BY-NC-ND 2.0

If you have any questions about engaging with The Future of Privacy Forum on Global Privacy and Digital Policymaking contact Dr. Gabriela Zanfir-Fortuna, Senior Counsel, at [email protected].

FPF announces appointment of Malavika Raghavan as Senior Fellow for India

The Future of Privacy Forum announces the appointment of Malavika Raghavan as Senior Fellow for India, expanding our Global Privacy team to one of the key jurisdictions for the future of privacy and data protection law. 

Malavika is a thought leader and a lawyer working on interdisciplinary research focusing on the impacts of digitisation on the lives of lower-income individuals. Her work since 2016 has focused on the regulation and use of personal data in service delivery by the Indian State and private sector actors. She founded and led the Future of Finance Initiative at Dvara Research (an Indian think tank), in partnership with the Gates Foundation, from 2016 until 2020, anchoring its research agenda and policy advocacy on emerging issues at the intersection of technology, finance and inclusion. Research she led at Dvara Research was cited by India’s Data Protection Committee in its White Paper as well as in its final report with proposals for India’s draft Personal Data Protection Bill, with specific reliance placed on that research for aspects of regulatory design and enforcement. See Malavika’s full bio here.

“We are delighted to welcome Malavika to our Global Privacy team. For the following year, she will be our adviser to understand the most significant developments in privacy and data protection in India, from following the debate and legislative process of the Data Protection Bill and the processing of non-personal data initiatives, to understanding the consequences of the publication of the new IT Guidelines. India is one of the most interesting jurisdictions to follow in the world, for many reasons: the innovative thinking on data protection regulation, the potentially groundbreaking regulation of non-personal data and the outstanding number of individuals whose privacy and data protection rights will be envisaged by these developments, which will test the power structures of digital regulation and safeguarding fundamental rights in this new era”, said Dr. Gabriela Zanfir-Fortuna, Global Privacy lead at FPF. 

We asked Malavika to share her thoughts for FPF’s blog on the most significant developments in privacy and digital regulation in India and on India’s role in the global privacy and digital regulation debate.

FPF: What are some of the most significant developments in the past couple of years in India in terms of data protection, privacy, digital regulation?

Malavika Raghavan: “Undoubtedly, the turning point for the privacy debate in India was the 2017 judgement of the Indian Supreme Court in Justice KS Puttaswamy v Union of India. The judgment affirmed the right to privacy as a constitutional guarantee, protected by Part III (Fundamental Rights) of the Indian Constitution. It was also regenerative, bringing our constitutional jurisprudence into the 21st century by re-interpreting timeless principles for the digital age, and casting privacy as a prerequisite for accessing other rights—including the right to life and liberty, to freedom of expression and to equality—given the ubiquitous digitisation of human experience we are witnessing today. 

Overnight, Puttaswamy also re-balanced conversations in favour of privacy safeguards, making these equal priorities for builders of digital systems rather than framing them as obstacles to innovation and efficiency. In addition, it challenged the narrative that privacy is an elite construct that only wealthy or privileged people deserve—since many litigants in the original case that had created the Puttaswamy reference were from marginalised groups. Since then, a string of interesting developments has arisen as new cases reassess the impact of digital technology on individuals in India, e.g. the boundaries of private sector data sharing (such as between WhatsApp and Facebook), or the State’s use of personal data (as in the case concerning Aadhaar, our national identification system), among others. 

Puttaswamy also provided a fillip for a big legislative development: the creation of an omnibus data protection law in India. A bill to create this framework was proposed by a Committee of Experts under the chairmanship of Justice Srikrishna (a former Supreme Court judge), and it has been making its way through ministerial and Parliamentary processes. There’s a strong possibility that this law will be passed by the Indian Parliament in 2021! Definitely a big development to watch.

FPF: How do you see India’s role in the global privacy and digital regulation debate?

Malavika Raghavan: “India’s strategy on privacy and digital regulation will undoubtedly have global impact, given that India is home to 1/7th of the world’s population! The mobile internet revolution has created a huge impact on our society with millions getting access to digital services in the last couple of decades. This has created nuanced mental models and social norms around digital technologies that are slowly being documented through research and analysis. 

The challenge for policy makers is to create regulations that match these expectations and the realities of Indian users to achieve reasonable, fair regulations. As we have already seen from sectoral regulations (such as those from our Central Bank around cross border payments data flows) such regulations also have huge consequences for global firms interacting with Indian users and their personal data.  

In this context, I think India can have the late-mover advantage in some ways when it comes to digital regulation. If we play our cards right, we can take the best lessons from the experience of other countries in the last few decades and eschew the missteps. More pragmatically, it seems inevitable that India’s approach to privacy and digital regulation will also be strongly influenced by the Government’s economic, geopolitical and national security agenda (both internationally and domestically). 

One thing is for certain: there is no path-dependence. Our legislators and courts are thinking in unique and unexpected ways that are indeed likely to result in a fourth way (as described by the Srikrishna Data Protection Committee’s final report), compared to the approach in the US, EU and China.”

If you have any questions about engaging with The Future of Privacy Forum on Global Privacy and Digital Policymaking contact Dr. Gabriela Zanfir-Fortuna, Senior Counsel, at [email protected].

India: Massive overhaul of digital regulation, with strict rules for take-down of illegal content and automated scanning of online content


On February 25, the Indian Government notified and published the Information Technology (Guidelines for Intermediaries and Digital Media Ethics Code) Rules, 2021. These rules mirror the EU’s Digital Services Act (DSA) proposal to some extent: they propose a tiered approach based on the scale of the platform, and they touch on intermediary liability, content moderation, take-down of illegal content from online platforms, as well as internal accountability and oversight mechanisms. But they go beyond such rules by adding a Code of Ethics for digital media, similar to the Code of Ethics classic journalistic outlets must follow, and by proposing an “online content” labelling scheme for content that is safe for children.

The Code of Ethics applies to online news publishers, as well as intermediaries that “enable the transmission of news and current affairs”. This part of the Guidelines (the Code of Ethics) has already been challenged in the Delhi High Court by news publishers this week. 

The Guidelines have raised several types of concerns in India, from their impact on freedom of expression, to their impact on the right to privacy through the automated scanning of content and the mandated traceability of even end-to-end encrypted messages so that the originator can be identified, to the Government’s choice to use executive action for such profound changes. The Government, through the two Ministries involved in the process, is scheduled to testify before the Standing Committee on Information Technology of the Parliament on March 15.

New obligations for intermediaries

“Intermediaries” include “websites, apps and portals of social media networks, media sharing websites, blogs, online discussion forums, and other such functionally similar intermediaries” (as defined in rule 2(1)(m)).

Here are some of the most important rules laid out in Part II of the Guidelines, dedicated to Due Diligence by Intermediaries:

“Significant social media intermediaries” have enhanced obligations

“Significant social media intermediaries” are social media services with a number of users above a threshold which will be defined and notified by the Central Government. This concept is similar to the DSA’s “Very Large Online Platform” (VLOP); however, the DSA includes clear criteria in the proposed act itself on how to identify a VLOP.

As for “Significant Social Media Intermediaries” in India, they will have additional obligations (similar to how the DSA proposal in the EU scales obligations): 

These “Guidelines” seem to have the legal effect of a statute, and they are being adopted through executive action to replace Guidelines adopted in 2011 by the Government under powers conferred on it by the Information Technology Act 2000. The new Guidelines would enter into force immediately after publication in the Official Gazette (there is no information as to when publication is scheduled). The Code of Ethics would enter into force three months after publication in the Official Gazette. As mentioned above, there are already some challenges in court against part of these rules.

Get smart on these issues and their impact

Check out these resources: 

Another jurisdiction to keep your eyes on: Australia

Also note that, while the European Union is starting its heavy and slow legislative machine by appointing Rapporteurs in the European Parliament and holding first discussions on the DSA proposal in the relevant working group of the Council, another country is set to adopt digital content rules soon: Australia. The Government is currently considering an Online Safety Bill, which was open to public consultation until mid-February and which would also include a “modernised online content scheme”, creating new classes of harmful online content, as well as take-down requirements for image-based abuse, cyber abuse and harmful content online, requiring removal within 24 hours of receiving a notice from the eSafety Commissioner.

If you have any questions about engaging with The Future of Privacy Forum on Global Privacy and Digital Policymaking contact Dr. Gabriela Zanfir-Fortuna, Senior Counsel, at [email protected].

Russia: New Law Requires Express Consent for Making Personal Data Available to the Public and for Any Subsequent Dissemination

Authors: Gabriela Zanfir-Fortuna and Regina Iminova

Source: Pixabay.Com, by Opsa

Amendments to the Russian general data protection law (Federal Law No. 152-FZ on Personal Data) adopted at the end of 2020 enter into force today (Monday, March 1st), with some of them having their effective date postponed until July 1st. The changes are part of a legislative package that also amends the Criminal Code to criminalize disclosure of personal data about “protected persons” (several categories of government officials). The amendments to the data protection law introduce consent-based restrictions for any organization or individual that initially publishes personal data, as well as for those that collect and further disseminate personal data that has been made publicly available on the basis of consent, such as on social media, blogs or any other sources.

The amendments:

The potential impact of the amendments is broad. The new law prima facie affects social media services, online publishers, streaming services, bloggers, and any other entity that might be considered to be making personal data available to “an indefinite number of persons.” They now have to collect, and be able to prove they have, separate consent for making personal data publicly available, as well as for further publishing or disseminating “personal data allowed to be disseminated” (PDD, defined below) which has been lawfully published by other parties originally.

Importantly, the new provisions in the Personal Data Law dedicated to PDD do not include any specific exception for processing PDD for journalistic purposes. The only exception recognized is processing PDD “in the state and public interests defined by the legislation of the Russian Federation”. The Explanatory Note accompanying the amendments confirms that consent is the exclusive lawful ground that can justify dissemination and further processing of PDD and that the only exception to this rule is the one mentioned above, for state or public interests as defined by law. It is thus expected that the amendments might create a chilling effect on freedom of expression, especially when also taking into account the corresponding changes to the Criminal Code.

The new rules seem to be part of a broader effort in Russia to regulate information shared online and available to the public. In this context, it is noteworthy that other amendments to Law 149-FZ on Information, IT and Protection of Information solely impacting social media services were also passed into law in December 2020, and already entered into force on February 1st, 2021. Social networks are now required to monitor content and “restrict access immediately” of users that post information about state secrets, justification of terrorism or calls to terrorism, pornography, promoting violence and cruelty, or obscene language, manufacturing of drugs, information on methods to commit suicide, as well as calls for mass riots. 

Below we provide a closer look at the amendments to the Personal Data Law that entered into force on March 1st, 2021. 

A new category of personal data is defined

The new law defines a category of “personal data allowed by the data subject to be disseminated” (PDD), the definition being added as paragraph 1.1 to Article 3 of the Law. This new category of personal data is defined as “personal data to which an unlimited number of persons have access, and which is provided by the data subject by giving specific consent for the dissemination of such data, in accordance with the conditions in the Personal Data Law” (unofficial translation). 

The old law had a dedicated provision that referred to how this type of personal data could be lawfully processed, but it was vague and offered almost no details. In particular, Article 6(10) of the Personal Data Law (the provision corresponding to Article 6 GDPR on lawful grounds for processing) provided that processing of personal data is lawful when the data subject gives access to their personal data to an unlimited number of persons. The amendments abrogate this paragraph, before introducing an entirely new article containing a detailed list of conditions for processing PDD only on the basis of consent (the new Article 10.1).

Perhaps in order to avoid misunderstanding on how the new rules for processing PDD fit with the general conditions on lawful grounds for processing personal data, a new paragraph 2 is introduced in Article 10 of the law, which details conditions for processing special categories of personal data, to clarify that processing of PDD “shall be carried out in compliance with the prohibitions and conditions provided for in Article 10.1 of this Federal Law”.

Specific, express, unambiguous and separate consent is required

Under the new law, “data operators” that process PDD must obtain specific and express consent from data subjects to process such personal data, which includes any use or dissemination of the data. Notably, under Russian law, “data operators” covers both controllers and processors in the sense of the General Data Protection Regulation (GDPR), or businesses and service providers in the sense of the California Consumer Privacy Act (CCPA).

Specifically, under Article 10.1(1), the data operator must ensure that it obtains a separate consent dedicated to dissemination, distinct from the general consent for processing personal data or any other type of consent. Importantly, “under no circumstances” may individuals’ silence or inaction be taken to indicate their consent to the processing of their personal data for dissemination, under Article 10.1(8).

In addition, the data subject must be provided with the possibility to select the categories of personal data which they permit for dissemination. Moreover, the data subject also must be provided with the possibility to establish “prohibitions on the transfer (except for granting access) of [PDD] by the operator to an unlimited number of persons, as well as prohibitions on processing or conditions of processing (except for access) of these personal data by an unlimited number of persons”, per Article 10.1(9). It seems that these prohibitions refer to specific categories of personal data provided by the data subject to the operator (out of a set of personal data, some categories may be authorized for dissemination, while others may be prohibited from dissemination).
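
To make the structure of these consent requirements easier to follow, here is a purely hypothetical sketch of how an operator might record such a consent. The field names and logic are invented for illustration; neither the law nor Roskomnadzor prescribes any particular format.

```python
# Hypothetical sketch of a consent record for "personal data allowed to be
# disseminated" (PDD). Field names are invented; the law prescribes no format.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class PDDConsent:
    data_subject_id: str
    # Separate, affirmative consent dedicated to dissemination (cf. Article 10.1(1));
    # silence or inaction never counts as consent (cf. Article 10.1(8)).
    dissemination_consent_given: bool
    # Categories the data subject selected as permitted for dissemination.
    permitted_categories: List[str] = field(default_factory=list)
    # Categories the data subject prohibited from transfer or further processing
    # by an unlimited number of persons (cf. Article 10.1(9)).
    prohibited_categories: List[str] = field(default_factory=list)
    given_at: Optional[datetime] = None

    def may_disseminate(self, category: str) -> bool:
        """Dissemination requires explicit consent and no prohibition on the category."""
        return (
            self.dissemination_consent_given
            and category in self.permitted_categories
            and category not in self.prohibited_categories
        )

consent = PDDConsent(
    data_subject_id="subject-001",
    dissemination_consent_given=True,
    permitted_categories=["name", "profile_photo"],
    prohibited_categories=["phone_number"],
    given_at=datetime(2021, 3, 1),
)
print(consent.may_disseminate("name"))          # True
print(consent.may_disseminate("phone_number"))  # False
```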

If the data subject discloses personal data to an unlimited number of persons without providing the operator with the specific consent required by the new law, then not only the original operator, but also all subsequent persons or operators that processed or further disseminated the PDD, bear the burden of proof to “provide evidence of the legality of subsequent dissemination or other processing”, under Article 10.1(2). This seems to imply that they must prove consent was obtained for dissemination (a probatio diabolica in this case). According to the Explanatory Note to the amendments, the intention was indeed to shift the burden of proving the legality of processing PDD from data subjects to data operators, since the Note makes specific reference to the fact that, before the amendments, the burden of proof rested with data subjects.

If the separate consent for dissemination of personal data is not obtained by the operator, but other conditions for lawfulness of processing are met, the personal data can be processed by the operator, but without the right to distribute or disseminate them – Article 10.1(4).

A Consent Management Platform for PDD, managed by the Roskomnadzor

The express consent to process PDD can be given directly to the operator or through a special “information system” (which appears to be a consent management platform) of the Roskomnadzor, according to Article 10.1(6). The provisions related to setting up this consent platform for PDD will enter into force on July 1st, 2021. The Roskomnadzor is expected to provide technical details on the functioning of this consent management platform, as well as guidelines on how it is to be used, in the coming months.

Absolute right to opt-out of dissemination of PDD

Notably, the dissemination of PDD can be halted at any time, on request of the individual, regardless of whether the dissemination is lawful or not, according to Article 10.1(12). This type of request is akin to a withdrawal of consent. The provision includes some requirements for the content of such a request: for instance, it must include the individual’s contact information and list the personal data whose dissemination should be terminated. Consent to the processing of the personal data concerned is terminated once the operator receives the opt-out request – Article 10.1(13).

A request to opt-out of having personal data disseminated to the public when this is done unlawfully (without the data subject’s specific, affirmative consent) can also be made through a Court, as an alternative to submitting it directly to the data operator. In this case, the operator must terminate the transmission of or access to personal data within three business days from when such demand was received or within the timeframe set in the decision of the court which has come into effect – Article 10.1(14).

A new criminal offense: The prohibition on disclosure of personal data about protected persons

Sharing personal data or information about intelligence officers and their personal property is now a criminal offense under the new rules, which amended the Criminal Code. The law obliges any operators of personal data, including government departments and mobile operators, to ensure the confidentiality of personal information concerning protected persons, their relatives, and their property. Under the new law, “protected persons” include employees of the Investigative Committee, FSB, Federal Protective Service, National Guard, Ministry of Internal Affairs, and Ministry of Defense, as well as judges, prosecutors, investigators, law enforcement officers, and their relatives. Moreover, the list of protected persons can be further detailed by the head of the relevant state body in which the specified persons work.

Previously, the law allowed for the temporary prohibition of the dissemination of personal data of protected persons only in the event of imminent danger in connection with official duties and activities. The new amendments make it possible to take protective measures in the absence of a threat of encroachment on their life, health and property.

What to watch next: New amendments to the general Personal Data Law are on their way in 2021

There are several developments to follow in this fast changing environment. First, at the end of January, the Russian President gave the government until August 1 to create a set of rules for foreign tech companies operating in Russia, including a requirement to open branch offices in the country.

Second, a bill (No. 992331-7) proposing new amendments to the overall framework of the Personal Data Law (No. 152-FZ) was introduced in July 2020 and was the subject of a Resolution passed in the State Duma on February 16, which opened a period for amendments to be submitted until March 16. The bill is on the agenda for a potential vote in May. The proposed changes would expand the possibility of obtaining valid consent through unique identifiers that are currently not accepted by the law, such as unique online IDs; modify the purpose limitation rules; introduce a possible certification scheme for effective methods of erasing personal data; and give the Roskomnadzor new competences to establish requirements for the deidentification of personal data and specific methods for effective deidentification.

If you have any questions on Global Privacy and Data Protection developments, contact Gabriela Zanfir-Fortuna at [email protected]

Data-Driven Pricing: Key Technologies, Business Practices, and Policy Implications

In the U.S., state lawmakers are seeking to regulate various pricing strategies that fall under the umbrella of “data-driven pricing”: practices that use personal and/or non-personal data to continuously inform decisions about the prices and products offered to consumers. Using a variety of terms—including “surveillance,” “algorithmic,” and “personalized” pricing—legislators are targeting a range of practices that often look different from one another, and carry different benefits and risks. Generally speaking, these practices fall under one of four categories:

This resource distinguishes between these different pricing strategies in order to help lawmakers, businesses, and consumers better understand how these different practices work.

Tech to Support Older Adults and Caregivers: Five Privacy Questions for Age Tech

Introduction

As the U.S. population ages, technologies that can help support older adults are becoming increasingly important. These tools, often called “AgeTech”, exist at the intersection of health data, consumer technology, caregiving relationships, and, increasingly, artificial intelligence, and are drawing significant investment. Hundreds of well-funded start-ups have launched. Many are of major interest to governments, advocates for aging populations, and researchers who are concerned about the impact on the U.S. economy when a smaller workforce supports a large aging population.

AgeTech may include everything from fall detection wearables and remote vital sign monitors to AI-enabled chatbots and behavioral nudging systems. These technologies promise greater independence for older adults, reduced burden on caregivers, and more continuous, personalized care. But that promise brings significant risks, especially when these tools operate outside traditional health privacy laws like HIPAA and instead fall under a shifting mix of consumer privacy regimes and emerging AI-specific regulations.

A recent review by FPF of 50 AgeTech products reveals a market increasingly defined by data-driven insights, AI-enhanced functionality, and personalization at scale. Yet despite the sophistication of the technology, privacy protections remain patchy and difficult to navigate. Many tools were not designed with older adults or caregiving relationships in mind, and few provide clear information about how AI is used or how sensitive personal data feeds into machine learning systems.

Without frameworks for trustworthiness and subsequent trust from older adults and caregivers, the gap between innovation and accountability will continue to grow, placing both individuals and companies at risk. Further, low trust may result in barriers to adoption at a time when these technologies are urgently needed as the aging population grows and care shortages continue.

A Snapshot of the AgeTech Landscape

AgeTech is being deployed across both consumer and clinical settings, with tools designed to serve four dominant purposes:

Clinical applications are typically focused on enabling real-time oversight and remote data collection, while consumer-facing products are aimed at supporting safety, independence, and quality of life at home. Regardless of setting, these tools increasingly rely on combinations of sensors, mobile apps, GPS, microphones, and notably, AI used for everything from fall detection and cognitive assistance to mood analysis and smart home adaptation.

AI is becoming central to how AgeTech tools operate and how they’re marketed. But explainability remains a challenge and disclosures around AI use can be vague or missing altogether. Users may not be told when AI is interpreting their voice, gestures, or behavior, let alone whether their data is used to refine predictive models or personalize future content.

For tools that feel clinical but aren’t covered by HIPAA, this creates significant confusion and risk. A proliferation of consumer privacy laws, particularly emerging state-level privacy laws with health provisions, is starting to fill the gap, leading to complex and fragmented privacy policies. For all stakeholders seeking to improve and support aging through AI and other technologies, harmonious policy-based and technical privacy protections are essential.

AgeTech Data is Likely in Scope of Many States’ Privacy Laws

Compounding the issue is the reality that these tools often fall into regulatory gray zones. If a product isn’t offered by a HIPAA-covered entity or used in a reimbursed clinical service, it may not be protected under federal health privacy law at all. Instead, protections depend on the state where a user lives, or whether the product falls under one of a growing number of state-level privacy laws or consumer health privacy laws.

Laws like New York’s S929/NY HIPA, which remains in legislative limbo, reflect growing state interest in regulating sensitive and consumer health data of the kind likely to be collected by AgeTech devices and apps. These laws are a step toward closing a gap in privacy protections, but they are not consistent. Some focus narrowly on specific types of health data, on their own or in tandem with AI or other technologies: for example, mental health chatbots (Utah HB452), reproductive health data (Virginia SB754), or AI disclosures in clinical settings (California AB3030). Other bills and laws have broad definitions that include location, movement, and voice data, all common types of data in our survey of AgeTech. Regulatory obligations may vary not just by product type, but by geography, payment model (where insurance may cover a product or service), and user relationship.

Consent + Policy is Key to AgeTech Growth and Adoption

In many cases, it is not the older adult but a caregiver, whether a family member, home health aide, or neighbor, who initiates AgeTech use and agrees to data practices. These caregiving relationships are diverse, fluid, and often informal. Yet most technologies assume a static one-to-one dynamic and offer few options for nuanced role-based access or changing consent over time.

For this reason, AgeTech is a good example of why consent should not be the sole pillar of data privacy. While important, relying on individual permissions can obscure the need for deeper infrastructure and policy solutions that relieve consent burdens while ensuring privacy. What is needed are devices and services that align privacy protections with contextual uses and create pathways for evidence-based, science-backed innovation that benefits older adults and their care communities.

Five Key Questions for AgeTech Privacy

To navigate this complexity and build toward better, more trustworthy systems, privacy professionals and policymakers can start by asking the following key questions:

  1. Is the technology designed to reflect caregiving realities?

Caregiving relationships are rarely linear. Tools must accommodate shared access, changing roles, and the reality that caregivers may support multiple people, or that multiple people may support the same individual. Regulatory standards should reflect this complexity, and product designs should allow for flexible access controls that align with real-world caregiving.

  2. Does the regulatory classification reflect the sensitivity of the data, not just who offers the tool?

Whether a fall alert app is delivered through a clinical care plan or bought directly by a consumer, it often collects the same data and has the same impact on a person’s autonomy. Laws should apply based on function and risk. Laws should also consider the context and use of data in addition to sensitivity. Emerging state laws are beginning to take this approach, but more consistent federal leadership is needed.

  3. Are data practices accessible, not just technically disclosed?

Especially in aging populations, accessibility is not just about font size: it is about cognitive load, clarity of language, and decision-making support. Tools should offer layered notices, explain settings in plain language, and support revisiting choices as health or relationships change. Future legislation could require transparency standards tailored to vulnerable populations and caregiving scenarios.

  4. Does the technology reinforce autonomy and dignity?

The test for responsible AgeTech is not just whether it works, but whether it respects. Does the tool allow older adults to make choices about their data, even when care is shared or delegated? Can those preferences evolve over time? Does it reinforce the user’s role as the central decision-maker, or subtly replace their agency with automation?

  5. If a product uses or integrates AI, is it clearly indicated if and how data is used for AI?

AI is powering an increasing share of AgeTech’s functionality—but many tools don’t disclose whether data is used to train algorithms, personalize recommendations, or drive automated decisions. Privacy professionals should ask: Is AI use clearly labeled and explained to users? Are there options to opt out of certain AI-driven features? Is sensitive data (e.g., voice, movement, mood) being reused for model improvement or inference? In a rapidly advancing field, transparency is essential for building trustworthy AI.

A Legislative and Technological Path Forward

Privacy professionals are well-positioned to guide both product development and policy advocacy. As AgeTech becomes more central to how we deliver and experience care, the goal should not be to retrofit consumer tools into healthcare settings without safeguards. Instead, we need to modernize privacy frameworks to reflect the reality that sensitive, life-impacting technologies now exist outside the clinic.

This will require:

The future of aging with dignity will be shaped by whether we can build privacy into the systems that support it. That means moving beyond consent and toward real protections, at the policy level, in the technology stack, and in everyday relationships that make care possible.

Nature of Data in Pre-Trained Large Language Models

The following is a guest post to the FPF blog by Yeong Zee Kin, the Chief Executive of the Singapore Academy of Law and FPF Senior Fellow. The guest blog reflects the opinion of the author only. Guest blog posts do not necessarily reflect the views of FPF.

The phenomenon of memorisation has fomented significant debate over whether Large Language Models (LLM) store copies of the data that they are trained on.1 In copyright circles, this has led to lawsuits such as the one by the New York Times against OpenAI, which alleges that ChatGPT will reproduce NYT articles nearly verbatim.2 In the privacy space, meanwhile, much ink has been spilt over the question of whether LLMs store personal data.

This blog post commences with an overview of what happens to data that is processed during LLM training3: first, how data is tokenised, and second, how the model learns and embeds contextual information within the neural network. Next, it discusses how LLMs store data and contextual information differently from classical information storage and retrieval systems, and examines the legal implications that arise from this. Thereafter, it attempts to demystify the phenomenon of memorisation, to gain a better understanding of why partial regurgitation occurs. This blog post concludes with some suggestions on how LLMs can be used in AI systems for fluency, while highlighting the importance of providing grounding and the safeguards that can be considered when personal data is processed.

While this is not a technical paper, it aims to be sufficiently technical so as to provide an accurate description of the relevant internal components of LLMs and an explanation of how model training changes them. By demystifying how data is stored and processed by LLMs, this blog post aims to provide guidance on where technical measures can be most effectively applied in order to address personal data protection risks. 

1. What are the components of a Large Language Model?

LLMs are causal language models that are optimised for predicting the next word based on previous words.4 An LLM comprises a parameter file, a runtime script and configuration files.5 The LLM’s algorithm resides in the script, which is a relatively small component of the LLM.6 Configuration and parameter files are essentially text files (i.e. data).7 Parameters are the learned weights and biases,8 expressed as numerical values, that are crucial for the model’s prediction: they represent the LLM’s pre-trained state.9 In combination, the parameter file, runtime script and configuration files form a neural network. 

There are two essential stages to model training. The first stage is tokenisation. This is when training data is broken down into smaller units (i.e. segmented) and converted into tokens. For now, think of each token as representing a word (we will discuss subword tokenisation later). Each token is assigned a unique ID. The mapping of each token to its unique ID is stored in a lookup table, which is referred to as the LLM’s vocabulary. The vocabulary is one of the LLM’s configuration files. The vocabulary plays an important role during inference: it is used to encode input text for processing and decode output sequences back into human-readable text (i.e. the generated response).

Figure 1. Sample vocabulary list from GPT-Legal; each token is associated with an ID (the vocabulary size of GPT-Legal is 128,256 tokens).
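To make the role of the vocabulary concrete, the following minimal Python sketch implements a toy lookup table. The tokens and IDs are invented for illustration and bear no relation to GPT-Legal’s actual vocabulary, which is orders of magnitude larger.

```python
# A toy vocabulary: a lookup table mapping tokens to unique IDs.
# Tokens and IDs below are invented for illustration only.
vocab = {"re": 17, "mind": 42, "turn": 43, "ing": 7}
inverse_vocab = {token_id: token for token, token_id in vocab.items()}

def encode(tokens):
    """Replace each (already segmented) token with its unique ID."""
    return [vocab[token] for token in tokens]

def decode(token_ids):
    """Map IDs back to tokens and join them into human-readable text."""
    return "".join(inverse_vocab[token_id] for token_id in token_ids)

ids = encode(["re", "mind", "ing"])
print(ids)          # [17, 42, 7]
print(decode(ids))  # "reminding"
```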

The next stage is embedding. This is a mathematical process that distills contextual information about each token (i.e. word) from the training data and encodes it into a numerical representation known as a vector. A vector is created for each token: this is known as the token vector. During LLM training, the mathematical representations of tokens (their vectors) are refined as the LLM learns from the training data. When LLM training is completed, token vectors are stored in the trained model. The mapping of the unique ID and token vector is stored in the parameter file as an embedding matrix. Token vectors are used by LLMs during inference to create the initial input vector that is fed through the neural network.

Figure 2. Sample embedding matrix from GPT-Legal: each row is one token vector, each value is one dimension (GPT-Legal has 128,256 token vectors, each with 4,096 dimensions).

LLMs are neural networks that may be visualised as layers of nodes with connections between them.10 Adjustments to embeddings also take place in the neural network during LLM training. Model training adjusts the weights and biases of the connections between these nodes, which changes how input vectors are transformed as they pass through the layers of the neural network during inference. The transformation produces an output vector that the LLM uses to compute a probability score for each potential next token, increasing or decreasing the likelihood that one token will follow another. The LLM uses these probability scores to select the next token through various sampling methods.11 This is how LLMs predict the next token when generating responses.
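The path from token ID to probability scores can be illustrated with a deliberately simplified sketch. The toy sizes, random values and single linear transformation below are assumptions made purely for illustration; a real LLM passes the input vector through many transformer layers.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim = 5, 8                                   # toy sizes, for illustration only
embedding_matrix = rng.normal(size=(vocab_size, dim))    # one token vector per row
output_weights = rng.normal(size=(dim, vocab_size))      # stands in for the network's layers

def next_token_probabilities(token_id):
    """Look up the token vector, transform it, and convert the result into probabilities."""
    input_vector = embedding_matrix[token_id]            # embedding lookup
    logits = input_vector @ output_weights               # one score per token in the vocabulary
    exp = np.exp(logits - logits.max())                  # softmax (numerically stable)
    return exp / exp.sum()

probs = next_token_probabilities(token_id=2)
next_id = rng.choice(vocab_size, p=probs)                # sampling, rather than always taking the top score
print(np.round(probs, 3), next_id)
```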

In the following sections, we dive deeper into each of these stages to better understand how data is processed and stored in the LLM.

Stage 1: Tokenisation of training data 

During the tokenisation stage, text is converted into tokens. This is done algorithmically by applying the chosen tokenisation technique. There are different methods of tokenisation, each with its benefits and limitations. Depending on the tokenisation method used, each token may represent a word or a subword (i.e. segments of the word). 

The method that is commonly used in LLMs is subword tokenisation.12 It provides benefits over word-level tokenisation, such as a smaller vocabulary, which can lead to more efficient training.13 Subword tokenisation analyses the training corpus to identify subword units based on the frequency with which a set of characters occurs. For example, “pseudonymisation” may be broken up into “pseudonym” and “isation”; while, “reacting” may be broken up into “re”, “act” and “ing”. Each subword forms its own token.

Taking this approach results in a smaller vocabulary since common prefixes (e.g. “re”) and suffixes (e.g. “isation” and “ing”) have their own tokens that can be re-used in combination with other stem words (e.g. combining with “mind” to form “remind” and “minding”). This improves efficiency during model training and inference. Subword tokens may also contain white space or punctuation marks. This enables the LLM to learn patterns, such as which subwords are usually prefixes, which are usually suffixes, and how frequently certain words are used at the start or end of a sentence. 

Subword tokenisation also enables the LLM to handle out-of-vocabulary (OOV) words. This happens when the LLM is provided with a word during inference that it did not encounter during training. By segmenting the new word into subwords, there is a higher chance that the subwords of the OOV word are found in its vocabulary. Each subword token is assigned a unique ID. The mapping of a token with its unique ID is stored in a lookup table in a configuration file, known as the vocabulary, which is a crucial component of the LLM. It should be noted that this is the only place within the LLM where human-readable text appears. The LLM uses the unique ID of the token in all its processing.

The training data is encoded by replacing subwords with their unique ID before processing.14 This process of converting the original text into a sequence of IDs corresponding to tokens is referred to as tokenisation. During inference, input text is also tokenised for processing. It is only at the decoding stage that human-readable words are formed when the output sequence is decoded by replacing token IDs with the matching subwords in order to generate a human-readable response.
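As an illustration of subword tokenisation, the sketch below uses the open-source tiktoken library with one of its published BPE encodings. The choice of tokeniser is an assumption made purely for demonstration; the article does not state which tokeniser any particular model uses.

```python
# pip install tiktoken
import tiktoken

# "cl100k_base" is an openly published BPE encoding, used here purely for demonstration.
enc = tiktoken.get_encoding("cl100k_base")

for word in ["pseudonymisation", "reacting", "restep"]:
    ids = enc.encode(word)
    pieces = [enc.decode([token_id]) for token_id in ids]
    # Each word is segmented into subword tokens; even a made-up word such as "restep"
    # can be represented as a combination of subwords already in the vocabulary.
    print(word, ids, pieces)
```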

Stage 2: Embedding contextual information

Complex contextual information can be reflected as patterns in high-dimensional vectors. The greater the complexity, the higher the number of features that are needed. These are reflected as parameters of the high dimension vectors. Contrariwise, low dimension vectors contain fewer features and have lower representational capacity. 

The embedding stage of LLM training captures the complexities of semantics and syntax as high dimension vectors. The semantic meaning of words, phrases and sentences and the syntactic rules of grammar and sentence structure are converted into numbers. These are reflected as values in a string of parameters that form part of the vector. In this way, the semantic meaning of words and relevant syntactic rules are embedded in the vector: i.e. embeddings. 

During LLM training, a token vector is created for each token. The token vector is adjusted to reflect the contextual information about the token as the LLM learns from the training corpus. With each iteration of LLM training, the LLM learns about the relationships of the token, e.g. where it appears and how it relates to the tokens before and after. In order to embed all this contextual information, the token vector has a large number of parameters, i.e. it is a high dimension vector. At the end of LLM training, the token vector is fixed and stored in the pre-trained model. Specifically, the mapping of unique ID and token vector is stored as an embedding matrix in the parameter file. 

Model training also embeds contextual information in the layers of the neural network by adjusting the connections between nodes. As the LLM learns from the training corpus during model training, the weights of connections between nodes are modified. These adjustments encode patterns from the training corpus that reflect the semantic meaning of words and the syntactic rules governing their usage.15 Training may also increase or decrease the biases of nodes. Adjustments to model weights and bias affect how input vectors are transformed as they pass through the layers of the neural network. These are reflected in the model’s parameters. Thus, contextual information is also embedded in the layers of the neural network during LLM training. Contextual embeddings form the deeper layers of the neural network.

Contextual embeddings increase or decrease the likelihood that one token will follow another when the LLM is generating a response. During inference, the LLM converts the input text into tokens and looks up the corresponding token vector from its embedding matrix. The model also generates contextual representations that capture how the token relates to other tokens in the sequence. Next, the LLM creates an input vector by combining the static token vector and the contextual vector. As input vectors pass through the neural network, they are transformed by the contextual embeddings in its deeper layers. Output vectors are used by the LLM to compute probability scores for the tokens, which reflect the likelihood that one subword (i.e. token) will follow another. LLMs generate responses using the computed probability scores. For instance, based on these probabilities, it is more likely that the subword that follows “re” is going to be “mind” or “turn” (since “remind” and “return” are common words), less likely to be “purpose” (unless the training dataset contains significant technical documents where “repurpose” is used); and extremely unlikely to be “step” (since “restep” is not a recognised word).

Thus, LLMs capture the probabilistic relationships between tokens based on patterns in the training data and as influenced by training hyperparameters. LLMs do not store the entire phrase or textual string that was processed during the training phase in the same way that this would be stored in a spreadsheet, database or document repository. While LLMs do not store specific phrases or strings, they are able to generalise and create new combinations based on the patterns they have learnt from the training corpus.

2. Do LLMs store personal data?

Personal data is information about an individual who can be identified or is identifiable from the information on its own (i.e. direct) or in combination with other accessible information (i.e. indirect).16 From this definition, several pertinent characteristics of personal data may be identified. First, personal data is information in the sense that it is a collection of several datapoints. Second, that collection must be associated with an individual. Third, that individual must be identifiable from the collection of datapoints alone or in combination with other accessible information. This section examines whether data that is stored in LLMs retains these qualities.

An LLM does not store personal data in the way that a spreadsheet, database or document repository stores personal data. Billing and shipping information about a customer may be stored as a row in a spreadsheet; the employment details, leave records, and performance records of an employee may be stored as records in the tables of a relational database; and the detailed curriculum vitae of prospective, current and past employees may be contained in separate documents stored in a document repository. In these information storage and retrieval systems, personal data is stored intact and its association with the individual is preserved: the record may also be retrieved in its entirety or partially. In other words, each collection of datapoints about an individual is stored as a separate record; and if the same datapoint is common to multiple records, it appears in each of those records.17

Additionally, information storage and retrieval systems are designed to allow structured queries to select and retrieve specific records, either partially or in their entirety. The integrity of storage and retrieval underpins data protection obligations such as accuracy and data security (to prevent unauthorised alteration or deletion), and data subject rights such as correction and erasure.

For the purpose of this discussion, imagine that the training dataset comprises billing and shipping records that contain names, addresses and contact information such as email addresses and telephone numbers. During training, subword tokens are created from names in the training corpus. These may be used in combination to form names and may also be used to form email addresses (since many people use a variation of their name for their email address) and possibly even street names (since streets are often named after famous individuals). The LLM is able to generate billing and shipping information that conforms to the expected patterns, but the information will likely be incorrect or fictitious. This explains the phenomenon of hallucinations.

During LLM training, personal data is segmented into subwords during tokenisation. This adaptation or alteration of personal data amounts to processing, which is why a legal basis must be identified for model training. The focus of this discussion is the nature of the tokens and embeddings that are stored within the LLM after model training: are they still in the nature of personal data? The first observation that may be made is that many words that make up names (or other personal information) may be segmented into subwords. For example, “Edward” may not be stored in the vocabulary as is but segmented into the subwords “ed” and “ward”. Both these subwords can be used during decoding to form other words, such as “edit” and “forward”. This example shows how a word that started as part of a name (i.e. personal data), after segmentation, produces subwords that can be reused to form other types of words (some of which may be personal data, some of which may not be personal data). 

Next, while the vocabulary may contain words that correspond to names or other types of identifiers, the way they are stored in the lookup table as discrete tokens removes the quality of identification from the word. A lookup table is essentially that: a table. It may be sorted in alphanumeric or chronological order (e.g. recent entries are appended to the end of the table). The vocabulary stores datapoints but not the association between datapoints that enables them to form a collection which can relate to an identifiable individual. By way of illustration, having the word “Coleman” in the vocabulary as a token is neither here nor there, since it could equally be the name of Hong Kong’s highest-ranked male tennis player (Coleman Wong) or the street on which the Singapore Academy of Law is located (Coleman Street). The vocabulary does not store any association of this word to either Coleman Wong (as part of his name) or to the Chief Executive of the Singapore Academy of Law (as part of his office address).

Furthermore, subword tokenisation enables a token to be used in multiple combinations during inference. Keeping with this illustration, the token “Coleman” may be used in combination with either “Wong” or “Street” when the LLM is generating a response. The LLM does not store “Coleman Wong” as a name or “Coleman Street” as a street name. The association of datapoints to form a collection is not stored. What the LLM stores are learned patterns about how words and phrases typically appear together, based on what it observed in the training data. Hence, if there are many persons named “Coleman” in the training dataset but with different surnames, and no one else whose address is “Coleman Street”, then the LLM is likely to predict a different word after “Coleman” during inference. 

Thus, LLMs do not store personal data in the same manner as traditional information storage and retrieval systems; more importantly, they are not designed to enable query and retrieval of personal data. To be clear, personal data in the training corpus is processed during tokenisation. Hence, a legal basis must be identified for model training. However, model training does not learn the associations of datapoints inter se nor the collection of datapoints with an identifiable individual, such that the data that is ultimately stored in the LLM loses the quality of personal data.18 

3. What about memorisation?

A discussion of how LLMs store and reproduce data is incomplete without a discussion of the phenomenon of memorisation. This is a characteristic of LLMs that reflects the patterns of words that are found in sufficiently large quantities in the training corpus. When certain combinations of words or phrases appear consistently and frequently in the training corpus, the probability of predicting those combinations increases.

Memorisation in LLMs is closely related to two key machine learning concepts: bias and overfitting. Bias occurs when training data overrepresents certain patterns, causing models to develop a tendency toward reproducing those specific sequences. Overfitting occurs when a model learns training examples too precisely, including noise and specific details, rather than learning generalisable patterns. Both phenomena exacerbate memorisation of training data, particularly personal information that appears frequently in the dataset. For example, Lee Kuan Yew was Singapore’s first prime minister post-Independence and a figure of significant global influence; he lived at 38 Oxley Road. LLMs trained on a corpus of data from the Internet would have learnt this. Hence, ChatGPT is able to produce a response (without searching the Web) about who he is and where he lived. It is able to reproduce (as opposed to retrieve) personal data about him because that data appeared in the training corpus in significant volume. Because these sequences of words appeared frequently in the training corpus, when the LLM is given the sequence of words “Lee Kuan”, the probability of predicting “Yew” is significantly higher than that of any other word; and in the context of the name and address of Singapore’s first prime minister, the probability of predicting “Lee Kuan Yew” and “38 Oxley Road” is significantly higher than that of other combinations.

This explains the phenomenon of memorisation. Memorisation occurs when the LLM learns frequent patterns and reproduces closely related datapoints. It should be highlighted that this reproduction is probabilistic. This is not the same as query and retrieval of data stored as records in deterministic information systems.
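The frequency effect described above can be illustrated with a deliberately simplified sketch. Counting next-word frequencies is not how an LLM is trained, but under that simplifying assumption it shows why word sequences that occur often in a corpus end up with high probability scores; the miniature corpus is invented for illustration.

```python
from collections import Counter, defaultdict

# Deliberately simplified: counting next-word frequencies is NOT how an LLM is trained,
# but it illustrates why frequent sequences end up with high probability scores.
# The miniature corpus below is invented for illustration.
corpus = (
    "lee kuan yew was the first prime minister . "
    "lee kuan yew lived at 38 oxley road . "
    "lee ann wrote a report . "
).split()

counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    counts[current_word][next_word] += 1

def next_word_probabilities(word):
    """Relative frequency of each word observed after the given word."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(next_word_probabilities("kuan"))  # "yew" dominates because the sequence is frequent
print(next_word_probabilities("lee"))   # "kuan" is more likely than "ann"
```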

The first observation to be made is that whilst this is acceptable for famous figures, the same cannot be said for private individuals. Knowing that this phenomenon reflects the training corpus, the obvious thing to avoid is the use of personal data for training of LLMs. This exhortation applies equally to developers of pre-trained LLMs and deployers who may fine-tune LLMs or engage in other forms of post-training, such as reinforcement learning. There are ample good practices for this. Techniques may be applied on the training corpus before model training to remove, reduce or hide personal data: e.g. pseudonymisation (to de-identify individuals in the training corpus), data minimisation (to exclude unnecessary personal data) and differential privacy (adding random noise to obfuscate personal data). When inclusion of personal data in the training corpus is unavoidable, there are mitigatory techniques that can be applied to the trained model.

One such example is machine unlearning, a technique currently under active research and development, that has the potential of removing the influence of specific data points from the trained model. This technique may be applied to reduce the risk of reproducing personal data.

Another observation that may be made is that the reproduction of personal data is not verbatim but paraphrased, hence it is also referred to as partial regurgitation. This underscores the fact that the LLM does not store the associations between datapoints necessary to make them a collection of information about an individual. Even if personal data is reproduced, it is because of the high probability scores for that combination of words, and not the output of a query and retrieval function. Paraphrasing may introduce distortions or inaccuracies when reproducing personal data, such as variations in job titles or appointments. Reproduction is also inconsistent and oftentimes incomplete.

This is unsurprising, since the predictions are probabilistic after all.

Finally, it bears reiterating that personal data is not stored as is but segmented into subwords, and that reproduction of personal data is probabilistic, with no absolute guarantee that a collection of datapoints about an individual will always be reproduced completely or accurately. Thus, reproduction is not the same as retrieval. Parenthetically, it may also be reasoned that if the tokens and embeddings do not possess the quality of personal data, their combination during inference is processing of data, just not processing of personal data. Be that as it may, the risk of reproducing personal data – however incomplete and inaccurate – can and must still be addressed. Technical measures such as output filters can be implemented as part of the AI system. These are directed at the responses generated by the model and not the model itself.

4. How should we use LLMs to process personal data?

LLMs are not designed or intended to store and retrieve personal data in the same way that traditional information storage and retrieval systems are; but they can be used to process personal data. In AI systems, LLMs provide fluency during the generation of responses. LLMs can incorporate personal data in their responses when personal data is provided, e.g., personal data provided as part of user prompts, or when user prompts cause the LLM to reproduce personal data as part of the generated response.

When LLMs are provided with user prompts that include reference documents that provide grounding for the generated response, the documents may also contain personal data. For example, a prompt to generate a curriculum vitae (CV) in a certain format may contain a copy of an outdated resume, a link to a more recent online bio and a template the LLM is to follow when generating the CV. The LLM can be constrained by well-written prompts to generate an updated CV using the personal information provided and formatted in accordance with the template. In this example, the personal data that the LLM uses will likely be from the sources that have been provided by the user and not from the LLM’s vocabulary. 

Further, the LLM will paraphrase the information in the CV that it generates. The randomness of the predicted text is controlled by adjusting the temperature of the LLM. A higher temperature setting will increase the chance that a lower probability token will be selected as the prediction, thereby increasing the creativity (or randomness) of the generated response. Even at its lowest temperature setting, the LLM may introduce mistakes by paraphrasing job titles and appointments or combining information from different work experiences. These errors occur because the LLM generates text based on learned probabilities rather than factual accuracy. For this reason, it is important to vet and correct generated responses, even if proper grounding has been provided.
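The effect of the temperature setting can be sketched as follows. The logit values are invented for illustration, and the softmax-with-temperature formulation shown is one common approach rather than the configuration of any particular model.

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Scale logits by temperature before softmax: higher temperature flattens the
    distribution, so lower-probability tokens are selected more often."""
    scaled = np.asarray(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())
    probs = exp / exp.sum()
    return rng.choice(len(probs), p=probs), probs

rng = np.random.default_rng(0)
logits = [4.0, 2.0, 0.5]   # invented scores for three candidate tokens

for t in (0.2, 1.0, 1.5):
    _, probs = sample_with_temperature(logits, t, rng)
    print(f"temperature={t}: {np.round(probs, 3)}")
# At low temperature the highest-scoring token dominates almost completely;
# at higher temperatures probability mass spreads to the other candidates.
```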

A more systematic way of providing grounding is through Retrieval Augmented Generation (RAG) whereby the LLM is deployed in an AI system that includes a trusted source, such as a knowledge management repository. When a query is provided, it is processed by the AI system’s embedding model which converts the entire query into an embedding vector that captures its semantic meaning. This embedding vector is used to conduct a semantic search. This works by identifying embeddings in the vector database (i.e. a database containing document embeddings precomputed from the trusted source) that have the closest proximity (e.g. via Euclidean or cosine distance).19 These distance metrics measure how similar the semantic meanings are. Embeddings that are close together (e.g. nearest neighbour) are said to be semantically similar.20 Semantically similar passages are retrieved from the repository and appended to the prompt that is sent to the LLM for the generation of a response. The AI system may generate multiple responses and select the most relevant one based on either semantic similarity to the query or in accordance with a re-ranking mechanism (e.g. heuristics to improve alignment with intended task).
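A minimal sketch of the retrieval step in a RAG pipeline, under several simplifying assumptions, is shown below. The passages and their vectors are invented stand-ins for embeddings precomputed by an embedding model, and a plain Python list stands in for the vector database.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vector, document_store, top_k=2):
    """Return the passages whose precomputed embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in document_store]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]

# Invented stand-ins: in a real deployment these vectors are produced by an embedding
# model and precomputed from the trusted knowledge repository.
document_store = [
    ("Leave policy for employees ...", np.array([0.9, 0.1, 0.0])),
    ("Billing dispute procedure ...",  np.array([0.1, 0.8, 0.2])),
    ("Data retention schedule ...",    np.array([0.0, 0.2, 0.9])),
]
query_vector = np.array([0.85, 0.15, 0.05])  # stand-in embedding of the user's query

context = retrieve(query_vector, document_store)
prompt = "Answer using only this context:\n" + "\n".join(context) + "\nQuestion: ..."
print(prompt)
```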

5. Concluding remarks

LLMs are not designed to store and retrieve information (including personal data). From the foregoing discussion, it may be said that LLMs do not store personal data in the same manner as information storage and retrieval systems. Data stored in the LLM’s vocabulary does not retain the relationships necessary for the complete or accurate retrieval of personal data. The contextual information embedded in the token vectors and neural network reflects patterns in the training corpus. Given how tokens are stored and re-used, the contextual embeddings are not intended to store the relationships between datapoints such that the collection of datapoints is able to describe an identifiable individual.

By acquiring a better understanding of how LLMs store and process data, we are able to design better trust and safety guardrails in the AI systems that they are deployed in. LLMs play an important role in providing fluency during inference, but they are not intended to perform query and retrieval functions. These functions are performed by other components of the AI system, such as the vector database or knowledge management repository in a RAG implementation. 

Knowing this, we can focus our attention on those areas that are most efficacious in preventing the unintended reproduction of personal data in generated responses. During model development, steps may be taken to address the risk of the reproduction of personal data. These are steps for developers who undertake post-training, such as fine tuning and reinforcement learning.

(a) First, technical measures may be applied to the training corpus to remove, minimise, or obfuscate personal data. This reduces the risk of the LLM memorising personal data. 

(b) Second, new techniques like model unlearning may be applied to reduce the influence of specific data points when the trained model generates a response.

When deploying LLMs in AI systems, steps may also be taken to protect personal data. The measures are very dependent on intended use cases of the AI system and the assessed risks. Crucially, these are measures that are within the ken of most deployers of LLMs (by contrast, a very small number of deployers will have the technical wherewithal to modify LLMs directly through post-training). 

(a) First, remove or reduce personal data from trusted sources if personal data is unnecessary for the intended use case. Good data privacy practices such as pseudonymisation and data minimisation should be observed.

(b) Second, if personal data is necessary, store and retrieve them from trusted sources. Use information storage and retrieval systems that are designed to preserve the confidentiality, integrity and accuracy of stored information. Personal data from trusted sources can thus be provided as grounding for prompts to the LLM. 

(c) Third, consider implementing data loss prevention measures in the AI system. For example, prompt filtering reduces the risk of including unauthorised personal data in user prompts. Likewise, output filtering reduces the risk of unintended reproduction of personal data in responses generated by the AI system.
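As an example of the output filtering mentioned in (c), the sketch below applies simple regular-expression checks to a generated response before it is returned to the user. The patterns are illustrative assumptions only; production deployments would typically rely on dedicated PII-detection tooling tuned to the relevant jurisdiction and data types.

```python
import re

# Illustrative patterns only; real deployments would use dedicated PII-detection tooling
# and patterns tuned to the relevant jurisdiction and data types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def filter_output(generated_text: str) -> str:
    """Redact personal data detected in a generated response before it is returned."""
    for label, pattern in PII_PATTERNS.items():
        generated_text = pattern.sub(f"[REDACTED {label.upper()}]", generated_text)
    return generated_text

print(filter_output("You can reach Coleman at coleman@example.com or +65 6123 4567."))
```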

Taking a holistic approach enables deployers to introduce appropriate levels of safeguards to reduce the risks of unintended reproduction of personal data.21

  1. Memorisation is often also known as partial regurgitation, which does not require verbatim reproduction; regurgitation, on the other hand, refers to the phenomenon of LLMs reproducing verbatim excerpts of text from their training data.
    ↩︎
  2. The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work (27 Dec 2023) New York Times; see also, Audrey Hope “NYT v. OpenAI: The Times’s About-Face” (10 April 2024) Harvard Law Review. 
    ↩︎
  3. This paper deals with the processing of text for training LLMs. It does not deal with other types of foundational models, such as multi-model models that can handle text as well as images and audio.
    ↩︎
  4. See, e.g., van Eijk R, Gray S and Smith M, ‘Technologist Roundtable: Key Issues in AI and Data Protection’ (2024) https://fpf.org/wp-content/uploads/2024/12/Post-Event-Summary-and-Takeaways-_-FPF-Roundtable-on-AI-and-Privacy-1-2.pdf (accessed 26 June 2025). ↩︎
  5. Christopher Samiullah, “The Technical User’s Introduction to Large Language Models (LLMs)” https://christophergs.com/blog/intro-to-large-language-models-llms (accessed 3 July 2025).
    ↩︎
  6. LLM model packages contain different components depending on their intended use. Inference models like ChatGPT are optimized for real-time conversation and typically share only the trained weights, tokenizer, and basic configuration files—while keeping proprietary training data, fine-tuning processes, system prompts, and foundation models private. In contrast, open source research models like LLaMA 2 often include comprehensive documentation about training datasets, evaluation metrics, reproducibility details, complete model weights, architecture specifications, and may release their foundation models for further development, though the raw training data itself is rarely distributed due to size and licensing constraints. See, e.g., https://huggingface.co/docs/hub/en/model-cards (accessed 26 June 2025).
    ↩︎
  7. Configuration files are usually stored as readable text files, while parameter files are stored in compressed binary formats to save space and improve processing speed.
    ↩︎
  8. Weights influence the connections between nodes, while biases influence the nodes themselves: “Neural Network Weights: A Comprehensive Guide” https://www.coursera.org/articles/neural-network-weights (accessed 4 July 2025). ↩︎
  9. An LLM that is ready for developers to use for inference is referred to as pre-trained. Developers may deploy the pre-trained LLM as is, or they may undertake further training using their private datasets. An example of such post-training is fine-tuning. ↩︎
  10.  LLMs are made up of the parameter file, runtime script and configuration files which together form a neural network: supra, fn 5 and the discussion in the accompanying main text. ↩︎
  11. While it could pick the token with the highest probability score, this would produce repetitive, deterministic outputs. Instead, modern LLMs typically use techniques like temperature scaling or top-p sampling to introduce controlled randomness, resulting in more diverse and natural responses. ↩︎
  12. Yekun Chai, et al, “Tokenization Falling Short: On Subword Robustness in Large Language Models” arXiv:2406.11687, section 2.1.
    ↩︎
  13. Word-level tokenisation results in a large vocabulary as every word stemming from a root word is treated as a separate word (e.g. consider, considering, consideration). It also has difficulties handling languages that do not use white spaces to establish word boundaries (e.g. Chinese, Korean, Japanese) or languages that use compound words (e.g. German).
    ↩︎
  14.  WordPiece and Byte Pair Encoding are two common techniques used for subword tokenisation.
    ↩︎
  15. To be clear, the LLM learns relationships and not explicit semantics or syntax. ↩︎
  16. Definition of personal data in Singapore’s Personal Data Protection Act 2012, s 2 and UK GDPR, s 4(1). ↩︎
  17. Depending on the information storage and retrieval system used, common data points could be stored as multiple copies (eg XML database) or in a code list (eg, spreadsheet or relational database).
    ↩︎
  18. Note from the editor: This statement should be read primarily within the framework of Singapore’s Personal Data Protection Act.
    ↩︎
  19. Masked language models (eg, BERT) are used for this, as these models are optimised to capture the semantic meaning of words and sentences better (but not textual generation). Masked language models enable semantic searches. ↩︎
  20. The choice of distance metric can affect the results of the search.
    ↩︎
  21. This paper benefited from reviewers who commented on earlier drafts. I wish to thank Pavandip Singh Wasan, Prof Lam Kwok Yan, Dr Ong Chen Hui and Rob van Eijk for their technical insight and very instructive comments; and Ms Chua Ying Hong, Jeffrey Lim and Dr Gabriela Zanfir-Fortuna for their very helpful suggestions. ↩︎

Malaysia Charts Its Digital Course: A Guide to the New Frameworks for Data Protection and AI Ethics

The digital landscape in Malaysia is undergoing a significant transformation. With major amendments to its Personal Data Protection Act (PDPA) taking effect in June 2025, the country is decisively updating its data protection standards to meet the demands of the global digital economy. This modernization effort is complemented by a forward-looking approach to artificial intelligence (AI), marked by the introduction of the National Guidelines on AI Governance & Ethics in September 2024. Together, these initiatives represent a robust attempt to build a trusted and innovative digital ecosystem.

This post will unpack these landmark initiatives. First, we will examine the key amendments to Malaysia’s PDPA, focusing on the new obligations for businesses and how they compare with the European Union (EU)’s General Data Protection Regulation (GDPR) and other regional laws. We will then delve into the National AI Ethics Guidelines, analyzing its core principles and its place within the Association of Southeast Asian Nations (ASEAN) AI governance landscape. By exploring both, it becomes visible that strong data protection serves as a critical foundation for trustworthy AI, a central theme in Malaysia’s digital strategy.
Key takeaways include:

A. Personal Data Protection (Amendment) Act 2024

1. Background

Malaysia was the first ASEAN Member State to enact comprehensive data protection legislation. Its PDPA, which was enacted in June 2010 and came into force in November 2013, set a precedent in the region.

However, for nearly a decade, the PDPA remained largely unchanged. Recognizing the need to keep up with rapid technological advancements and evolving global privacy standards (such as the 2016 enactment of the GDPR), then-Minister for Communications and Multimedia (now Digital Minister) Gobind Singh Deo revealed plans to review the PDPA in October 2018.

In February 2020, Malaysia’s Personal Data Protection Department (PDPD) took the first step by issuing a consultation paper proposing to amend the PDPA in 22 areas. Due to delays from the COVID-19 pandemic and subsequent changes in the Malaysian government, a draft bill was only finalized in August 2022, narrowing the focus to five key amendments:

  1. Requiring the appointment of a DPO.
  2. Introducing mandatory data breach notification requirements.
  3. Extending the Security Principle to data processors.
  4. Introducing a right to data portability.
  5. Revising the PDPA’s cross-border data transfer regime.

The amendment process regained momentum following the establishment of a new Digital Ministry in December 2023 as part of a broader cabinet reshuffle.

The resulting Personal Data Protection (Amendment) Act 2024 (Amendment Act) was passed by both houses of Malaysia’s Parliament in July 2024 and was enacted in October 2024. The amendments came into effect in stages:

During this transition period, the PDPD began consultations on seven new guidelines to provide greater clarity on new obligations under the updated PDPA. To date, the PDPD has released guidelines on (1) appointing DPOs; (2) data breach notifications; and (3) cross-border data transfers. It is also developing guidelines on: (1) data portability; (2) DPIAs; (3) Privacy-by-Design (DPbD); and (4) profiling and automated decision-making (ADM).

2. The amendments align the PDPA more closely with both international and regional data protection standards

The Amendment Act brings the PDPA closer to other influential global frameworks, such as the GDPR. This carries similarities with regulatory efforts by some other ASEAN Member States, including the enactment of GDPR-like laws in Thailand (2019), Indonesia (2022) and to a lesser extent, Vietnam (2023).

It also follows a broader trend of initiatives in the Asia-Pacific (APAC) region to bring longer-established data protection laws closer to international norms. These include extensive amendments to data protection laws in New Zealand (2020), Singapore (2021), and Australia (2024), as well as an ongoing review of Hong Kong’s law, which began in 2020.

One example of how the Amendment Act brings the PDPA closer to globally recognized norms is the replacement of the term “data user” with “data controller.” While this update is primarily cosmetic and does not change the entity’s substantive obligations, it aligns the PDPA’s terminology more closely with that of the GDPR and other similar laws. 

The following subsections discuss in detail the key amendments introduced by the Amendment Act, illustrating their implications and alignment with both regional and international standards.

2.1. Like the GDPR, the amendments define biometric data as sensitive

The Amendment Act classifies “biometric data” as “sensitive personal data.” The Amendment Act’s definition of “biometric data” is, in fact, potentially broader than its counterpart in the GDPR, as the former does not require that the data allow or confirm the unique identification of the person.

Organizations processing biometric data may need to revise their compliance practices to comply with the more stringent requirements for processing sensitive personal data (such as obtaining express consent prior to processing), unless one of a narrow list of exceptions applies. However, this is unlikely to pose major challenges to organizations whose compliance strategies take the GDPR as the starting point.

2.2. Like other ASEAN data protection laws, the amendments introduce a new requirement to appoint a DPO

The Amendment Act requires data controllers to appoint a DPO and to register the appointment within 21 days. If the DPO changes, controllers must also update the registration information within 14 days of the change.

Both controllers and processors must also publish the business contact information of their DPO on official websites, in privacy notices, and in security policies and guidelines. This should include a dedicated official business email account, separate from the DPO’s personal and regular business email.

To provide guidance on this new requirement, the PDPD published a Guideline and Circular on the appointment of DPOs (DPO Guideline) in May 2025 that clarifies and in some cases substantially augments the DPO requirements under the amended PDPA.
The DPO Guideline introduces a quantitative threshold for appointing a DPO. Controllers and processors are only required to appoint a DPO if they:

The DPO Guideline also outlines DPOs’ duties. These duties include serving as the primary point of contact for authorities and data subjects, providing compliance advice, conducting impact assessments, and managing data breach incidents. DPOs do not need to be resident in Malaysia but must be easily contactable and proficient in English and the national language (i.e., Bahasa Melayu). A single DPO may be appointed to serve multiple controllers or processors, provided that the DPO is given sufficient resources and is contactable by the organization, the Commissioner, and data subjects.

The DPO Guideline also prescribes skill requirements. A DPO must have knowledge of data protection law and technology, an understanding of the business’s data processing operations, and the ability to promote a data protection culture with integrity. The required skill level depends on the complexity, scale, sensitivity, and level of protection required for the data being processed.

In this regard, the amendment aligns Malaysia’s PDPA more closely with the data protection laws of the Philippines and Singapore than with the GDPR. Specifically, the Philippines and Singapore both require organizations to appoint at least one DPO. Conversely, Indonesia and Thailand follow the GDPR’s approach, requiring DPO appointments only for: (1) public authorities; (2) organizations conducting large-scale systematic monitoring; and (3) those processing sensitive data.

2.3. The amendments significantly increase penalties for PDPA breaches but do not introduce revenue-based fines

The Amendment Act allows the Personal Data Protection Commissioner (Commissioner) to impose:

Notably, the increase in the PDPA’s penalties was not among the proposals raised in the PDPD’s initial consultation paper released in 2020. Nevertheless, the enhanced penalties are consistent with (albeit still lower than) those in other ASEAN data protection laws that have been enacted or amended since the GDPR came into effect. Several of those laws follow the GDPR’s example of setting the maximum penalty at either a substantial fine (under the GDPR, EUR 20,000,000) or a percentage of the organization’s revenue (under the GDPR, up to 4% of total worldwide annual turnover for the preceding financial year). In ASEAN, data protection laws that have been similarly drafted include:

2.4. The amendments extend security obligations to data processors

Though the PDPA has always drawn a distinction between controllers (previously termed “data users”) and processors, prior to the 2024 amendments it did not subject data processors to the PDPA’s Security Principle. This Principle requires organizations to take practical steps to protect personal data from any loss, misuse, modification, unauthorized or accidental access or disclosure, alteration, or destruction.

As amended, the PDPA now requires data processors to comply with the Security Principle and provide sufficient guarantees to data controllers that data processors have implemented technical and organizational security measures to ensure compliance with the Principle. 

This amendment aligns the PDPA with the GDPR and the majority of other ASEAN data protection laws, which all impose security obligations on data processors.

Following the amendments, the PDPD began consulting on new guidelines outlining security controls to comply with the Security Principle. However, to date, these guidelines do not appear to have been finalized.

2.5. The amendments establish a significant new data portability right for data subjects in Malaysia

The Amendment Act introduces a new Section 43A into the PDPA, which provides data subjects with the right to request that a data controller transmit their personal data to another controller of their choice. The introduction of this data portability right makes Malaysia the fourth ASEAN jurisdiction to introduce such a right in their data protection law (after the Philippines, Singapore and Thailand).

However, this right is not absolute: it is “subject to technical feasibility and data format compatibility.” The PDPD has indicated that it regards this caveat as an exception that recognizes the practical challenges controllers may face in transferring data between different systems.

That said, this apparent exception risks undermining the right if interpreted too broadly. Notably, this flexibility in Malaysia’s data portability regime stands in contrast with the regime under the GDPR, which requires controllers to provide the data in a “structured, commonly used, and machine-readable format.”

To implement this new right, the PDPD has initiated consultations on proposals for subordinate regulations and a new set of guidelines. Key proposals under consideration focus on establishing technical standards, defining the scope of applicable data through “whitelists,” setting timelines for compliance, and determining rules for allowable fees.

The introduction of a data portability right into Malaysia’s PDPA carries potentially significant implications for individuals and businesses in Malaysia. For data subjects, this right enhances control over personal data in an increasingly digital environment. From a market perspective, it has the potential to foster competition and innovation by making it easier for individuals to switch service providers. While there are “success stories” of implementation of data portability rights in select sectors in jurisdictions like the United Kingdom and Australia, challenges remain in rolling out these rights across various sectors of the economy. In the APAC region, both Australia and South Korea have faced significant hurdles in this regard. 

As Malaysia embarks on implementing data portability, it may encounter challenges due to the broad scope of its data portability rights (which are at present not limited to specific sectors). This means that businesses in all industries may need to develop effective processes and technologies to manage portability requests securely – a requirement that could lead to increased costs, especially for smaller enterprises.

2.6. The amendments introduce notifiable data breach requirements to the PDPA

Though the PDPA has imposed positive security obligations on controllers since its enactment, it notably lacked requirements for controllers to notify authorities or affected individuals of data breaches. This legislative void has been addressed through the 2024 amendments and the release of the guidelines on data breach notifications (DBN Guideline) in May 2025.

The new Section 12B in the PDPA requires controllers who have reason to believe that a data breach has occurred to notify the PDPD “as soon as practicable” and, in any case, within 72 hours. Written reasons must be provided if the notification is not made within the prescribed timeframe.

Additionally, if the breach is likely to result in significant harm to data subjects, controllers must also notify affected data subjects “without unnecessary delay” and no later than 7 days after the initial notification to the PDPD. Failure to comply with the new notification requirements may result in penalties of up to RM 250,000 (approximately US$53,540) and/or up to two years’ imprisonment.
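
For organizations operationalizing these timelines, the two deadlines reduce to simple date arithmetic. The sketch below is purely illustrative (the function and field names are our own, and it assumes the 72-hour and 7-day periods run as plain calendar time from the trigger events described above):

```python
from datetime import datetime, timedelta
from typing import Optional


def breach_notification_deadlines(breach_suspected_at: datetime,
                                  pdpd_notified_at: Optional[datetime] = None) -> dict:
    """Illustrative deadlines based on the amended PDPA's Section 12B (not legal advice)."""
    deadlines = {
        # Notify the PDPD "as soon as practicable" and in any case within 72 hours
        # of having reason to believe a breach has occurred.
        "notify_pdpd_by": breach_suspected_at + timedelta(hours=72),
    }
    if pdpd_notified_at is not None:
        # If the breach is likely to cause significant harm, notify affected data
        # subjects "without unnecessary delay" and no later than 7 days after the
        # initial notification to the PDPD.
        deadlines["notify_data_subjects_by"] = pdpd_notified_at + timedelta(days=7)
    return deadlines


# Example: breach suspected on 1 August 2025 at 09:00; PDPD notified the next morning.
print(breach_notification_deadlines(datetime(2025, 8, 1, 9, 0), datetime(2025, 8, 2, 9, 0)))
```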

The DBN Guideline clarifies that a breach is likely to result in “significant harm” when there is a risk that the compromised personal data:

The DBN Guideline also states that controllers should maintain records of data breaches in both physical and electronic formats for at least two years; implement adequate data breach management and response plans; and conduct regular training for employees.

Controllers must also contractually obligate processors to promptly notify them if a data breach occurs and to provide all reasonable assistance with data breach obligations.

These requirements, which are not subject to exceptions, will significantly affect organizations processing personal data in Malaysia. Controllers in particular will need to establish effective processes for detecting, investigating, and reporting data breaches. 

Such requirements are already established in most other major ASEAN jurisdictions, including Indonesia, the Philippines, Singapore, Thailand, and Vietnam. While details vary, most jurisdictions require notifications within 72 hours of discovering a breach, with some mandating public disclosure for large-scale incidents.

The PDPA’s provisions on data breach requirements are largely similar to those in the GDPR. In fact, the PDPA’s breach notification provisions are arguably more expansive, as they do not provide an exception (as does the GDPR) for breaches unlikely to result in a risk to the rights and freedoms of natural persons.

2.7. The amendments replace the PDPA’s restrictive former “whitelisting” approach to data transfers with a more flexible cross-border transfer regime

Prior to the amendments, the PDPA contained a transfer mechanism permitting transfers of personal data to destinations that had been officially whitelisted by a Minister. However, this provision was never implemented, and no jurisdictions were ever whitelisted.

The amendments replaced this with a new provision allowing controllers to transfer personal data to jurisdictions with laws that: (1) are substantially similar to the PDPA; or (2) ensure an equivalent level of protection to the PDPA. This provision shifts responsibility to controllers to evaluate whether the destination jurisdiction meets the above requirements. 

In May 2025, the PDPD issued a guideline clarifying the requirements under this provision. Specifically, the controller must conduct a Transfer Impact Assessment (TIA), evaluating the destination jurisdiction’s personal data protection law against a series of prescribed factors. The TIA is valid for three years but must be reviewed if there are amendments to the destination’s personal data protection laws.

Notably, in adopting this new mechanism, Malaysia appears to have moved away from the GDPR’s centralized adequacy model, while maintaining other transfer mechanisms interoperable with the GDPR. The former “whitelist” mechanism more closely resembled the “adequacy” mechanism in Article 45 of the GDPR, which makes the EU Commission responsible for determining whether a jurisdiction or international organization provides an adequate level of protection and for issuing a so-called “adequacy decision.” Malaysia’s new cross-border data transfer provision is more adaptable, but, in the absence of strong enforcement by the PDPD, it may be open to abuse: the proposed criteria for the TIA are high-level and could easily be satisfied by any jurisdiction that has a data protection law “on the books.”

Notably, the Guideline also introduces new guidance on other existing transfer mechanisms under the PDPA, such as the conditions for valid consent and determining when transfers are “necessary.” Additionally, the Guideline allows the use of binding corporate rules (BCRs) for intra-group transfers, standard contractual clauses (SCCs) for transfers between unrelated parties, and certifications from recognized bodies as evidence of adequate safeguards in the receiving data controller or processor. 

3. Ongoing consultations show Malaysia is preparing for future technological challenges

In March 2025, the PDPD concluded consultations on its DPIA, DPbD, and ADM guidelines. The adoption of these guidelines, though requiring organizations to take on additional responsibilities, reflects Malaysia’s interest in embracing new standards and addressing emerging technological challenges. 

3.1 Malaysia is aligning with regional peers by proposing detailed DPIA requirements

While the amended PDPA does not explicitly mandate DPIAs, the responsibility to conduct them has been introduced through the new DPO Guideline. To clarify this obligation, the PDPD has also started consultations on a detailed DPIA framework. This move brings Malaysia closer to APAC jurisdictions like the Philippines, Singapore, and South Korea, which already provide detailed guidance on conducting DPIAs.

Under the proposals, a DPIA would be required whenever data processing is likely to result in a “high risk” to data subjects. The draft guidelines propose a two-tier approach to assessing this risk, considering both quantitative factors (like the number of data subjects) and qualitative ones (such as data sensitivity). Notably, if a DPIA reveals a high overall risk, organizations may be required to notify the Commissioner of the risk(s) identified and provide other information as required. If passed in their current form, these rules would give Malaysia some of the most stringent DPIA requirements in the APAC region, as no other major APAC jurisdiction imposes such a proactive notification requirement on all types of controllers.

3.2 Malaysia’s proposed DPbD requirement aligns its laws closer to international standards

To further align with international standards like the GDPR, the PDPD is consulting on draft guidelines on implementing a “Data Protection by Design” (DPbD) approach. While the amended PDPA does not explicitly mandate DPbD, this proposed guideline aims to clarify how organizations can proactively embed the PDPA’s existing Personal Data Protection Principles into their operations.

The proposed approach would require integrating data protection measures throughout the entire lifecycle of a processing activity, from initial design to final decommissioning. Adopting such a guideline would mark a significant shift in Malaysia’s data protection regime from reactive to proactive protection, helping organizations achieve more effective compliance and better protect the rights of data subjects. However, implementing and encouraging a DPbD approach requires more than issuing guidelines: these should be complemented by training and educational workshops for DPOs and organizations, as well as incentive schemes such as domestic trust-mark certification, to familiarize organizations with the concept and benefits of DPbD.

3.3 Proposed guidelines anticipate the impacts of AI and machine learning

Looking ahead to the challenges posed by AI, the PDPD recently concluded a consultation on regulating ADM and profiling. Although the PDPA does not specifically touch on ADM and profiling, the PDPD’s consultation demonstrates an intent to follow in the footsteps of several other major jurisdictions, including the EU, UK, South Korea, and China, that have already implemented requirements in this area.

The Public Consultation Paper highlighted (see, for instance, para 1.2) the growing risk of AI and machine learning being used to infer sensitive information from non-sensitive data for high-impact automated decisions, such as credit scoring. To address this, the PDPD is considering issuing a dedicated ADM and Profiling (ADMP) Guideline. The ADMP Guideline would regulate ADMP where “its use results in legal effects concerning the data subject or significantly affects the data subject,” and would provide data subjects with (subject to exceptions): (a) the right to refuse to be subject to a decision based solely on ADMP which produces legal effects concerning, or significantly affects, the data subject; (b) a right to information on the ADMP being undertaken; and (c) a right to request a human review of the ADMP.

Consultation on the ADMP Guideline concluded on 19 May 2025, and the Guideline is not expected to be finalized for several more months. Nonetheless, this is another instance of an APAC data protection regulator acting as a de facto (albeit partial) regulator of AI-augmented decision-making.

B. National Guidelines on AI Governance & Ethics

1. Background

In parallel with the updates to its data protection law, Malaysia has taken strides in AI governance. On 20 September 2024, the Ministry of Science, Technology, and Innovation (MOSTI) released its “National Guidelines on AI Governance & Ethics” (AI Ethics Guidelines, or Guidelines) – a comprehensive voluntary framework for the responsible development and use of AI technologies in Malaysia.

2. At its core, the Guidelines establish seven fundamental principles for AI

The Guidelines were designed for international alignment, explicitly benchmarking their seven core AI principles against a wide range of global standards. Section 4 details this comparison, referencing frameworks from the OECD, UNESCO, the EU, the US, the World Economic Forum, and Japan.

2.1. The Guidelines establish specific roles, responsibilities, and recommended actions for three key stakeholder groups in the AI ecosystem

The Guidelines assign responsibilities across the AI ecosystem. 

2.2. The Guidelines introduce consumer protection principles for AI that could be a precursor to regulatory requirements

While the AI Ethics Guidelines are voluntary and primarily aimed at encouraging stakeholders to reflect on key AI governance issues, certain provisions in the Guidelines may offer insight into how the Malaysian Government is considering potential future regulation of AI.

The Guidelines encourage businesses in Malaysia to prioritize transparency by clearly informing consumers about how AI uses their data and makes decisions. The Guidelines also encourage such businesses to provide consumers with rights concerning automated decisions, comparable to those in data protection laws such as the GDPR. These include the rights to receive information and an explanation about such decisions, to object and request human intervention, and to have one’s data deleted (i.e., a “right to be forgotten”).

Part A.2.3 outlines tentative suggestions for the development of future AI regulation (whether through existing laws or new rules), while acknowledging that regulation of AI is at an early stage. The suggestions include:

Notably, several of these suggestions (such as enhancing user consent and introducing disclosure and accuracy requirements) align with similar proposals in Singapore’s Model AI Governance Framework for Generative AI and ASEAN’s generative AI guidelines, both released in 2024. 

3. Malaysia is the latest in a series of APAC jurisdictions that have released voluntary AI ethics and governance frameworks

Other APAC jurisdictions that have released voluntary AI governance guidelines in recent years include Indonesia (December 2023), Singapore (in 2019, 2020, and 2024), Hong Kong (June 2024), and Australia (October 2024).

Regionally, ASEAN has also issued regional-level guidance for organizations and national governments. These are, specifically, a “Guide on AI Ethics and Governance” (ASEAN AI Guide) in February 2024, and an expanded Guide focusing on generative AI in January 2025.

Malaysia’s AI Ethics Guidelines align with regional trends toward voluntary, principle-based AI governance, yet differ in focus and approach when compared to its neighbors and the broader ASEAN framework. To understand Malaysia’s position within ASEAN, a brief comparison is provided between Malaysia’s Guidelines and: (1) Singapore’s Model AI Governance Framework (Second Edition); (2) Indonesia’s Circular on AI Ethics (Circular); and (3) ASEAN’s AI Guide.

Table 1. Comparison of voluntary AI ethics/governance frameworks in Southeast Asia

C. Looking ahead

Malaysia’s recent developments in data protection and AI governance represent a concerted effort to build a modern and trusted digital regulatory framework. The comprehensive amendments to the PDPA bring the nation’s data protection standards into closer alignment with global benchmarks like the GDPR, while the AI Ethics Guidelines establish a foundation for responsible AI innovation nationally. Viewed together, these are not separate initiatives but two pillars of a cohesive national strategy designed to foster a trusted digital ecosystem and position Malaysia as a competitive player in the region.

For businesses operating in Malaysia, these developments have significant and immediate implications. Organizations should aim to move beyond basic compliance and adopt a strategic approach to data governance. Key actions include:

In closing, two observations may be made. First, these developments – especially the amendments to Malaysia’s PDPA – come as Malaysia holds the ASEAN Chair in 2025 and seeks to position itself as a mature leader in digital innovation and governance in the region, potentially providing a boost just as it hopes to conclude negotiations on the ASEAN Digital Economy Framework Agreement under its watch this year.

Second, it should be recalled that, prior to the Amendment Act, regulatory activity on data protection in Malaysia had been at a low ebb. Additionally, the PDPD has thus far not been highly active in regional and international data protection and digital regulation fora. Nevertheless, with the reconstitution of the Ministry of Communications and Multimedia into the Digital Ministry, and the re-formulation of the PDPD into an independent Commissioner’s Office (as shared by Commissioner Nazri at FPF’s Second Japan Privacy Symposium in Tokyo last year), more engagement can be expected from Malaysia on data protection and AI regulation in the years to come.

Note: The information provided above should not be considered legal advice. For specific legal guidance, kindly consult a qualified lawyer practicing in Malaysia.

Understanding Japan’s AI Promotion Act: An “Innovation-First” Blueprint for AI Regulation

The global landscape of artificial intelligence (AI) is being reshaped not only by rapid technological advancement but also by a worldwide push to establish new regulatory regimes. In a landmark move, on May 28, 2025, Japan’s Parliament approved the “Act on the Promotion of Research and Development and the Utilization of AI-Related Technologies” (人工知能関連技術の研究開発及び活用の推進に関する法律案要綱) (AI Promotion Act, or Act), making Japan the second major economy in the Asia-Pacific (APAC) region to enact comprehensive AI legislation. Most provisions of the Act (except Chapters 3 and 4, and Articles 3 and 4 of its Supplementary Provisions) took effect on June 4, 2025, marking a significant transition from Japan’s soft-law, guideline-based approach to AI governance to a formal legislative framework.

This blog post provides an in-depth analysis of Japan’s AI Promotion Act, its strategic objectives, and its unique regulatory philosophy. It builds on our earlier analysis of the Act (during its draft stage), available exclusively for FPF Members in our FPF Members Portal. The post begins by exploring the Act’s core provisions in detail, before placing the Act in a global context through detailed comparisons with two other pioneering omnibus AI regulations: (1) the European Union (EU)’s AI Act, and (2) South Korea’s Framework Act on AI Development and Establishment of a Foundation for Trustworthiness (AI Framework Act). This comparative analysis reveals three distinct models for AI governance, creating a complex compliance matrix that companies operating in the APAC region will need to navigate going forward.

Part 1: Key Provisions and Structure of the AI Promotion Act

The AI Promotion Act establishes policy drivers to make Japan the world’s “most AI-friendly country” 

The Act’s primary purpose is to establish foundational principles for policies that promote the research, development, and utilization of AI in Japan to foster socio-economic growth.

The Act implements the Japanese government’s ambition, outlined in a 2024 whitepaper, to make Japan the world’s “most AI-friendly country.” The Act is specifically designed to create an environment that encourages investment and experimentation by deliberately avoiding the imposition of stringent rules or penalties that could stifle development.

This initiative is a direct response to low rates of AI adoption and investment in Japan. A summary of the AI Promotion Act from Japan’s Cabinet office highlights that from 2023 to 2024, private AI investment in Japan was a fraction of that seen in other major markets globally (such as the United States, China, and the United Kingdom), with Stanford University’s AI Index Report 2024 putting Japan in 12th place globally for this metric. The Act is, therefore, a strategic intervention intended to reverse these trends by signaling strong government support and creating a predictable, pro-innovation legal environment.

The AI Promotion Act is structured as a “fundamental law” (基本法), establishing high-level principles and national policy direction rather than detailed, prescriptive rules for private actors.

While introducing a basis for binding AI regulation, the Act also builds on Japan’s longstanding “soft law” approach to AI governance, relying on non-binding government guidelines (such as the 2022 Governance Guidelines for the Implementation of AI Principles and 2024 AI Business Operator Guidelines), multi-stakeholder cooperation, and the promotion of voluntary business initiatives over “hard law” regulation. The Act’s architecture therefore embodies the Japanese Government’s broader philosophy of “agile governance” in digital regulation, which posits that in rapidly evolving fields like AI, rigid, ex-ante regulations are likely to quickly become obsolete and may hinder innovation. 

The AI Promotion Act adopts a broad, functional definition of “AI-related technologies.”

The primary goal of the AI Promotion Act (Article 1) is to establish the foundational principles for policies that promote the research, development, and utilization of “AI-related technologies” in Japan. This term refers to technologies that replicate human intellectual capabilities like cognition, inference, and judgment through artificial means, as well as the systems that use them. This non-technical definition appears to be designed for flexibility and longevity. Notably, the law takes a unique approach to defining the scope of covered AI technologies and does not adopt the OECD definition of an AI system, which served as the inspiration for the definition in the EU AI Act.

The Act provides a legal basis for five fundamental principles to guide AI governance in Japan

Under Article 3 of the Act, these principles include:

  1. Alignment: AI development and use should align with existing national frameworks, including the Basic Act on Science, Technology and Innovation (科学技術・イノベーション基本法), and the Basic Act on Forming a Digital Society (デジタル社会形成基本法).
  2. Promotion: AI should be promoted as a foundational technology for Japan’s economic and social development, with consideration for national security.
  3. Comprehensive advancement: AI promotion should be systematic and interconnected across all stages, from basic research to practical application.
  4. Transparency: Transparency in AI development and use is necessary to prevent misuse and the infringement of citizens’ rights and interests.
  5. International leadership: Japan should actively participate in and lead the formulation of international AI norms and promote international cooperation.

The AI Promotion Act adopts a whole-of-society approach to promoting AI-related technologies

Broadly, the Act assigns high-level responsibilities to five groups of stakeholders:

To fulfill its responsibilities, the National Government is mandated to take several Basic Measures, including:

The Act adopts a cooperative approach to governance and enforcement

The Act’s approach to governance and enforcement diverges significantly from overseas legislative frameworks.

The centerpiece of the new governance structure established under the Act is the establishment of a centralized AI Strategy Headquarters within Japan’s Cabinet. Chaired by the Prime Minister and including all other Cabinet ministers as members, this body ensures a whole-of-government, coordinated approach to AI policy.

The AI Strategy Headquarters’ primary mandate is to formulate and drive the implementation of a comprehensive national Basic AI Plan, which will provide more substantive details on the government’s AI strategy.

The AI Promotion Act contains no explicit penalties, financial or otherwise, for non-compliance with its requirements or, more broadly, for misusing AI. Instead, its enforcement power rests on a unique cooperative and reputational model.

Part 2: A Tale of Three AI Laws – Comparative Analysis of Japan’s AI Promotion Act, the EU’s AI Act, and South Korea’s AI Framework Act

To fully appreciate Japan’s approach, it is useful to compare it with the other two prominent “hard law” AI frameworks globally: the EU AI Act and South Korea’s AI Framework Act.

The EU AI Act is a comprehensive legal framework for AI systems. Officially published on July 12, 2024, it entered into force on August 1, 2024, but it becomes applicable in stages, beginning in early 2025 and continuing until 2030. Its primary aim is to regulate AI systems placed on the EU market, balancing innovation with ethical considerations and safety. The Act takes a risk-based approach: a few uses of AI systems are prohibited because they are considered to pose unacceptable risks to health, safety, and fundamental rights; some AI systems are considered “high-risk” and bear most of the compliance obligations for their deployers and providers; while others are either low-risk, facing mainly transparency obligations, or fall outside the scope of the regulation altogether. The AI Act also has a separate set of rules applying only to General Purpose AI models, with enhanced obligations for those posing “systemic risk.” See here for a Primer on the EU AI Act.

South Korea’s “Framework Act on Artificial Intelligence Development and Establishment of a Foundation for Trustworthiness” (인공지능 발전과 신뢰 기반 조성 등에 관한 기본법), also known as the “AI Framework Act,” was passed on December 26, 2024, and is currently scheduled to take effect on January 22, 2026.

The stated purpose of the AI Framework Act is to protect citizens’ rights and dignity, improve quality of life, and strengthen national competitiveness. The Act aims to promote the AI industry and technology while simultaneously preventing associated risks, reflecting a balancing act between innovation and regulation. For a more detailed analysis of South Korea’s AI Framework Act, you may read FPF’s earlier blog post here.

Like the EU’s AI Act, South Korea’s AI Framework Act adopts a risk-based approach, introducing specific obligations for “high-impact” AI systems utilized in critical sectors such as healthcare, energy, and public services. However, a key difference between the two laws is that South Korea’s does not prohibit any AI practices or systems. It also includes specific provisions for generative AI. Notably, AI systems used solely for national defense or security are expressly excluded from its scope, and most AI systems not classified as “high-impact” are not subject to regulation under the AI Framework Act.

AI Business Operators, encompassing both developers and deployers, are subject to several specific obligations. These include establishing and operating a risk management plan, providing explanations for AI-generated results (within technical limits), implementing user protection measures, and ensuring human oversight for high-impact AI systems. For generative AI, providers are specifically required to notify users that they are interacting with an AI system.

The AI Framework Act establishes a comprehensive governance framework, including a National AI Committee chaired by the President of the country tasked with deliberating on policy, investment, infrastructure, and regulations. The AI Framework Act also establishes other governance institutions, such as the AI Policy Center and AI Safety Research Institute. The Ministry of Science and ICT (MSIT) holds the responsibility for establishing and implementing a Basic AI Plan every three years. The MSIT is also granted significant investigative and enforcement powers, with enforcement measures including corrective orders and fines. The AI Framework Act also includes extraterritorial provisions, extending its reach beyond South Korea.

Commonalities and divergences across jurisdictions

The regulatory philosophies across Japan, South Korea, and the EU present a spectrum of approaches.

Differences are also evident in scope, risk classification, and enforcement severity. Japan’s AI Promotion Act and South Korea’s AI Framework Act are both foundational laws that allocate responsibilities for AI governance within the government and establish a legal basis for future regulation of AI. However, Japan’s AI Promotion Act does not impose any direct obligations on private actors and does not include a “risk” or “high-impact” classification of AI technologies. By contrast, South Korea’s AI Framework Act imposes a range of obligations on “high-impact” and generative AI, without going so far as to prohibit AI practices. The latter also has specific carve-outs for national defense, similar to how the EU AI Act excludes AI systems for military and national security purposes from its scope.

The EU AI Act has the broadest and most detailed scope, categorizing all AI systems into four risk levels, with strict requirements for high-risk and outright prohibitions for unacceptable risk systems, in addition to specific obligations for General Purpose AI (GPAI) models.

In terms of enforcement powers, Japan’s AI Promotion Act notably lacks any penalties for noncompliance or for misuse of AI more broadly. South Korea’s AI Framework Act, by contrast, has enforcement powers, including fines and corrective orders, but its financial penalties are comparatively lower than those in the EU’s AI Act. For instance, the maximum fine under South Korea’s AI Framework Act is set at KRW 30 million (approximately USD 21,000), whereas, under the EU AI Act, fines can range from EUR 7.5 million to EUR 35 million (approximately USD 7.8 million to USD 36.5 million), or 1% to 7% of the company’s global turnover.
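
To make the gap concrete, here is a small worked example for a hypothetical company with EUR 2 billion in worldwide annual turnover. The figures are those cited above; the company and the “greater of the flat cap or the turnover percentage” reading of the EU AI Act’s top fine tier are illustrative assumptions, not legal analysis:

```python
# Illustrative arithmetic only; compares the maximum fines cited in the text above.

def eu_ai_act_top_tier_fine(worldwide_turnover_eur: float,
                            flat_cap_eur: float = 35_000_000,
                            turnover_share: float = 0.07) -> float:
    # Assumes the top tier is the greater of a flat cap or a share of worldwide turnover.
    return max(flat_cap_eur, turnover_share * worldwide_turnover_eur)


turnover_eur = 2_000_000_000  # hypothetical EUR 2 billion in worldwide annual turnover
max_eu_fine = eu_ai_act_top_tier_fine(turnover_eur)

print(f"EU AI Act top tier: EUR {max_eu_fine:,.0f}")  # EUR 140,000,000
print("South Korea AI Framework Act: KRW 30,000,000 (~USD 21,000), regardless of turnover")
```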

Despite these divergences, there are some commonalities. All three laws establish central governmental bodies (Japan’s AI Strategy Headquarters, South Korea’s National AI Committee, and the EU’s AI Office/NCAs) to coordinate AI policy and strategy. All three also emphasize international cooperation and participation in norm-setting. Notably, all three frameworks explicitly or implicitly reference the core tenets of transparency, fairness, accountability, safety, and human-centricity, which have been developed in international forums like the OECD and the G7 Hiroshima AI Process.

The divergence is not in the “what” – ensuring the responsible development and deployment of AI – but in the “how.” The EU chooses comprehensive, prescriptive regulation; Japan opts for softer regulation building on existing voluntary guidelines; and South Korea applies targeted regulation to specific high-risk areas. This indicates a global consensus on the desired ethical outcomes for AI, but a deep and consequential divergence on the most effective legal and administrative tools to achieve them.

Access here a detailed Comparative Table of the three AI laws in the EU, South Korea and Japan, comparing them on 11 criteria, from definitions and scope, to risk categorization, enforcement model and support for innovation.

The future of AI regulation: A new regional and global landscape

The distinctly “light-touch” approach to AI regulation in Japan suggests a minimal compliance burden for organizations in the immediate term. However, the AI Promotion Act is arguably the beginning, not the end, of the conversation, as the forthcoming Basic AI Plan has the potential to introduce a wide range of possible initiatives.

Regionally, Japan’s “innovation-first” strategy likely aims to draw investment by offering a less burdensome regulatory environment. The EU, conversely, is attempting to set a high standard for ethical and safe AI, aiming to foster sustainable and trustworthy innovation. South Korea’s middle-ground approach attempts to capture benefits from both strategies.

The availability of a full spectrum of regulatory models on a global scale aimed at the same technology could lead to regulatory arbitrage. It remains to be seen whether companies prioritize development in less regulated jurisdictions to minimize compliance costs, or, conversely, whether there will be a global demand for “EU-compliant” AI as a mark of trustworthiness. This dynamic implies that the future of AI development might be shaped not just by technological breakthroughs but by the attractiveness of regulatory environments as well.

Nevertheless, it is also worth noting that a jurisdiction’s regulatory model alone does not determine its ultimate success in attracting investments or deploying AI effectively. Many other factors, such as the availability of data, compute and talent, as well as the ease of doing business generally, will also be critical.

With two significant jurisdictions in the APAC region having now adopted innovation-oriented AI laws, the region appears to be setting a trend of innovation-first AI regulation, offering a contrasting model to the EU AI Act. At the same time, it is notable that both Japan and South Korea have comprehensive national data protection laws, which offer safeguards for people’s rights in all contexts where personal data is processed, including through AI systems.

Note: The summary of the AI Promotion Act above is based on an English machine translation, which may contain inaccuracies. Additionally, the information should not be considered legal advice. For specific legal guidance, kindly consult a qualified lawyer practicing in Japan.

The author acknowledges the valuable contributions of the APAC team’s interns, Darren Ang and James Jerin Akash, in assisting with the initial draft of this blog post.

The Connecticut Data Privacy Act Gets an Overhaul (Again)

Co-Authored by Gia Kim, FPF U.S. Policy Intern

On June 25, Governor Ned Lamont signed SB 1295, amending the Connecticut Data Privacy Act (CTDPA). True to its nickname, the “Land of Steady Habits,” Connecticut is developing the habit of amending the CTDPA. Connecticut has long been ahead of the curve, especially when it comes to privacy. In 1788, Connecticut became the fifth state to ratify the U.S. Constitution. In 2022, it similarly became the fifth state to enact a comprehensive consumer privacy law. In 2023, it returned to that law to add heightened privacy protections for minors and for consumer health data. In 2024 and 2025, the Attorney General issued enforcement reports that included recommendations for changes to the law (some of which were ultimately included in SB 1295). Now, a mere two years since the last major amendments, Connecticut has once again passed an overhaul of the CTDPA.

This fresh bundle of amendments makes myriad changes to the law, expanding its scope, adding a new consumer right, heightening the already strong protections for minors, and more. Important changes include: 

  1. Significantly expanded scope, through changes to applicability thresholds, narrowed exemptions, and expanded definitions; 
  2. Changes to consumer rights, including modifying the right to access one’s personal data and a new right to contest certain profiling decisions; 
  3. Modest changes to data minimization, purpose limitation, and consent requirements; 
  4. New impact assessment requirements headline changes to profiling requirements; and 
  5. Protections for minors, including a ban on targeted advertising.

These changes will be effective July 1, 2026 unless stated otherwise.

1.  The Law’s Scope Is Expanded Through Changes to Applicability Thresholds, Narrowed Exemptions, and Expanded Definitions

A.  Expanded Applicability

Some of the most significant changes these amendments make to the CTDPA are the adjustments to the law’s applicability thresholds, likely bringing many more businesses in scope of the law. Prior to SB 1295, controllers doing business in Connecticut were subject to the CTDPA if they controlled or processed the personal data of (1) at least 100K consumers (excluding personal data controlled or processed solely for completing a payment transaction), or (2) at least 25K consumers if they also derived more than 25% of their gross revenue from the sale of personal data. The figures in those thresholds were already common among state comprehensive privacy laws when the CTDPA was enacted in 2022, and the same thresholds have been included in numerous state privacy laws enacted after the CTDPA. In recent years, however, several new privacy laws have opted for lower thresholds. SB 1295 continues that trend and goes further.

Under the revised applicability thresholds, the CTDPA will apply to entities that (1) control or process the personal data of at least 35K consumers, (2) control or process consumers’ sensitive data (excluding personal data controlled or processed solely for completing a payment transaction), or (3) offer consumers’ personal data for sale in trade or commerce. Although the lowered affected consumer threshold aligns with other states such as Delaware, New Hampshire, Maryland, and Rhode Island, the other two applicability thresholds are unique and more expansive. Given the broad definition of “sensitive data,” expanding the law’s reach to any entity that processes any sensitive data is significant as it likely implicates a vast array of businesses that were not previously in scope. Similarly, expanding the law’s reach to any entity that offers personal data for sale may implicate a wide swath of small businesses engaged in targeted advertising, given the broad definition of “sale” which includes the exchange of personal data for monetary or other valuable consideration. 
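
Expressed as a simple decision rule, the revised thresholds look roughly like the sketch below. This is a simplification for illustration only (the function and field names are our own, and nuances such as the payment-transaction carve-out and the law’s exemptions are omitted):

```python
# Rough sketch of the revised CTDPA applicability test described above (not a compliance tool).

def ctdpa_applies(consumers_whose_data_is_processed: int,
                  processes_any_sensitive_data: bool,
                  offers_personal_data_for_sale: bool) -> bool:
    """Returns True if a controller doing business in Connecticut meets any revised threshold."""
    return (
        consumers_whose_data_is_processed >= 35_000   # (1) personal data of at least 35K consumers
        or processes_any_sensitive_data               # (2) processes any consumer sensitive data
        or offers_personal_data_for_sale              # (3) offers personal data for sale in trade or commerce
    )


# Example: a business with 5,000 consumers that processes health data would now be in scope.
print(ctdpa_applies(5_000, processes_any_sensitive_data=True, offers_personal_data_for_sale=False))  # True
```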

In addition to the changes to the applicability thresholds, these amendments also adjust some of the law’s exemptions. Most notably, SB 1295 replaces the entity-level Gramm-Leach-Bliley Act (GLBA) exemption with a data-level exemption. This follows an emerging trend in favor of a data-level GLBA exemption, and it was one of the requested legislative changes in the Connecticut Attorney General’s 2024 and 2025 reports on CTDPA enforcement. The removal of the GLBA entity-level exemption is counterbalanced by new entity-level exemptions for certain other financial institutions, such as insurers, banks, and certain investment agents as defined under various federal and state laws. Shifting away from the GLBA entity-level exemption responds to concerns that organizations like payday lenders and car dealerships were avoiding applicability under state privacy laws, contrary to lawmakers’ intent.

B.  New and Modified Definitions

Expanding the law’s applicability to any entity that processes sensitive data is compounded by the changes SB 1295 makes to the definition of sensitive data, which now includes mental or physical health “disability or treatment” (in addition to “condition” or “diagnosis”), status as nonbinary or transgender (like in Oregon, Delaware, New Jersey, and Maryland), information derived from genetic or biometric data, “neural data” (defined differently than in California or Colorado), financial information (focusing largely on account numbers, log-ins, card numbers, or relevant passwords or credentials giving access to a financial account), and government-issued identification numbers. 

Another minor scope change in SB 1295 is the new definition of “publicly available information,” which now aligns with the California Consumer Privacy Act (CCPA) by excluding biometric data that was collected without the consumer’s consent. 

2.  Changes to Consumer Rights, Including Modifying the Right to Access One’s Personal Data and a New Right to Contest Certain Profiling Decisions

A.  Access

Drawing from developments in other states, SB 1295 makes several changes to the law’s consumer rights. First, SB 1295 expands the right to access one’s personal data to include (1) inferences about the consumer derived from personal data and (2) whether a consumer’s personal data is being processed for profiling to make a decision that produces any legal or similarly significant effect concerning the consumer. This is consistent with requirements under the Colorado Privacy Act regulations (Rule 4.04), which specify that compliance with an access request must include “final [p]rofiling decisions, inferences, derivative data, marketing profiles, and other [p]ersonal [d]ata created by the [c]ontroller which is linked or reasonably linkable to an identified or identifiable individual.” The CCPA similarly specifies that personal information includes inferences derived from personal information to create a profile about a consumer, bringing such information within the scope of access requests.

Since 2023, new privacy laws in Oregon, Delaware, Maryland, and Minnesota have included a consumer right to know either the specific third parties or the categories of third parties to whom the consumer’s personal data are disclosed. Continuing that trend, SB 1295 adds a right to access a list of the third parties to whom a controller sold a consumer’s personal data, or, if that information is not available, a list of all third parties to whom a controller sold personal data. While this closely resembles the provisions in the Oregon Consumer Privacy Act and the Minnesota Consumer Data Privacy Act, SB 1295 differs from those laws in a few minor ways. First, SB 1295 concerns the third parties to whom personal data was sold, as opposed to the third parties to whom personal data was disclosed. This difference may not be consequential if the number of third parties to whom personal data are disclosed but not “sold” (given the broad definition of “sell”) is near zero. Furthermore, unlike in Oregon’s law, where the option to provide a non-personalized list of third-party recipients is at the controller’s discretion, SB 1295 only allows controllers to provide the broader, non-personalized list if the controller does not maintain a list of the third parties to whom it sold the consumer’s personal data.

While the above changes expand the right to access, SB 1295 also narrows the right to access by prohibiting disclosure of certain types of personal data. Under the amendments, a controller cannot disclose the following types of data in response to a consumer access request: social security number; government-issued identification number (including driver’s license number); financial account number; health insurance or medical identification number; account password, security question or answer; and biometric data. Instead, the CTDPA now requires a controller to inform the consumer “with sufficient particularity” that the controller collected these types of personal data. Minnesota became the first state to include this requirement in its comprehensive privacy law in 2024, and Montana amended its privacy law earlier this year to include a similar requirement. This change is likely an attempt to balance a consumer’s right to access their personal data with the security risk of erroneously exposing sensitive information such as SSNs to third parties or bad actors. 

B.  Profiling

In addition to the changes to the access right, SB 1295 makes important amendments to profiling rights. The existing right to opt-out of profiling in furtherance of decisions that produce legal or similarly significant effects is expanded. Previously it was limited to “solely automated decisions,” whereas now the right applies to “any automated decision” that produces legal or similarly significant effects. Similarly, the reworked definition of “decision that produces any legal or similarly significant effect” now includes any decision made “on behalf of the controller,” not just decisions made by the controller. This likely expands the scope of profiling protections to intermediate and non-final decisions. 

SB 1295 also adds a new right to contest profiling decisions, becoming the second state to do so after Minnesota. Under this new right, if a controller is processing personal data for profiling in furtherance of any automated decision that produced any legal or similarly significant effects concerning the consumer, and if feasible, the consumer will have the right to: 

These requirements diverge from Minnesota’s approach in a few ways. First, Connecticut’s right only applies “if feasible,” which arguably removes any implicit incentive to design automated decisions based on profiling to accommodate such rights. For example, Minnesota’s law does not have this caveat, so controllers will have to design their profiling practices to be explainable. Although this differs from Minnesota’s right, it is not wholly new language. Rather, Connecticut’s “if feasible” qualifier mirrors language in the right to appeal an adverse consequential decision under Colorado’s 2024 law regulating high-risk artificial intelligence systems (allowing for human review of adverse decisions “if technically feasible”). Second, the right to correct inaccurate personal data and have the profiling decision reevaluated is limited to decisions concerning housing. Third, SB 1295 does not include the right to be informed of actions that the consumer could have taken, and can take in the future, “to secure a different decision.” 

3.  Modest Changes to Controller Duties, Including Data Minimization, Purpose Limitation, and Sensitive Data Consent Requirements

Data minimization has become a hotly contested policy issue in privacy legislation in recent years, as states explore more “substantive” requirements that tie the collection, processing, and/or sharing of personal (or sensitive) data to what is “necessary” to provide a requested product or service. At various points this year, Connecticut, Colorado, and Oregon all considered amending their existing privacy laws to include Maryland-style substantive data minimization requirements. None of these states ended up following that path, although Connecticut did rework the data minimization, purpose limitation, and consent requirements in the CTDPA. 

[Chart comparing the CTDPA’s data minimization, purpose limitation, and consent requirements before and after SB 1295]

It is not immediately clear whether these changes are more than trivial, at least with respect to data minimization and the sensitive data requirements. Changing the limit on collecting personal data from what is “adequate, relevant, and reasonably necessary” for a disclosed purpose to what is “reasonably necessary and proportionate” for a disclosed purpose may not be operationally significant. “Proportionality” is a legal term of art that is beyond the scope of this blog post. It is sufficient to say that it is doubtful that in this context “proportionate” means much more than to limit collection to what is adequate and relevant, which was the original language. Similarly, for sensitive data, controllers now have the added requirement to limit their processing to what is “reasonably necessary in relation to the purposes for which such sensitive data are processed,” in addition to getting consent for processing. This change may be trivial at best and circular at worst, depending on whether one believes that it is even possible to process data for a purpose that is not reasonably necessary to the purpose for which the data are being processed. Similarly, the law now specifies that controllers must obtain separate consent to sell sensitive data. This change is likely intended to prevent controllers from bundling requests to sell sensitive data with other consent requests for processing activities that are essential for the functionality of a product or service.

The changes are more significant with respect to purpose limitation. The core aspects of the rule remain unchanged—obtain consent for secondary uses of personal data (subject to various exceptions in the law, such as bias testing for automated decisionmaking). New in SB 1295 is (1) a new term of art (a “material new purpose”) to describe secondary uses that are not reasonably necessary to or compatible with the purposes previously disclosed to the consumer, and (2) factors to determine when a secondary use is a “material new purpose.” These factors include the consumer’s reasonable expectations at the time of collection, the link between the new purpose and the original purpose, potential impacts on the consumer, the consumer-controller relationship and the context of collection, and potential safeguards. These factors are inspired by, but not identical to, those in Rule 6.08 of the Colorado Privacy Act regulations and § 7002 of the CCPA regulations, which were themselves inspired by the General Data Protection Regulation’s factors for assessing the compatibility of secondary uses in Art. 6(4).

There are other minor changes to controller duties, including a new requirement for controllers to disclose whether they collect, use, or sell personal data for the purpose of training large language models (LLMs). 

4.  New Impact Assessment Requirements Headline Changes to Profiling Requirements

SB 1295 expands and builds upon many of the CTDPA’s existing protections and business obligations with respect to profiling and automated decisions, affecting consumer rights, transparency obligations, exceptions to the law, and privacy by design and accountability practices. As discussed above, SB 1295—

Another significant update with respect to profiling is the addition of new impact assessment requirements. Like the majority of state comprehensive privacy laws, the CTDPA already requires controllers to conduct data protection assessments for processing activities that present a heightened risk of harm, which includes profiling that presents a reasonably foreseeable risk of substantial injury (e.g., financial, physical, or reputational injury). SB 1295 adds a new “impact assessment” requirement for controllers engaged in profiling for the purposes of making a decision that produces any legal or similarly significant effect concerning a consumer. An impact assessment must include, “to the extent reasonably known by or available to the controller,” the following (see the sketch after this list):

  1. A statement disclosing the profiling’s “purpose, intended use cases and deployment context of, and benefits afforded by,” the profiling; 
  2. Analysis as to whether the profiling poses any “known or reasonably foreseeable heightened risk of harm to a consumer”; 
  3. A description of the main categories of personal data processed as inputs for the profiling and the outputs the profiling produces; 
  4. An overview of the “main categories” of personal data used to “customize” the profiling, if any; 
  5. Any metrics used to evaluate the performance and known limitations of the profiling; 
  6. A description of any transparency measures taken, including measures taken to disclose to the consumer that the profiling is occurring while it is occurring; and 
  7. A description of post-deployment monitoring and user safeguards provided (e.g., oversight, use, and learning processes).
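
As one way of visualizing the required contents, the elements above can be collected into a single record per profiling activity. The schema below is a hypothetical sketch (the field names are our own and do not track the statutory text word for word):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProfilingImpactAssessment:
    """Hypothetical record mirroring the seven elements listed above."""
    purpose_use_cases_context_benefits: str                  # (1) purpose, intended use cases, deployment context, benefits
    heightened_risk_analysis: str                            # (2) known or reasonably foreseeable heightened risks of harm
    input_data_categories: List[str]                         # (3) main categories of personal data used as inputs
    output_description: str                                  # (3) outputs the profiling produces
    customization_data_categories: List[str] = field(default_factory=list)  # (4) data used to "customize" the profiling, if any
    performance_metrics_and_limitations: Optional[str] = None  # (5) metrics used to evaluate performance and known limitations
    transparency_measures: Optional[str] = None              # (6) disclosures to the consumer, including while profiling occurs
    post_deployment_safeguards: Optional[str] = None         # (7) post-deployment monitoring and user safeguards
```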

These requirements are largely consistent with those under Colorado’s 2024 law regulating high-risk artificial intelligence systems. Impact assessments will be required for processing activities created or generated on or after August 1, 2026, and they will not be retroactive.

These new provisions raise several questions. First, it is unclear whether an obligation to include information that is “reasonably known by or available to the controller” implies an affirmative duty for a controller to seek out facts and information that may not be known already but which could be identified through additional testing. Second, it is not clear when and how impact assessments should be bundled with data protection assessments, to the extent that they overlap. The law provides that a single data protection assessment or impact assessment can address a comparable set of processing operations that include similar activities. This could be read either as saying that one assessment in total can cover a set of similar activities, or as saying that one data protection assessment or one impact assessment can each cover a set of similar activities, but that an activity (or set of activities) subject to both requirements must receive two assessments.

Impact assessments will be relevant to enforcement. Like with data protection assessments, the AG can require a controller to disclose any impact assessment relevant to an investigation. In an enforcement action concerning the law’s prohibition on processing personal data in violation of state and federal antidiscrimination laws, evidence or lack of evidence regarding a controller’s proactive bias testing or other similar proactive efforts may be relevant. 

With respect to minors, there are additional steps and disclosures that must be made. If a controller conducts a data protection assessment or impact assessment and determines that there is a heightened risk of harm to minors, the controller is required to “establish and implement a plan to mitigate or eliminate such risk.” The AG can require the controller to disclose a harm mitigation or elimination plan if the plan is relevant to an investigation conducted by the AG. These “harm mitigation or elimination plans” shall be treated as confidential and exempt from FOIA disclosure in the same manner as data protection assessments and impact assessments. 

5.  Protections for Minors, Including a Ban on Targeted Advertising

The last major update to the CTDPA in 2023 added heightened protections for minors, including certain processing and design restrictions and a duty for controllers to use reasonable care to avoid “any heightened risk of harm to minors” caused by their service. Colorado and Montana followed Connecticut’s lead and added similar protections to their comprehensive privacy laws in recent years. SB 1295 now adjusts those protections for minors again and makes them stricter. 

Under the revised provisions, a controller is entitled to a rebuttable presumption of having used reasonable care if it complies with the law’s data protection assessment and impact assessment requirements. More significant changes have been made to the processing restrictions. Previously, the law imposed several substantive restrictions for minors (e.g., limits on targeted advertising or the sale of personal data) but allowed a controller to proceed with those activities if it obtained opt-in consent. As noted in FPF’s analysis of the 2023 CTDPA amendments, it is atypical for a privacy law to allow consent as an alternative to baseline protections such as data minimization and retention limits. In narrowing the role of consent with respect to minors, SB 1295 imposes strong baselines and privacy-by-design requirements for children and teens. 

The bans on targeted advertising and on selling minors’ personal data align with Maryland’s comprehensive privacy law and with a recently enacted amendment to the Oregon Consumer Privacy Act that bans the sale of personal data of consumers under the age of 16. 

Consent is not entirely excised. The revised law still allows controllers to obtain opt-in consent to process minors’ personal data for purposes of profiling in furtherance of any automated decision made by the controller that produces a legal or similarly significant effect concerning the provision or denial of certain enumerated essential goods and services (e.g., education enrollment or opportunity). Allowing minors to opt in to such profiling may open opportunities that would otherwise be foreclosed, especially in areas like employment, financial services, and educational enrollment, which older teenagers are likely encountering for the first time as they approach adulthood. For example, some career or scholarship quizzes may rely on profiling to tailor opportunities to a teen’s interests. 

* * * 

Looking to get up to speed on the existing state comprehensive consumer privacy laws? Check out FPF’s 2024 report, Anatomy of State Comprehensive Privacy Law: Surveying the State Privacy Law Landscape and Recent Legislative Trends

Meet Bianca-Ioana Marcu, FPF Europe Managing Director

FPF is pleased to welcome our colleague Bianca-Ioana Marcu to her new role as Managing Director of FPF Europe. With extensive experience in privacy and data protection, she takes on this responsibility at a pivotal moment for digital regulation in Europe. In this blog, we will explore her perspectives on the evolving privacy landscape, her approach to advancing discussions on data protection in Europe and Africa, and her vision for strengthening FPF’s leadership in addressing emerging challenges. Her insights will be key in navigating the complex intersection of privacy, innovation, and regulatory development in the years ahead.

You’ve been part of FPF for some time now, but this new role brings fresh responsibilities. What are you most excited to lead as Managing Director of the European office, and how do you see your work promoting the privacy dialogue in the region?

Stepping into this new role at FPF has given me a renewed sense of energy and opportunity that I hope to bring to the brilliant team on the ground. We are at a crossroads in Europe where existential questions are being asked with regard to the effectiveness and malleability of the existing digital regulatory framework. The privacy question is and will remain essential in this ongoing dialogue, as the GDPR is recognized as both the foundation and the cornerstone of the broader EU digital rulebook.

Within the FPF Europe office we will continue to contribute actively to this dialogue, acting as a source of expert, practical, and measured analysis and ideas for identifying ways in which respect for fundamental rights can coexist alongside technological development.


As you step into the role of Managing Director, you will also continue coordinating FPF’s growing presence in Africa. What are your top three priorities for the coming year?  

With the expert knowledge and support of our Policy Manager for Africa, Mercy King’ori, this year we successfully launched FPF’s Africa Council. The basis for our work in the region is to advance data protection through collaboration, innovation, and regional expertise, focusing on thought leadership and regionally grounded research. We were delighted to be an official partner of the Network of African Data Protection Authorities (NADPA) Conference hosted in Abuja, Nigeria, with an event on securing safe and trustworthy cross-border data flows.

Over the next years, FPF Africa will sustain its support for data practices that drive innovation, protect privacy, and uphold fundamental rights while being rooted in the diverse legal, social, and economic contexts of the continent.

FPF is known as a trusted platform where senior leaders come to test ideas, share solutions, and learn from one another. As Managing Director, how do you plan to strengthen these connections further while supporting members navigating emerging challenges?

Now in my third year of bringing to life FPF’s flagship event in Europe – the Brussels Privacy Symposium – I am continually inspired by the openness and commitment of the senior leaders in our community in ensuring strong data protection practices globally.

Our dedication to delivering high-quality legal research and policy analysis to our members remains strong, as does our commitment to creating opportunities to come together with intellectual curiosity.

Innovation and data protection are often seen as being at odds. In your view, what are the most promising opportunities for advancing privacy and innovation in the EU?

As the regulatory dialogue in Europe evolves, there is certainly an opportunity for advancing privacy protection as well as for supporting the region’s ambitions for economic growth. The current momentum for European legislators to streamline the EU’s digital rulebook brings promising opportunities for gathering all stakeholders around the same table, with a focus on clarifying legal uncertainties or points of tension between the rulebook’s different elements, and with an eye on the type of future we want to co-design. 

On a more personal note, what inspires your commitment to privacy, and how has your perspective evolved through your work at FPF and beyond?

My commitment to privacy is fueled not only by the belief that the fulfillment of this right is conducive to the enjoyment of other fundamental rights, including non-discrimination, but also by the support and dedication I have found within a privacy community that extends far beyond Brussels. My work at FPF, particularly on Gabriela Zanfir-Fortuna’s brilliant Global Privacy team, has exposed me to the rich and diverse practices and understandings of privacy and data protection around the world. My ambition is to bring this valuable global perspective to FPF Europe’s work, finding ways for continued cooperation and alignment rather than distance and isolationism. 

Annual DC Privacy Forum: Convening Top Voices in Governance in the Digital Age


FPF hosted its second annual DC Privacy Forum: Governance for Digital Leadership and Innovation on Wednesday, June 11. Staying true to the theme, this year’s forum convened key government, civil society, academic, and corporate privacy leaders for a day of critical discussions on privacy and AI policy. Gathering an audience of over 250 leaders from industry, academia, civil society and government, the forum featured keynote panels and debates on global data governance, youth online safety, cybersecurity, AI regulation, and other emerging digital governance challenges.

Cross-Sector Collaboration in Digital Governance

FPF CEO Jules Polonetsky began the day by delivering opening remarks emphasizing the importance of cross-sector collaboration among senior leaders in privacy, AI, and digital governance. His message was clear: supporting valuable, societal uses of data requires voices from across industries and sectors working together.


After welcoming the audience, Polonetsky turned to the opening panel, “The Path to U.S. Privacy Legislation: Is Data Protection Law the Real AI Regulator?”, featuring Dr. Gabriela Zanfir-Fortuna, FPF’s Vice President of Global Privacy; Keir Lamont, FPF’s Senior Director for U.S. Legislation; Meredith Halama, Partner at Perkins Coie; and Paul Lekas, Senior Vice President and Head of Global Public Policy and Government Affairs at the Software & Information Industry Association (SIIA). The discussion explored how existing data protection laws function as de facto AI regulators, highlighted renewed bipartisan efforts toward federal U.S. privacy legislation, and examined persistent challenges such as preemption and private rights of action, as well as how the evolving global landscape shapes U.S. approaches.

Global Leadership in Data Flows and AI


Continuing the conversation about the U.S.’s approach to regulating global data flows, Ambassador Steve Lang, U.S. Coordinator for International Communications and Information Policy at the U.S. Department of State, provided the opening remarks for the next panel, “Advancing U.S. Leadership on Global Data Flows and AI.” In his speech, Ambassador Lang emphasized the importance of cross-border data flows, arguing that trust depends on protecting data wherever it moves.


From there, Gabby Miller, Morning Tech Reporter at Politico, moderated an insightful discussion among Kat Duffy, Senior Fellow for Digital & Cyberspace Policy at the Council on Foreign Relations; Maryam Mujica, Chief Public Policy Officer at General Catalyst; and Pablo Chavez, Adjunct Senior Fellow in the Technology and National Security Program at the Center for a New American Security (CNAS). Focusing on how the United States’ role in global data flows and AI has shifted under the new administration, the panel examined how the differing digital governance strategies of past and present administrations have affected innovation.

The State of AI Legislation: Federal vs. State Approaches


Following a coffee break, FPF Director for U.S. AI Legislation, Tatiana Rice, moderated “AI Legislation – What Role for the States,” with participants Dr. Laura Caroli, Senior Fellow at the Wadhwani AI Center of the Center for Strategic and International Studies (CSIS); Travis Hall, State Director at the Center for Democracy & Technology; Jim Harper, Nonresident Senior Fellow at the American Enterprise Institute; and Shaundra Watson, Senior Director, Policy at the Business Software Alliance. The panelists explored states’ differing roles in regulating AI, from serving as laboratories of democracy, as Hall argued, to upholding the constitutional separation of powers between federal and state law, as Harper noted. The panelists agreed that transparency and accountability remain top of mind for businesses and regulators alike.

Diving Deep into AI Agents: Opportunities and Challenges


Staying on the topic of AI, the next panel, moderated by Bret Cohen, Partner at Hogan Lovells Privacy and Cybersecurity Practice, unpacked the subject of AI agents. The panel featured industry experts including Jarden Bomberg, U.S. Policy Lead for Privacy and Data Strategy at Google, Leigh Feldman, Senior Vice President and Chief Privacy Officer at Visa, Lindsey Finch, Executive Vice President of Global Privacy and Product Legal at Salesforce, and Pamela Snively, Chief Data and Trust Officer at TELUS Communications.

The conversation began by discussing the immense opportunities that agentic AI will make possible before moving into a more nuanced discussion about the privacy, governance, and policy considerations developers must address. The panelists agreed that risk management remains a top priority when developing agentic AI at their organizations. However, as Snively noted, the rewards will likely outweigh the risks.

Competition Meets Privacy in the AI Era


After a networking lunch, attendees returned to their seats for the event’s second half. Moderator Dr. Gabriela Zanfir-Fortuna, FPF’s Vice President for Global Privacy, welcomed everyone back for “Competition/Data Protection in an AI World.” Joined by Maureen Ohlhausen, Partner at Wilson Sonsini, and Peter Swire, FPF Senior Fellow and J.Z. Liang Chair at the Georgia Institute of Technology, the panel considered the key intersection between privacy and competition in the age of AI, focusing on how regulators can empower users to protect privacy and ensure fair competition.

The discussion highlighted a key regulatory challenge: while antitrust policy often favors openness, this approach can create privacy and security risks. Swire argued that regulators must find ways to make privacy enforcement a dimension of market competition. Ohlhausen noted that privacy protection laws can sometimes unintentionally affect competition. AI, she added, is like the “pumpkin spice of privacy,” referring to the trend of inserting AI into privacy conversations even where it might not directly apply.

The Big Debates: Experts Go Head to Head

The energy in the room lifted as FPF’s Senior Director for U.S. Legislation, Keir Lamont, revved up the crowd for “The Big Debates.” This event’s debate-style format allowed the audience to participate via real-time voting before, during, and after the debaters’ presentations. 

Debate 1: “Current U.S. Law Provides Effective Regulation for AI” 

Will Rinehart, Senior Fellow at the American Enterprise Institute, argued in favor of the statement, stating that existing U.S. law comprises adaptable legal frameworks, sector-specific expertise, and enforcement grounded in legal principles. He argued that the U.S. needs better enforcement complemented by additional resources for enforcers instead of creating a more robust law.

Leah Frazier, Director of the Digital Justice Initiative at Lawyers’ Committee for Civil Rights Under Law, disagreed, arguing that current U.S. law does not address various risks that AI poses, including privacy, security, and surveillance risks associated with collecting massive amounts of data used to train AI models. 


The audience strongly opposed the general premise in the initial vote, but the debate’s winner was determined based on the percentage of votes each debater lost or gained throughout the discussion. Rinehart emerged victorious, increasing support for the premise from 25% to 34% of the audience votes. 

Debate 2: “Sensitive Data Can and Should Be Strictly Regulated” 

Paul Ohm, a Professor of Law at Georgetown University Law Center, supported the statement, arguing that building laws around sensitive data reflects societal values and civil rights. Ohm added that U.S. law should target specific, previously unprotected data categories to make policymaking more inclusive and effective and to best protect marginalized groups. 

Mike Hintze, Partner at Hintze Law PLLC, was charged with arguing the negative, contending that laws focused on sensitive data are flawed in practice because of problems of definition and scope. What data is considered sensitive is context-dependent, making such regulation over-inclusive for some and under-inclusive for others. 


Once again, the audience began with a strong initial lean, this time in favor of the resolution, but Hintze won decisively, moving support for his position from 22% to 39% and earning an FPF Goat trophy.

Protecting Youth in Digital Spaces and Balancing Privacy and Cybersecurity


After refueling at another quick coffee break, audience members returned to the Waterside Ballroom for two final panels. 

Moderated by Bailey Sanchez, FPF’s Deputy Director for U.S. Legislation, the “Youth Privacy, Security, and Safety Online Panel” invited key industry professionals in online youth entertainment to discuss the key protections being advanced worldwide to protect children and teens online.

Panel members included Stacy Feuer, Senior Vice President, Privacy Certified at The Entertainment Software Rating Board (ESRB); David Lieber, Head of Privacy Public Policy for the Americas at TikTok; Tyler Park, Privacy Counsel at Roblox; Nick Rossi, Director of Federal Government Affairs at Apple; and Kate Sheerin, Head of Americas Public Policy at Discord. The discussion centered on the importance of built-in privacy defaults and age-appropriate design experiences. The panelists agreed that the future of protecting kids and teens online requires shared responsibility, flexible approaches, ongoing innovation, and collaboration between industry, policymakers, and youth themselves.


The day’s final panel, “Privacy/Cyber Security,” focused on the key points of conflict between online privacy and security values in regulations and within organizations. Moderated by Jocelyn Aqua, Data, Privacy & Ethics Leader at PwC, the discussion featured panelists working at the intersection of cybersecurity and privacy, including Emily Hancock, Vice President and Chief Privacy Officer at Cloudflare, Stephenie Gosnell Handler, Partner at Gibson, Dunn & Crutcher LLP, and Andy Serwin, Executive Committee Member at DLA Piper.

Looking Ahead


FPF’s Senior Vice President for Policy, John Verdi, delivered closing remarks, thanking attendees for a full day of thoughtful and inspiring conversations. The forum successfully demonstrated that addressing digital governance challenges requires diverse perspectives, collaborative approaches, and ongoing dialogue between all stakeholders.

Thank you to those who participated in our Annual DC Privacy Forum: Governance for Digital Leadership and Innovation! This year’s DC Privacy Forum was made possible thanks to our sponsors RelyanceAI, ObservePoint, and Perkins Coie.

We hope to see you next year. For updates on FPF work, please visit FPF.org for all our reports, publications, and infographics; follow us on LinkedIn, Instagram, Twitter/X, and YouTube; and subscribe to our newsletter for the latest.

Written by Celeste Valentino, FPF Comms Intern

Future of Privacy Forum Announces Annual Privacy and AI Leadership Awards

New internship program established in honor of former FPF staff

Washington, D.C. – June 12, 2025 – The Future of Privacy Forum (FPF), a global non-profit focused on data protection, AI, and emerging technologies, announced the recipients of the 2025 FPF Achievement Awards, honoring exceptional contributors to AI and privacy leadership in the public and private sectors.

FPF presented the Global Responsible AI Leadership Award to Brazil’s National Data Protection Authority (ANPD) in recognition of its comprehensive and forward-thinking approach to leadership in AI governance. 

Barbara Cosgrove, Vice President, Chief Privacy and Digital Trust Officer for Workday and a longtime privacy leader and mentor, was honored with the Career Achievement Award.

“It is a privilege to honor Barbara Cosgrove and the Brazilian National Data Protection Authority for their respective contributions to the fields of data protection and AI regulation,” said Jules Polonetsky, CEO of the Future of Privacy Forum. “This year’s awardees have all demonstrated the thoughtful leadership, bold vision, and creative thinking that is essential to advancing the responsible use of data for the benefit of society.”

2025 FPF Achievement Award Recipients include:

Brazil National Data Protection Authority, Global Responsible AI Leadership Award
Accepted by Miriam Wimmer 

Brazil’s National Data Protection Authority (ANPD) is this year’s recipient of the Global Responsible AI Leadership Award, which honors pioneers operating in the complex and rapidly evolving space where data protection and artificial intelligence intersect.

The Award recognizes ANPD’s comprehensive and forward-thinking approach to governing AI responsibly, most notably through initiatives like the Sandbox for AI and its influential work in developing thoughtful frameworks around generative AI. With a strong emphasis on public engagement, transparency, and international collaboration, ANPD is helping set a global benchmark for how innovation can advance while safeguarding privacy and individual rights. 

Barbara Cosgrove, Vice President, Chief Privacy and Digital Trust Officer, Workday, Career Achievement Award

Barbara Cosgrove serves as Vice President, Chief Privacy and Digital Trust Officer at Workday. During her tenure at Workday, Barbara has advocated for Workday globally on data protection matters, championed the company’s global data privacy strategy, implemented technology compliance standards, and developed privacy-by-design and machine learning ethics-by-design frameworks. Barbara has played a key role in establishing the company’s privacy fundamentals and fostering a culture of data protection, including serving as Workday’s chief security officer and leading the development of Workday’s initial AI governance program. Barbara is Vice-Chair of the International Association of Privacy Professionals (IAPP), and a member of FPF’s AI Leadership Council and Advisory Board. 

The awards were presented at a reception Wednesday evening following FPF’s Annual DC Privacy Forum, which brought together more than 250 government, civil society, academic, and corporate privacy leaders for a series of discussions about AI policy, kids’ online safety, AI agents, and other topics top of mind for the administration and policymakers.

At the event, Melissa Maalouff, a shareholder with ZwillGen, also made a special announcement regarding a new internship that will be housed in FPF’s D.C. office. The Hannah Schaller Memorial Internship by ZwillGen honors the life and legacy of Hannah Schaller, a beloved friend, colleague, and talented privacy attorney who passed away earlier this year. 

Hannah started her career as a policy intern in FPF’s D.C. Office. She was a valuable contributor during her time at FPF and a rising star at ZwillGen, a boutique law firm specializing in technology and privacy law. Hannah remained closely connected to FPF following her internship, and was a valuable source of guidance and counsel to FPF members and staff. Hannah was also co-chair of the IAPP DC region KnowledgeNet Chapter.

The candidate selected for the Hannah Schaller Memorial Internship by ZwillGen will work in FPF’s D.C. office, directly with the organization’s policy staff, as Hannah did at the start of her career. Learn more about the internship and opportunities to support the program’s sustainability here. ZwillGen has also created a post-graduate fellowship in Hannah’s honor.

“Hannah’s expertise and abilities as an attorney will leave a lasting impact on the privacy community, and she will be missed personally and for the professional and civic accomplishments that were in her future,” added Polonetsky. “This internship is a wonderful way to celebrate and honor her legacy by helping provide an on-ramp to students seeking a career in privacy.”

To learn more about the Future of Privacy Forum, visit fpf.org

##

About Future of Privacy Forum (FPF)

FPF is a global non-profit organization that brings together academics, civil society, government officials, and industry to evaluate the societal, policy, and legal implications of data use, identify the risks, and develop appropriate protections. FPF believes technology and data can benefit society and improve lives if the right laws, policies, and rules are in place. FPF has offices in Washington D.C., Brussels, Singapore, and Tel Aviv. Follow FPF on X and LinkedIn.

Brazil’s ANPD Preliminary Study on Generative AI highlights the dual nature of data protection law: balancing rights with technological innovation

Brazil’s Autoridade Nacional de Proteção de Dados (“ANPD”) Technology and Research Unit (“CGTP”) released the preliminary study Inteligência Artificial Generativa (“Preliminary Study on GenAI”, in Portuguese) as part of its Technological Radar series, on November 29, 2024.1 A short English version of the study was also released by the agency in December 2024. This analysis provides information for developers, processing agents, and data subjects on the potential benefits and challenges of generative AI in relation to the processing of personal information under existing data protection rules. 

Although this study does not offer formal legal guidance, it provides important insight into how the ANPD may approach future interpretation of the Lei Geral de Proteção de Dados (“LGPD”), Brazil’s national data protection law. As such, it aligns with a global trend of data protection regulators examining the impact of generative AI on privacy and data protection.2 The study sets up the framework for analyzing data protection legal requirements for Generative AI in the Brazilian context by acknowledging that balancing rights with technological innovation is a foundational principle of the LGPD. 

The analysis further takes into account that processing of personal data occurs during multiple stages in the life cycle of generative AI systems, from development to refinement of models. It addresses the legality of web scraping under the LGPD at the training stage, specifically considering that publicly available personal data falls under the scope of the law. The study proposes “thoughtful pre-processing practices”, such as anonymisation or collecting only necessary data for training. It then emphasizes “transparency” and “necessity” as two core principles of the LGPD that need enhanced attention and tailoring to the unique nature of Generative AI systems, before concluding that this technology should be developed from an “ethical, legal, and socio-technical” perspective if society is going to effectively harness its benefits.   

Balancing Rights with Technological Innovation: An LGPD Commitment 

The study acknowledges the relevance of balancing rights with technological innovation under the Brazilian framework. Article 1 of the LGPD identifies the objective of the law as ensuring the processing of personal data protects the fundamental rights of freedom, privacy, and the free development of personality.3 At the same time, Article 2 of the LGPD recognizes data protection is “grounded” on economic and technological development and innovation. 

The study recognizes that advances in machine learning enable generative AI systems that benefit key fields, including healthcare, banking, and commerce, and it highlights three use cases likely to produce valuable benefits for Brazilian society. For instance, the Federal Court of Accounts is implementing “ChatTCU”, a generative model to assist the Court’s legal team in producing, translating, and examining legal texts more efficiently. Munai, a local health tech enterprise, is also developing a virtual assistant that will automate the evaluation, interpretation, and application of hospital protocols and support decision-making in the healthcare sector. Finally, Banco do Brasil is developing a Large Language Model (LLM) to assist employees in providing better customer service experiences. The study also highlights the increasing popularity of commercially available generative AI systems such as OpenAI’s ChatGPT and Google’s Gemini among Brazilian users. 

In this context, the study emphasizes that while generative AI systems can produce multiple benefits, it is necessary to assess their potential for creating new privacy risks and exacerbating existing ones. For the ANPD, “the generative approach is distinct from other artificial intelligence as it possesses the ability to generate content (data) […] which allows the system to learn how to make decisions according to the data uses.”4 In this context, the CGTP identifies three fundamental characteristics of generative AI systems that are relevant in the context of personal data processing:

  1. The need for large volumes of personal and non-personal data for system training purposes;
  2. The capability of inference that allows the generation of new data similar to the training data; and
  3. The adoption of a diverse set of computational techniques, such as the architecture of transformers for natural language processing systems.5 

For instance, the study mentions LLMs as examples of models trained on large volumes of data. LLMs capture semantic and syntactic relationships and are effective at understanding and generating text across different domains. However, they can also generate misleading answers and produce inaccurate “hallucinations.” Foundational models are another example: trained on diverse datasets, they can perform tasks in multiple domains, often including some for which the model was not explicitly trained. 

The document underscores that the technical characteristics and possibilities of generative AI significantly impact the collection, storage, processing, sharing, and deletion of personal data. Therefore, the study holds, LGPD principles and obligations are relevant for data subjects and processing agents using generative AI systems.

Legality of web scraping turns on the LGPD’s coverage of publicly accessible personal data 

The study notes that generative AI systems are typically trained with data collected through web scraping. Data scraped from publicly available sources may include identifiable information such as names, addresses, videos, opinions, user preferences, images, or other personal identifiers. Additionally, an absence of thoughtful pre-processing practices in the collection phase (e.g., anonymizing data or collecting only the data necessary) increases the likelihood that personal data, including sensitive data and children’s data, ends up in training datasets.
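As a rough illustration of what such pre-processing could look like, the sketch below keeps only the fields needed for a stated purpose and redacts obvious identifiers from free text before a scraped record is considered for training. The patterns and field names are our own assumptions; a production pipeline would rely on dedicated PII-detection tooling and locale-specific rules (for example, Brazilian CPF numbers) rather than two regular expressions.

```python
import re
from typing import Optional

# Hypothetical patterns for common identifiers; real pipelines would use
# dedicated PII-detection tools and locale-specific rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact_identifiers(text: str) -> str:
    """Replace matched identifiers with placeholder tokens before the text
    is considered for inclusion in a training corpus."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

def preprocess_record(record: dict, needed_fields: set) -> Optional[dict]:
    """Keep only the fields needed for the stated purpose (necessity) and
    redact obvious identifiers from free text."""
    kept = {k: v for k, v in record.items() if k in needed_fields}
    if isinstance(kept.get("body"), str):
        kept["body"] = redact_identifiers(kept["body"])
    return kept or None

# Example: drop the author handle, keep only what the stated purpose requires.
cleaned = preprocess_record(
    {"author": "maria@example.com", "title": "Post", "body": "Call me at +55 11 91234-5678."},
    needed_fields={"title", "body"},
)
```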

The document emphasizes that the LGPD covers publicly accessible personal data, and consequently, processors and AI developers must ensure compliance with personal data principles and obligations. Scraping operations that capture personal data must be based on one of the LGPD’s lawful bases for processing (Articles 7 and 11) and comply with data protection principles of good faith, purpose limitation, adequacy, and necessity (Article 7, par. 3). 

Moreover, the study warns that web scraping reduces data subjects’ control over their personal information. According to the CGTP, users generally remain unaware of web scraping involving their information and how developers may use their data to train generative AI systems. In some cases, scraping can result in a data subject’s loss of control over personal information after the user deletes or requests deletion of their data from a website, as prior scraping and data aggregation may have captured the data and made it available in open repositories. 

Allocation of responsibility depends on patterns of data sharing and hallucinations 

The ANPD also takes note of the processing of personal data during several stages in the life cycle of generative AI systems, from development to refinement of models. The study explains that generative AI’s ability to generate synthetic content extends beyond basic processing and encompasses continuous learning and modeling based on the ingested training data. Although the training data may be hidden through mathematical processes during training, the CGTP warns that vulnerabilities in the system, such as model inversion or membership inference attacks, could expose individuals included in training datasets. 
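To make the membership-inference concern concrete, the toy sketch below captures the basic intuition behind one common family of such attacks: records a model was trained on tend to receive unusually low loss, so comparing a candidate record’s loss against a threshold calibrated on known non-members provides weak evidence of membership. This is a simplified illustration of the general technique, not a description of any specific attack the ANPD analyzed; `loss_fn` is a stand-in for whatever per-record loss an attacker can observe or approximate.

```python
from typing import Callable, Sequence

def calibrate_threshold(loss_fn: Callable[[str], float],
                        known_non_members: Sequence[str]) -> float:
    """Pick a loss level from records known to be outside the training set;
    candidates scoring below it look suspiciously familiar to the model."""
    return min(loss_fn(record) for record in known_non_members)

def infer_membership(loss_fn: Callable[[str], float],
                     candidates: Sequence[str],
                     threshold: float) -> list:
    """Flag candidates whose loss falls below the calibrated threshold; real
    attacks strengthen this signal with shadow models and statistical tests."""
    return [loss_fn(record) < threshold for record in candidates]
```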

Furthermore, generative AI systems allow users to interact with models using natural language. Depending on the prompt, context, and information provided by the user, these interactions may generate outputs containing personal data about the user or other individuals. A notable challenge, according to the study, is allocating responsibility in scenarios where (i) personal data is generated and shared with third parties, even if a model was not specifically trained for that purpose, and (ii) a model creates a hallucination – false, harmful, or erroneous assumptions about a person’s life, dignity, or reputation – harming the subject’s right to the free development of personality.

The study identifies three example scenarios in which personal data sharing can occur in the context of generative AI systems:

  1. Users sharing personal data through prompts 

This type of sharing occurs when users input prompts, which may contain information in diverse formats such as text, audio, and images, any of which can include personal, confidential, or sensitive data. In some instances, users may not be aware of the risks involved in sharing personal information or, if aware, may choose to “trust the system” to get the answers and assistance they need. In this scenario, the CGTP points out that safeguards should be developed to create privacy-friendly systems. One way to achieve this is to provide users with clear and easily accessible information about the use of prompts and the processing of personal data by generative AI tools (a minimal illustration of such a safeguard appears at the end of this section). 

The study highlights that users sharing the personal data of other individuals through prompts may be considered processing agents under the LGPD and consequently be subject to its obligations and sanctioning regime. Nonetheless, the CGTP cautions that transferring responsibility exclusively to users is not enough to safeguard personal data protection or privacy in the context of generative AI.

  2. Sharing AI-generated outputs containing personal data with third parties 

Under this scenario, output or AI-generated content can contain personal data, which could be shared with third parties. The CGTP notes that this presents the risk of the personal data being used for secondary purposes that are unknown to the initial user and that the AI developer is unlikely to control. Similar to the previous scenario and to data processing activities in general, the study notes the relevance of establishing a “chain of responsibility” among the different agents involved to ensure compliance with the LGPD. 

  3. Sharing pre-trained models containing personal data 

A third scenario is sharing a pre-trained model itself, and consequently, any personal data present in the model. According to the CGTP, “since pre-trained models can be considered a reflection of the database used for training, the popularization of the creation of APIs (Application Programming Interfaces) that adopt foundational models such as pre-trained LLMs, brings a new challenge. Sharing models tends to involve the data that is mathematically present in them”6 (translated from the Portuguese study). Pre-trained models, which contain a reflection of the training data, make it possible to adjust the foundational model for a specific use or domain. 

The CGTP cautions that the possibility of refining a model via the results obtained through prompt interaction may allow for a “continuous cycle of processing” of personal data.7 According to the technical Unit, “the sharing of foundational models that have been trained with personal data, as well as the use of this data for refinement, may involve risks related to data protection depending on the purpose8.”

Relatedly, the document highlights the relevance of the right to delete personal data in the context of generative AI systems. The study emphasizes that the processing of personal data can be present through diverse stages of the AI’s lifecycle, including the generation of synthetic content, prompt interaction – which allows new data to be shared – and the continuous refinement of the model. In this context, the study points out that this continuous processing of personal data presents significant challenges in (i) delimiting the end of the processing period; (ii) determining whether the purpose of the intended processing was achieved; and (iii) assessing the implications of revoking consent, if the processing relied on that basis. 
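Returning to the first sharing scenario above, one way a deployer might implement the kind of user-facing safeguard the CGTP describes is to surface a notice and screen prompts for obvious identifiers before they reach the model. The sketch below is a minimal illustration under those assumptions; `send_to_model` is a placeholder rather than a real API, and the single email pattern stands in for proper PII detection.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

PROMPT_NOTICE = (
    "Prompts may be processed to generate a response and may be retained to "
    "improve the service. Please avoid including personal data about yourself "
    "or others."
)

def send_to_model(prompt: str) -> str:
    """Placeholder for a call to a generative AI backend."""
    return f"(model response to: {prompt!r})"

def guarded_prompt(prompt: str) -> str:
    """Display the notice, redact detected identifiers, and only then forward
    the prompt to the model."""
    print(PROMPT_NOTICE)
    if EMAIL.search(prompt):
        print("Note: an email address was detected in your prompt and has been redacted.")
        prompt = EMAIL.sub("[EMAIL_REDACTED]", prompt)
    return send_to_model(prompt)
```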

Transparency and Necessity Principles: Essential for Responsible Gen-AI under the LGPD

Some LGPD principles have special relevance for the development and use of generative AI systems. The report takes the view that these systems typically lack detailed technical and non-technical information about the processing of personal data. The CGTP warns that this absence of transparency begins in the pre-training phase and extends to the training and refinement of models. The study suggests developers may fail to inform users about how their personal information could be shared under the three scenarios identified above (prompt use, outputs, or foundational models). As a result, individuals are usually unaware their information is used for generative AI training purposes and are not provided with adequate, clear, and accessible information about other processing operations such as sharing their personal information with third parties. 

In this context, the ANPD emphasizes that the transparency principle is especially relevant in the context of the responsible use and development of AI systems. Under the LGPD, this principle requires clear, precise, and easily accessible information about the data processing. The CGTP proposes that the existence and availability of detailed documentation can be a starting point for compliance and can help monitor the development and improvement of generative AI systems. 
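A minimal sketch of the kind of detailed documentation the CGTP points to might resemble the structured record below, loosely inspired by dataset- and model-documentation practices such as datasheets and model cards. The fields are illustrative assumptions on our part, not an ANPD or LGPD template.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ProcessingDocumentation:
    """Illustrative transparency record for a generative AI system."""
    system_name: str
    controller: str
    purposes: List[str]                  # what the personal data is processed for
    lawful_bases: List[str]              # LGPD Article 7/11 bases relied upon
    data_sources: List[str]              # e.g., scraped sources, user prompts
    personal_data_categories: List[str]  # categories of personal data involved
    sharing_scenarios: List[str]         # prompts, outputs, shared pre-trained models
    retention_and_deletion: str          # how deletion requests are handled across the lifecycle
    user_notice_location: str            # where clear, accessible information is published

    def public_summary(self) -> str:
        """Produce a short, plain-language summary suitable for user-facing notices."""
        return (
            f"{self.system_name} ({self.controller}) processes "
            f"{', '.join(self.personal_data_categories)} for "
            f"{', '.join(self.purposes)}. Details: {self.user_notice_location}"
        )
```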

Similarly, the necessity principle limits data processing to what is strictly required for developing generative AI systems. Under the LGPD, this principle requires the processing to be the minimum required for the accomplishment of its purposes, encompassing relevant, proportional, and non-excessive data. According to the ANPD, AI developers should be thoughtful about the data to be included in their training datasets and make reasonable efforts to limit the amount and type of information necessary for the purposes to be achieved by the system. Determining how to apply this principle to the creation of multipurpose or general-purpose “foundation models” is an ongoing challenge in the broader data protection space.

Looking Into the Future 

The study concludes that generative AI must be developed from an “ethical, legal, and socio-technical” perspective if society is going to effectively harness its benefits while limiting the risks it poses. The CGTP acknowledges that generative AI may offer solutions in multiple fields and applications; however, society and regulators must be aware that it may also entail new risks or exacerbate existing ones concerning privacy, data protection, and other freedoms. The CGTP highlights that this first report contains preliminary analysis and that further studies in the field are necessary to guarantee adequate protection of personal data, as well as the trustworthiness of the outputs generated by this technology. 


  1. The ANPD’s “Technological Radar” series addresses “emerging technologies that will impact or are already impacting the national and international scenario of personal data protection” with an emphasis on the Brazilian context. “The purpose of the series is to aggregate relevant information to the debate on data protection in the country, with educational texts accessible to the general public”.
  2. See, for example, Infocomm Media Development Authority, “Model AI Governance Framework for Generative AI” (May 2024); European Data Protection Supervisor, “First EDPS Orientations for ensuring data protection compliance when using Generative AI systems” (June 2024); Commission nationale de l’informatique et des libertés (CNIL), “AI how-to sheets” (June 2024); UK Information Commissioner’s Office, “Information Commissioner’s Office response to the consultation series on generative AI” (December 2024); European Data Protection Board, “Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models” (December 2024).
  3. LGPD Article 1, available at http://www.planalto.gov.br/ccivil_03/_ato2015-2018/2018/lei/L13709compilado.htm.
  4. ANPD, Technology Radar, “Generative Artificial Intelligence”, 2024, p. 7.
  5. ANPD, Radar Tecnológico, “Inteligência Artificial Generativa”, 2024, pp. 16-17.
  6. ANPD, Radar Tecnológico, “Inteligência Artificial Generativa”, 2024, pp. 24-25.
  7. Id.
  8. Id.