FPF at PDP Week 2025: Generative AI, Digital Trust, and the Future of Cross-Border Data Transfers in APAC

Authors: Darren Ang Wei Cheng and James Jerin Akash (FPF APAC Interns)

From July 7 to 10, 2025, the Asia-Pacific (APAC) office of the Future of Privacy Forum (FPF) was actively engaged in Singapore’s Personal Data Protection Week 2025 (PDP Week), a week of events hosted by the Personal Data Protection Commission of Singapore (PDPC) at the Marina Bay Sands Expo and Convention Centre.

Alongside the PDPC’s events, PDP Week also included a two-day industry conference organized by the International Association of Privacy Professionals (IAPP) – the IAPP Asia Privacy Forum and AI Governance Global.

This blog post presents key takeaways from the wide range of events and engagements that FPF APAC led and participated in throughout the week. Key themes that emerged from the week’s discussions included:

In the paragraphs below, we elaborate on some of these themes, as well as other interesting observations that came up over the course of FPF’s involvement in PDP Week.

1. FPF’s and IMDA’s co-hosted workshop shared practical perspectives for companies navigating the waters of generative AI governance. 

On Monday, July 7, 2025, FPF joined the Infocomm Media Development Authority of Singapore (IMDA) in hosting a workshop for Singapore’s data protection community, titled “AI, AI, Captain!: Steering your organisation in the waters of Gen AI by IMDA and FPF.” The highly anticipated event provided participants with practical knowledge about AI governance at the organizational level.

The event was hosted by Josh Lee Kok Thong, Managing Director of FPF APAC, and was attended by around 200 representatives from industry, including data protection officers (DPOs) and chief technology officers (CTOs). FPF’s segment of the workshop had two parts: an informational segment featuring presentations from FPF and IMDA, followed by a multi-stakeholder, practice-focused panel discussion.

photo1

FPF at “AI, AI, Captain!: Steering your organisation in the waters of Gen AI by IMDA and FPF”, July 7, 2025.

1.1 AI governance in APAC is neither unguided nor ungoverned, as policymakers are actively working to develop both soft and hard regulations for AI and to clarify how existing data protection laws apply to its use.

Josh presented on global AI governance, highlighting the rapid legislative changes in the APAC region over the past six months, and comparing developments in South Korea, Japan, and Vietnam with those in the EU, US, and Latin America. He then discussed how data protection laws – especially provisions on consent, data subject rights, and breach management – impact AI governance and how data protection regulators in Japan, South Korea, and Hong Kong (among others) have provided guidance on this.

Josh’s presentation was followed by one from Darshini Ramiah, Senior Manager of AI Governance and Safety at IMDA. Darshini provided an overview of Singapore’s approach to AI governance, which is built on three key pillars:

  1. Creating practical tools, such as the AI Verify toolkit and Project Moonshot, which enable benchmarking and red teaming of traditional AI systems and large language models (LLMs), respectively;
  2. Engaging closely with international partners, such as through the ASEAN Working Group on AI Governance and the publication of the AI Playbook for Small States under the Digital Forum of Small States; and
  3. Collaborating with industry in the development of principles and tools around AI governance.

photo 2

FPF presenting at “AI, AI, Captain!: Steering your organisation in the waters of Gen AI by IMDA and FPF”, July 7, 2025.

1.2 FPF moderated a panel session that focused on key aspects of AI governance and featured industry experts and regulators.

The panel session of the workshop, moderated by Josh, included the following experts:

  1. Darshini Ramiah, Senior Manager, AI Governance and Safety at IMDA;
  2. Derek Ho, Deputy Chief Privacy, AI and Data Responsibility Officer at Mastercard; and
  3. Patrick Chua, Senior Principal Digital Strategist at Singapore Airlines (SIA).

The experts discussed AI governance from both an industry and regulatory perspective.

photo 3

FPF moderating the panel session at “AI, AI, Captain!: Steering your organisation in the waters of Gen AI by IMDA and FPF”, July 7, 2025.

2. FPF facilitated deep conversations at PDPC’s PETs Summit, including on the use of PETs in cross-border data transfers and within SMEs.

2.1 FPF moderated a fireside chat on PETs use cases during the opening Plenary Session. 

On Tuesday, July 8, 2025, FPF APAC participated in a day-long PETs Summit, organized by the PDPC and IMDA. During the opening plenary session, Josh moderated a fireside chat with Fabio Bruno, Assistant Director of Applied Innovation at INTERPOL, titled “Solving Big Problems with PETs.” Following panels that covered use cases for PETs and policies that could increase their adoption, this fireside chat looked at how PETs could present fresh solutions to long-standing data protection issues (such as cross-border data transfers).

In this regard, Fabio shared how law enforcement bodies around the world have been exploring PETs to streamline investigations. He highlighted ongoing exploration of certain PETs, such as zero-knowledge proofs (a cryptographic method that allows one party to prove to another party that a particular piece of information is true without revealing any additional information beyond the validity of the claim) and homomorphic encryption (a family of encryption schemes allowing for computations to be performed directly on encrypted data without having to first decrypt it). In a law enforcement context, these PETs enable preliminary validation that can help to reduce delays and lower the cost of investigations, while also helping to protect individuals’ privacy. 

Notwithstanding the potential of PETs for cross-border data transfers (even for commercial, non-law enforcement contexts), challenges exist. These include: (1) enhancing and harmonizing the understanding and acceptability of PETs among data protection regulators globally; and (2) obtaining higher management support to invest in PETs. Nevertheless, the fireside chat concluded with optimism about the prospect of the greater use of PETs for data transfers, and left the audience with plenty of food for thought. 

photo 4

FPF moderating the fireside chat at PETs Summit Plenary Session, July 8, 2025

2.2 FPF Members facilitated an engaging PETs Deep Dive Session that explored business use cases for PETs.

After the plenary session, FPF APAC teammates Dominic Paulger, Sakshi Shivhare, and Bilal Mohamed facilitated a practical workshop titled the “PETs Deep Dive Session,” which was organized by the IMDA. Drawing on the IMDA’s draft PETs Adoption Guide, the workshop aimed to help Chief Data Officers, DPOs, and AI and data product teams understand which PETs best fit their business use cases.

photo 5

FPF APAC Team at PETs Summit, July 8, 2025

3. On Wednesday, FPF joined a discussion at IAPP Asia Privacy Forum on how regulators and major tech companies in the APAC region are fostering “digital trust” in AI by aligning technology with societal expectations.

On Wednesday, July 9, 2025, FPF APAC participated in an IAPP Asia Privacy Forum panel titled “Building Digital Trust in AI: Perspectives from APAC.” Josh joined Lanah Kammourieh Donnelly, Global Head of Privacy Policy at Google, and Lee Wan Sie, Cluster Director for AI Governance and Safety at the IMDA, for a panel moderated by Justin B. Weiss, Senior Director at Crowell Global Advisors.

A key theme from the panel was that, given the opacity of many digital technologies, the concept of digital trust is essential to ensure that these technologies work in ways that protect important societal interests. Accordingly, the panel discussed strategies that could foster digital trust.

Wan Sie provided the regulator’s perspective and acknowledged that, given the rapid pace of AI development, regulation would always be “playing catch-up.” She shared that, instead of implementing a horizontal AI law, Singapore is focusing on making the industry more capable of using AI responsibly. Wan Sie pointed to AI Verify, Singapore’s AI governance testing framework and toolkit, and the IMDA’s new Global AI Assurance Sandbox as mechanisms that help organizations demonstrate the trustworthiness of their AI systems to users.

Josh focused on trends from across the APAC region, sharing how regulators in Japan and South Korea have been actively considering amendments to their data protection laws to expand the legal bases for processing personal data, in order to facilitate greater availability of data for training high-quality AI systems. 

Lanah highlighted Google’s approach of developing AI responsibly in accordance with certain core privacy values, such as those in the Fair Information Practice Principles (FIPPs). For example, she shared how Google is actively researching technological solutions like training its models on synthetic data instead of using publicly available datasets from the Internet, which may contain large amounts of personal data.

Overall, the panel noted that APAC is taking its own distinct approach to AI governance – one in which industry and regulators collaborate actively to ensure principled development of technology. 

photo 6

FPF and the “Building Digital Trust in AI: Perspectives from APAC” panel at IAPP, July 9, 2025.

4. On Thursday, FPF staff moderated two panels at IAPP AI Governance Global on cross-border data transfers and regulatory developments in Australia.

4.1 While cross-border data transfers are fragmented and restrictive, there is cautious optimism that APAC will pursue interoperability. 

On Thursday, July 10, 2025, FPF organized a panel titled “Shifting Sands: The Outlook for Cross Border Data Transfers in APAC,” which featured Emily Hancock, Vice President and Chief Privacy Officer at Cloudflare; Arianne Jimenez, Head of Privacy and Data Policy and Engagement for APAC at Meta; and Zee Kin Yeong, Chief Executive of the Singapore Academy of Law and FPF Senior Fellow. Moderated by Josh, the panel discussed evolving regulatory frameworks for cross-border data transfers in APAC.

The panel first observed that the landscape for cross-border data transfers across APAC remains fragmented. Emily elaborated that restrictions on data transfer were a global phenomenon and attributable to how data is increasingly viewed as a national security matter, making governments less willing to lower restrictions and pursue interoperability.

Despite this challenging landscape, the panel members were cautiously optimistic that transfer restrictions could be managed effectively. Zee Kin highlighted how the increasing integration of economies through supranational organizations like ASEAN is driving a push in APAC towards recognizing more business-friendly data transfer mechanisms, such as the ASEAN MCCs. He also noted that regulators often relax restrictions once local businesses start to expand operations overseas and need to transfer data across borders.

Arianne suggested that businesses communicate to regulators the challenges they face with restrictive data transfer frameworks. She acknowledged that SMEs are often not as well-resourced as multinational corporations (MNCs), and thus face difficulties in navigating the complex patchwork of regulations across the region. She explained that since regulators in APAC are generally open to consultation, businesses should take the opportunity to advocate for more interoperability.

The panel concluded by highlighting the importance of data transfers to AI development. Cross-border data transfers are crucial to fostering diverse datasets, accessing advanced computing infrastructure, combating global cyber-threats by enabling worldwide threat sharing, and reducing the environmental impact by limiting the need for additional data centers. Overall, the panel expressed hope that despite the legal fragmentation and complicated state of play, the clear benefits of cross-border data transfers would encourage jurisdictions to pursue greater interoperability. 

photo 7

FPF and the “Shifting Sands: The Outlook for Cross Border Data Transfers in APAC” panel at IAPP, July 10, 2025.

4.2 With updates to Australia’s Privacy Act, privacy is non-negotiable, and businesses can benefit from improving their privacy compliance processes and systems ahead of increased enforcement. 

FPF’s APAC Deputy Director Dominic Paulger moderated a panel titled “Navigating the Impact of Australia’s Privacy Act Amendments in the Asia-Pacific.” The panelists included Dora Amoah, Global Privacy Office Lead at the Boeing Company; Rachel Baker, Senior Corporate Counsel for Privacy, JAPAC, at Salesforce; and Annelies Moens, the former Managing Director of Privcore. The panel discussed the enactment of the Privacy and Other Legislation Amendment Bill 2024 following a multiyear review of Australia’s Privacy Act, and the potential impact of these reforms on businesses.

Annelies shared an overview of the reforms, including: 

She mentioned that more changes could be coming, but some proposals – such as removing the small business exception – were facing resistance in Australia. However, irrespective of how the law develops, businesses can expect enforcement to increase.

The industry panelists shared their insights and experiences complying with the new amendments. Dora explained that despite the increased litigation risk from the new statutory tort for serious invasions of privacy, the threshold for liability was rather high as the tort required intent. She also noted that companies could avoid liability through implementing proper processes that prevent intentional or reckless misconduct. 

Rachel noted that the Privacy Act’s new automated decision-making (ADM) provisions would improve consumer rights in Australia. She observed how Australians have been facing serious privacy intrusions that have drawn the attention of the Office of the Australian Information Commissioner (OAIC), such as the Cambridge Analytica scandal and the misuse of facial recognition technology. She considered that since data subjects in Australia are increasingly expecting more rights, such as the right to deletion, businesses should go beyond compliance and actively adopt best practices.

Overall, the panel expressed the view that with this new reality, the role of the privacy professional in Australia, much like the rest of the world, is evolving to not just interpret and comply with the law but also to build robust systems through privacy by design.   

photo 8

FPF and the panelists of “Navigating the Impact of Australia’s Privacy Act Amendments in the Asia-Pacific” at IAPP, July 10, 2025.

5. FPF organized exclusive side events to foster deeper engagements with key stakeholders. 

A key part of FPF’s annual PDP Week experience has always been bringing our global FPF community (members, fellows, and friends) together for deep and meaningful conversations about the latest developments. This year, FPF APAC organized two events for its members: a Privacy Leaders’ Luncheon (an annual staple) and, for the first time, an India Luncheon co-organized with Khaitan & Co.

5.1 On July 8, 2025, FPF hosted an invite-only Privacy Leaders’ Luncheon.

This closed-door event provided a platform for senior stakeholders of FPF APAC to discuss pressing challenges at the intersection of AI and privacy, with a particular focus on the APAC region. During the session, the attendees discussed key topics such as the emerging developments in data protection laws, AI governance, and children’s privacy. 

photo 9

FPF’s Privacy Leaders’ Luncheon, July 8, 2025.

5.2 On July 10, FPF co-hosted an India Roundtable Luncheon with Khaitan & Co.

FPF APAC also collaborated with an Indian law firm, Khaitan & Co, to co-host a lunch roundtable focusing on pressing challenges in India, such as the development of implementing rules for the Digital Personal Data Protection Act, 2023 (DPDPA). The event brought together experts from both India and Singapore for fruitful discussions around the DPDPA and the draft Digital Personal Data Protection Rules. FPF APAC is grateful to have partnered with Khaitan & Co for the Luncheon, which saw active discussion amongst attendees on key issues in India’s emerging data protection regime. 

photo 10

FPF’s India Luncheon co-hosted with Khaitan & Co, July 10, 2025.

6. Conclusion

In all, it has been another deeply fruitful and meaningful year for FPF at Singapore’s PDP Week 2025. Through our panels, engagements, and curated roundtable sessions, FPF is proud to have been able to continue to drive thoughtful and earnest dialogue on data protection, AI, and responsible innovation across the APAC region. These engagements reflect our ongoing commitment to fostering greater collaboration and understanding among regulators, industry, academia, and civil society.

Looking ahead, FPF remains focused on shaping thoughtful approaches to privacy and emerging technologies. We are grateful for the continued support of the IMDA, IAPP, as well as our members, partners, and participants, who helped make these events a memorable success.

Nature of Data in Pre-Trained Large Language Models

The following is a guest post to the FPF blog by Yeong Zee Kin, the Chief Executive of the Singapore Academy of Law and FPF Senior Fellow. The guest blog reflects the opinion of the author only. Guest blog posts do not necessarily reflect the views of FPF.

The phenomenon of memorisation has fomented significant debate over whether Large Language Models (LLMs) store copies of the data that they are trained on.1 In copyright circles, this has led to lawsuits such as the one by the New York Times against OpenAI, which alleges that ChatGPT will reproduce NYT articles nearly verbatim.2 In the privacy space, much ink has also been spilt over the question of whether LLMs store personal data.

This blog post commences with an overview of what happens to data that is processed during LLM training3: first, how data is tokenised, and second, how the model learns and embeds contextual information within the neural network. Next, it discusses how LLMs store data and contextual information differently from classical information storage and retrieval systems, and examines the legal implications that arise from this. Thereafter, it attempts to demystify the phenomenon of memorisation, to gain a better understanding of why partial regurgitation occurs. This blog post concludes with some suggestions on how LLMs can be used in AI systems for fluency, while highlighting the importance of providing grounding and the safeguards that can be considered when personal data is processed.

While this is not a technical paper, it aims to be sufficiently technical so as to provide an accurate description of the relevant internal components of LLMs and an explanation of how model training changes them. By demystifying how data is stored and processed by LLMs, this blog post aims to provide guidance on where technical measures can be most effectively applied in order to address personal data protection risks. 

1. What are the components of a Large Language Model?

LLMs are causal language models that are optimised for predicting the next word based on previous words.4 An LLM comprises a parameter file, a runtime script and configuration files.5 The LLM’s algorithm resides in the script, which is a relatively small component of the LLM.6 Configuration and parameter files are essentially text files (i.e. data).7 Parameters are the learned weights and biases,8 expressed as numerical values, that are crucial for the model’s prediction: they represent the LLM’s pre-trained state.9 In combination, the parameter file, runtime script and configuration files form a neural network. 

There are two essential stages to model training. The first stage is tokenisation. This is when training data is broken down into smaller units (i.e. segmented) and converted into tokens. For now, think of each token as representing a word (we will discuss subword tokenisation later). Each token is assigned a unique ID. The mapping of each token to its unique ID is stored in a lookup table, which is referred to as the LLM’s vocabulary. The vocabulary is one of the LLM’s configuration files. The vocabulary plays an important role during inference: it is used to encode input text for processing and decode output sequences back into human-readable text (i.e. the generated response).
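The role of the vocabulary as a lookup table can be illustrated with a minimal sketch. The words and IDs below are invented for illustration and are not taken from GPT-Legal or any other model; the sketch also assumes word-level tokens, in line with the simplification adopted in the paragraph above.

```python
# Toy word-level vocabulary: a lookup table mapping each token to a unique ID.
# The words and IDs are made up for illustration; they are not from any real model.
vocabulary = {"the": 0, "court": 1, "held": 2, "that": 3, "<unk>": 4}

# Reverse table used for decoding IDs back into text.
id_to_token = {token_id: token for token, token_id in vocabulary.items()}


def encode(text):
    """Convert text into a list of token IDs (unknown words map to <unk>)."""
    return [vocabulary.get(word, vocabulary["<unk>"]) for word in text.lower().split()]


def decode(token_ids):
    """Convert a list of token IDs back into human-readable text."""
    return " ".join(id_to_token[token_id] for token_id in token_ids)


ids = encode("The court held that")
print(ids)          # [0, 1, 2, 3]
print(decode(ids))  # the court held that
```

In this sketch, everything that happens between `encode` and `decode` would operate on the integer IDs only, which mirrors the point that the vocabulary is used to translate between text and IDs at the edges of processing.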

fig 1

Figure 1. Sample vocabulary list from GPT-Legal; each token is associated with an ID (the vocabulary size of GPT-Legal is 128,256 tokens).

The next stage is embedding. This is a mathematical process that distills contextual information about each token (i.e. word) from the training data and encodes it into a numerical representation known as a vector. A vector is created for each token: this is known as the token vector. During LLM training, the mathematical representations of tokens (their vectors) are refined as the LLM learns from the training data. When LLM training is completed, token vectors are stored in the trained model. The mapping of the unique ID and token vector is stored in the parameter file as an embedding matrix. Token vectors are used by LLMs during inference to create the initial input vector that is fed through the neural network.
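To make the idea of an embedding matrix more concrete, the following is a small sketch in which each row is the token vector for one token ID. The matrix here has three tokens and four dimensions, and all the values are invented; a real model such as the one shown in Figure 2 has far more rows and dimensions.

```python
# Toy embedding matrix: row i is the token vector for the token with ID i.
# Three tokens with four dimensions each; all values are made up.
embedding_matrix = [
    [0.12, -0.40, 0.33, 0.05],   # token ID 0
    [-0.27, 0.18, 0.91, -0.64],  # token ID 1
    [0.50, 0.02, -0.11, 0.76],   # token ID 2
]


def lookup_token_vectors(token_ids):
    """Return the token vector (matrix row) for each token ID in the sequence."""
    return [embedding_matrix[token_id] for token_id in token_ids]


vectors = lookup_token_vectors([2, 0])
print(vectors[0])  # [0.5, 0.02, -0.11, 0.76]
print(len(embedding_matrix), len(embedding_matrix[0]))  # 3 4
```

The lookup is by ID only: nothing in the matrix is human-readable, and the row for a token is the same regardless of which text the token came from.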

fig 2

Figure 2. Sample embedding matrix from GPT-Legal: each row is one token vector, each value is one dimension (GPT-Legal has 128,256 token vectors, each with 4,096 dimensions)

LLMs are neural networks that may be visualised as layers of nodes with connections between them.10 Adjustments to embeddings also take place in the neural network during LLM training. Model training adjusts the weights and biases of the connections between these nodes. This changes how input vectors are transformed as they pass through the layers of the neural network during inference. This produces an output vector that the LLM uses to compute a probability score for each potential token that may follow, which increases or decreases the probability that one token will follow another. The LLM uses these probability scores to select the next token through various sampling methods.11 This is how LLMs predict the next token when generating responses.
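The path from input vector to next-token selection can be sketched with a single layer of weights and biases. This is a deliberate simplification: real LLMs stack many layers of a more elaborate architecture, and the candidate tokens, weights, biases and input vector below are all hypothetical values chosen for illustration.

```python
import math
import random


def linear_layer(input_vector, weights, biases):
    """Transform a vector: one output per row of weights, plus that row's bias."""
    return [
        sum(w * x for w, x in zip(row, input_vector)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical learned parameters: one weight row and one bias per candidate token.
candidate_tokens = ["mind", "turn", "step"]
weights = [[1.0, 0.5], [0.8, 0.4], [-1.0, -0.5]]
biases = [0.1, 0.0, -0.2]

input_vector = [1.0, 2.0]
scores = linear_layer(input_vector, weights, biases)
probabilities = softmax(scores)

# Greedy decoding picks the most probable token; sampling draws in proportion.
greedy_choice = candidate_tokens[probabilities.index(max(probabilities))]
sampled_choice = random.choices(candidate_tokens, weights=probabilities, k=1)[0]
print(greedy_choice)  # mind
```

Changing any weight or bias changes the scores and therefore the probabilities, which is the sense in which training "increases or decreases" the likelihood of one token following another.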

In the following sections, we dive deeper into each of these stages to better understand how data is processed and stored in the LLM.

Stage 1: Tokenisation of training data 

During the tokenisation stage, text is converted into tokens. This is done algorithmically by applying the chosen tokenisation technique. There are different methods of tokenisation, each with its benefits and limitations. Depending on the tokenisation method used, each token may represent a word or a subword (i.e. segments of the word). 

The method that is commonly used in LLMs is subword tokenisation.12 It provides benefits over word-level tokenisation, such as a smaller vocabulary, which can lead to more efficient training.13 Subword tokenisation analyses the training corpus to identify subword units based on the frequency with which a set of characters occurs. For example, “pseudonymisation” may be broken up into “pseudonym” and “isation”, while “reacting” may be broken up into “re”, “act” and “ing”. Each subword forms its own token.

Taking this approach results in a smaller vocabulary since common prefixes (e.g. “re”) and suffixes (e.g. “isation” and “ing”) have their own tokens that can be re-used in combination with other stem words (e.g. combining with “mind” to form “remind” and “minding”). This improves efficiency during model training and inference. Subword tokens may also contain white space or punctuation marks. This enables the LLM to learn patterns, such as which subwords are usually prefixes, which are usually suffixes, and how frequently certain words are used at the start or end of a sentence. 

Subword tokenisation also enables the LLM to handle out-of-vocabulary (OOV) words. This happens when the LLM is provided with a word during inference that it did not encounter during training. By segmenting the new word into subwords, there is a higher chance that the subwords of the OOV word are found in its vocabulary. Each subword token is assigned a unique ID. The mapping of a token with its unique ID is stored in a lookup table in a configuration file, known as the vocabulary, which is a crucial component of the LLM. It should be noted that this is the only place within the LLM where human-readable text appears. The LLM uses the unique ID of the token in all its processing.
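The segmentation of words into subwords, including a word that was not itself seen before, can be sketched as follows. The sketch uses greedy longest-match against a fixed toy vocabulary, which is a simplification: production tokenisers (for example, byte-pair encoding) learn their subword units from corpus frequencies and may segment differently. The vocabulary entries and IDs are invented.

```python
# Toy subword vocabulary; the entries and IDs are made up for illustration.
subword_vocabulary = {
    "re": 0, "act": 1, "ing": 2, "mind": 3,
    "pseudonym": 4, "isation": 5, "<unk>": 6,
}


def segment(word):
    """Split a word into the longest subwords found in the vocabulary."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in subword_vocabulary:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known subword starts here: emit <unk> for one character.
            pieces.append("<unk>")
            start += 1
    return pieces


def encode(word):
    """Replace each subword with its unique ID."""
    return [subword_vocabulary[piece] for piece in segment(word)]


print(segment("reacting"))          # ['re', 'act', 'ing']
print(segment("pseudonymisation"))  # ['pseudonym', 'isation']
print(segment("reminding"))         # ['re', 'mind', 'ing']
print(encode("reminding"))          # [0, 3, 2]
```

Here “reminding” is not an entry in the vocabulary, yet it can still be encoded because each of its subwords is. This is the out-of-vocabulary behaviour described above.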

The training data is encoded by replacing subwords with their unique ID before processing.14 This process of converting the original text into a sequence of IDs corresponding to tokens is referred to as tokenisation. During inference, input text is also tokenised for processing. It is only at the decoding stage that human-readable words are formed when the output sequence is decoded by replacing token IDs with the matching subwords in order to generate a human-readable response.

Stage 2: Embedding contextual information

Complex contextual information can be reflected as patterns in high-dimensional vectors. The greater the complexity, the higher the number of features that are needed. These are reflected as parameters of the high-dimensional vectors. Contrariwise, low-dimensional vectors contain fewer features and have lower representational capacity.

The embedding stage of LLM training captures the complexities of semantics and syntax as high-dimensional vectors. The semantic meaning of words, phrases and sentences and the syntactic rules of grammar and sentence structure are converted into numbers. These are reflected as values in a string of parameters that form part of the vector. In this way, the semantic meaning of words and relevant syntactic rules are embedded in the vector, hence the term “embeddings”.

During LLM training, a token vector is created for each token. The token vector is adjusted to reflect the contextual information about the token as the LLM learns from the training corpus. With each iteration of LLM training, the LLM learns about the relationships of the token, e.g. where it appears and how it relates to the tokens before and after. In order to embed all this contextual information, the token vector has a large number of parameters, i.e. it is a high-dimensional vector. At the end of LLM training, the token vector is fixed and stored in the pre-trained model. Specifically, the mapping of unique ID and token vector is stored as an embedding matrix in the parameter file.

Model training also embeds contextual information in the layers of the neural network by adjusting the connections between nodes. As the LLM learns from the training corpus during model training, the weights of connections between nodes are modified. These adjustments encode patterns from the training corpus that reflect the semantic meaning of words and the syntactic rules governing their usage.15 Training may also increase or decrease the biases of nodes. Adjustments to model weights and biases affect how input vectors are transformed as they pass through the layers of the neural network. These are reflected in the model’s parameters. Thus, contextual information is also embedded in the layers of the neural network during LLM training. Contextual embeddings form the deeper layers of the neural network.

Contextual embeddings increase or decrease the likelihood that one token will follow another when the LLM is generating a response. During inference, the LLM converts the input text into tokens and looks up the corresponding token vector from its embedding matrix. The model also generates contextual representations that capture how the token relates to other tokens in the sequence. Next, the LLM creates an input vector by combining the static token vector and the contextual vector. As input vectors pass through the neural network, they are transformed by the contextual embeddings in its deeper layers. Output vectors are used by the LLM to compute probability scores for the tokens, which reflect the likelihood that one subword (i.e. token) will follow another. LLMs generate responses using the computed probability scores. For instance, based on these probabilities, it is more likely that the subword that follows “re” is going to be “mind” or “turn” (since “remind” and “return” are common words); less likely to be “purpose” (unless the training dataset contains a significant number of technical documents where “repurpose” is used); and extremely unlikely to be “step” (since “restep” is not a recognised word).
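The “re” example can be approximated with a much simpler stand-in than a neural network: counting how often each subword follows “re” in a small corpus. This is plain bigram counting, not how an LLM computes its scores, and the training sequences below are invented; the sketch is only meant to show how relative frequencies in training data translate into higher or lower probabilities for the next subword.

```python
from collections import Counter

# Made-up training sequences, already segmented into subwords.
training_sequences = [
    ["re", "mind"], ["re", "mind"], ["re", "mind"],
    ["re", "turn"], ["re", "turn"],
    ["re", "purpose"],
]

# Count how often each subword follows "re".
followers = Counter(
    sequence[i + 1]
    for sequence in training_sequences
    for i in range(len(sequence) - 1)
    if sequence[i] == "re"
)
total = sum(followers.values())


def probability_after_re(subword):
    """Estimated probability that `subword` follows "re" in the toy corpus."""
    return followers[subword] / total


for candidate in ["mind", "turn", "purpose", "step"]:
    print(candidate, round(probability_after_re(candidate), 3))
```

In this toy corpus, “mind” comes out most likely, followed by “turn” and then “purpose”, while “step” never follows “re” and receives a probability of zero. Note that what is retained is a set of counts over subword pairs rather than the original sequences as records.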

Thus, LLMs capture the probabilistic relationships between tokens based on patterns in the training data and as influenced by training hyperparameters. LLMs do not store the entire phrase or textual string that was processed during the training phase in the same way that this would be stored in a spreadsheet, database or document repository. While LLMs do not store specific phrases or strings, they are able to generalise and create new combinations based on the patterns they have learnt from the training corpus.

2. Do LLMs store personal data?

Personal data is information about an individual who can be identified or is identifiable from the information on its own (i.e. direct) or in combination with other accessible information (i.e. indirect).16 From this definition, several pertinent characteristics of personal data may be identified. First, personal data is information in the sense that it is a collection of several datapoints. Second, that collection must be associated with an individual. Third, that individual must be identifiable from the collection of datapoints alone or in combination with other accessible information. This section examines whether data that is stored in LLMs retains these qualities.

An LLM does not store personal data in the way that a spreadsheet, database or document repository stores personal data. Billing and shipping information about a customer may be stored as a row in a spreadsheet; the employment details, leave records, and performance records of an employee may be stored as records in the tables of a relational database; and the detailed curriculum vitae of prospective, current and past employees may be contained in separate documents stored in a document repository. In these information storage and retrieval systems, personal data is stored intact and its association with the individual is preserved: the record may also be retrieved in its entirety or partially. In other words, each collection of datapoints about an individual is stored as a separate record; and if the same datapoint is common to multiple records, it appears in each of those records.17

Additionally, information storage and retrieval systems are designed to allow structured queries to select and retrieve specific records, either partially or in their entirety. The integrity of storage and retrieval underpins data protection obligations such as accuracy and data security (to prevent unauthorised alteration or deletion), and data subject rights such as correction and erasure.
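For contrast with what follows, the behaviour of a classical storage and retrieval system can be sketched using an in-memory SQLite database. The table layout and the customer record are fictitious and chosen only for illustration; the point is that a structured query returns the stored record intact, in whole or in part, and that erasure removes it.

```python
import sqlite3

# In-memory database with one fictitious customer record.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, address TEXT)"
)
connection.execute(
    "INSERT INTO customers (id, name, email, address) VALUES (?, ?, ?, ?)",
    (1, "Jane Tan", "jane.tan@example.com", "1 Example Road"),
)

# Retrieve the record in its entirety.
full_record = connection.execute(
    "SELECT id, name, email, address FROM customers WHERE id = ?", (1,)
).fetchone()

# Retrieve the record partially (a single field).
email_only = connection.execute(
    "SELECT email FROM customers WHERE id = ?", (1,)
).fetchone()

# Erasure removes the whole record, which can then no longer be retrieved.
connection.execute("DELETE FROM customers WHERE id = ?", (1,))
after_erasure = connection.execute(
    "SELECT id FROM customers WHERE id = ?", (1,)
).fetchone()

print(full_record)
print(email_only)
print(after_erasure)
connection.close()
```

Each datapoint here stays attached to the record for one individual, which is what makes targeted retrieval, correction and erasure possible in such systems.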

For the purpose of this discussion, imagine that the training dataset comprises billing and shipping records that contain names, addresses and contact information such as email addresses and telephone numbers. During training, subword tokens are created from names in the training corpus. These may be used in combination to form names and may also be used to form email addresses (since many people use a variation of their names for their email address) and possibly even street names (since streets are often named after famous individuals). The LLM is able to generate billing and shipping information that conforms to the expected patterns, but the information will likely be incorrect or fictitious. This explains the phenomenon of hallucinations.

During LLM training, personal data is segmented into subwords during tokenisation. This adaptation or alteration of personal data amounts to processing, which is why a legal basis must be identified for model training. The focus of this discussion is the nature of the tokens and embeddings that are stored within the LLM after model training: are they still in the nature of personal data? The first observation that may be made is that many words that make up names (or other personal information) may be segmented into subwords. For example, “Edward” may not be stored in the vocabulary as is but segmented into the subwords “ed” and “ward”. Both these subwords can be used during decoding to form other words, such as “edit” and “forward”. This example shows how a word that started as part of a name (i.e. personal data), after segmentation, produces subwords that can be reused to form other types of words (some of which may be personal data, some of which may not be personal data). 
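The segmentation described above can be illustrated with a toy tokeniser. This is a minimal sketch, assuming a hand-picked four-entry vocabulary and a greedy longest-match rule; the names `tokenize` and `VOCAB` are hypothetical, and production tokenisers learn much larger vocabularies from the training corpus.

```python
def tokenize(word, vocab):
    """Greedily split a word into the longest subword tokens found in vocab."""
    word = word.lower()
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # Nothing in the vocabulary matches: fall back to one character.
            tokens.append(word[start])
            start += 1
    return tokens


# Hypothetical vocabulary, chosen only to mirror the example in the text.
VOCAB = {"ed", "ward", "it", "for"}

for w in ("Edward", "edit", "forward"):
    print(w, "->", tokenize(w, VOCAB))
```

The same two tokens that make up the name also appear in the segmentation of "edit" and "forward", which is the reuse the paragraph describes.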

Next, while the vocabulary may contain words that correspond to names or other types of identifiers, the way they are stored in the lookup table as discrete tokens removes the quality of identification from the word. A lookup table is essentially that: a table. It may be sorted by alphanumeric or chronological order (e.g. recent entries are appended to the end of the table). The vocabulary stores datapoints but not the association between datapoints that enables them to form a collection which can relate to an identifiable individual. By way of illustration, having the word “Coleman” in the vocabulary as a token is neither here nor there, since it could equally be the name of Hong Kong’s highest-ranked male tennis player (Coleman Wong) or the street on which the Singapore Academy of Law is located (Coleman Street). The vocabulary does not store any association of this word to either Coleman Wong (as part of his name) or to the Chief Executive of the Singapore Academy of Law (as part of his office address).

Furthermore, subword tokenisation enables a token to be used in multiple combinations during inference. Keeping with this illustration, the token “Coleman” may be used in combination with either “Wong” or “Street” when the LLM is generating a response. The LLM does not store “Coleman Wong” as a name or “Coleman Street” as a street name. The association of datapoints to form a collection is not stored. What the LLM stores are learned patterns about how words and phrases typically appear together, based on what it observed in the training data. Hence, if there are many persons named “Coleman” in the training dataset but with different surnames, and no one else whose address is “Coleman Street”, then the LLM is likely to predict a word other than “Street” after “Coleman” during inference.
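A word-pair counter makes this point concrete. The sketch below is a drastic simplification of how an LLM works (it is a bigram model, not a neural network), and the corpus sentences are invented for illustration. What it keeps after "training" is only a table of how often one word follows another, not any record about a person.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the follow-on counts for a word into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


# Invented training sentences (assumption, for illustration only).
CORPUS = [
    "Coleman Wong won the match",
    "Coleman Wong served first",
    "Coleman Tan signed the form",
    "Coleman Lim paid the bill",
    "the office is at Coleman Street",
]

probs = next_word_probs(train_bigram(CORPUS), "Coleman")
print(probs)
```

With this corpus, "Street" is only one of several possible continuations of "Coleman", and the most likely continuation is whichever word followed it most often in the training sentences.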

Thus, LLMs do not store personal data in the same manner as traditional information storage and retrieval systems; more importantly, they are not designed to enable query and retrieval of personal data. To be clear, personal data in the training corpus is processed during tokenisation. Hence, a legal basis must be identified for model training. However, model training does not learn the associations of datapoints inter se nor the collection of datapoints with an identifiable individual, such that the data that is ultimately stored in the LLM loses the quality of personal data.18 

3. What about memorisation?

A discussion of how LLMs store and reproduce data is incomplete without a discussion of the phenomenon of memorisation. This is a characteristic of LLMs that reflects the patterns of words that are found in sufficiently large quantities in the training corpus. When certain combinations of words or phrases appear consistently and frequently in the training corpus, the probability of predicting those combinations increases.

Memorisation in LLMs is closely related to two key machine learning concepts: bias and overfitting. Bias occurs when training data overrepresents certain patterns, causing models to develop a tendency toward reproducing those specific sequences. Overfitting occurs when a model learns training examples too precisely, including noise and specific details, rather than learning generalisable patterns. Both phenomena exacerbate memorisation of training data, particularly personal information that appears frequently in the dataset. For example, Lee Kuan Yew was Singapore’s first prime minister post-Independence and a figure of significant global influence; he lived at 38 Oxley Road. LLMs trained on a corpus of data from the Internet would have learnt this. Hence, ChatGPT is able to produce a response (without searching the Web) about who he was and where he lived. It is able to reproduce (as opposed to retrieve) personal data about him because that data appeared in the training corpus in significant volume. Because this sequence of words appeared frequently in the training corpus, when the LLM is given the sequence of words “Lee Kuan”, the probability of predicting “Yew” is significantly higher than that of any other word; and in the context of the name and address of Singapore’s first prime minister, the probability of predicting “Lee Kuan Yew” and “38 Oxley Road” is significantly higher than that of any alternative.

This explains the phenomenon of memorisation. Memorisation occurs when the LLM learns frequent patterns and reproduces closely related datapoints. It should be highlighted that this reproduction is probabilistic. This is not the same as query and retrieval of data stored as records in deterministic information systems.

The first observation to be made is that whilst this is acceptable for famous figures, the same cannot be said for private individuals. Knowing that this phenomenon reflects the training corpus, the obvious thing to avoid is the use of personal data for training of LLMs. This exhortation applies equally to developers of pre-trained LLMs and deployers who may fine-tune LLMs or engage in other forms of post-training, such as reinforcement learning. There are ample good practices for this. Techniques may be applied on the training corpus before model training to remove, reduce or hide personal data: e.g. pseudonymisation (to de-identify individuals in the training corpus), data minimisation (to exclude unnecessary personal data) and differential privacy (adding random noise to obfuscate personal data). When inclusion of personal data in the training corpus is unavoidable, there are mitigatory techniques that can be applied to the trained model.

One such example is machine unlearning, a technique currently under active research and development, that has the potential of removing the influence of specific data points from the trained model. This technique may be applied to reduce the risk of reproducing personal data.
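Of the corpus-level techniques mentioned above, pseudonymisation is the easiest to sketch. The example below is illustrative only: it handles email addresses alone, the regular expression and the function name `pseudonymise_emails` are assumptions, and a real pipeline would need to cover names, addresses and other identifiers. It replaces each address with a keyed hash, so the same address always maps to the same placeholder without exposing the original value.

```python
import hashlib
import hmac
import re

# Simplified email pattern (assumption); it will not catch every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymise_emails(text, key):
    """Replace each email address with a stable, keyed placeholder."""
    def replace(match):
        address = match.group(0).lower()
        digest = hmac.new(key, address.encode("utf-8"), hashlib.sha256).hexdigest()
        return f"<email_{digest[:10]}>"
    return EMAIL_RE.sub(replace, text)


record = "Ship to alice@example.com; invoice copy to ALICE@example.com"
print(pseudonymise_emails(record, key=b"keep-this-key-secret"))
```

Because the hash is keyed, anyone without the key cannot simply hash a known address to find its placeholder; the key itself must be stored separately from the training corpus.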

Another observation that may be made is that the reproduction of personal data is not verbatim but paraphrased, hence it is also referred to as partial regurgitation. This underscores the fact that the LLM does not store the associations between datapoints necessary to make them a collection of information about an individual. Even if personal data is reproduced, it is because of the high probability scores for that combination of words, and not the output of a query and retrieval function. Paraphrasing may introduce distortions or inaccuracies when reproducing personal data, such as variations in job titles or appointments. Reproduction is also inconsistent and oftentimes incomplete. This is unsurprising, since the predictions are probabilistic after all.

Finally, it bears reiterating that personal data is not stored as is but segmented into subwords, and reproduction of personal data is probabilistic, with no absolute guarantee that a collection of datapoints about an individual will always be reproduced completely or accurately. Thus, reproduction is not the same as retrieval. Parenthetically, it may also be reasoned that if the tokens and embeddings do not possess the quality of personal data, their combination during inference is processing of data, but just not the processing of personal data. Be that as it may, the risk of reproducing personal data, however incomplete and inaccurate, can and must still be addressed. Technical measures such as output filters can be implemented as part of the AI system. These are directed at the responses generated by the model and not the model itself.
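An output filter of the kind mentioned above can be as simple as a pattern-based redaction step applied to each generated response before it is shown to the user. The sketch below is a minimal illustration under stated assumptions: the two patterns are deliberately crude (the phone pattern will also catch other long digit sequences), and the names `PATTERNS` and `filter_output` are hypothetical rather than part of any particular product.

```python
import re

# Crude illustrative patterns (assumptions); real filters use richer detection.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d -]{6,}\d"),
}


def filter_output(text):
    """Redact matches in a generated response and report what was found."""
    findings = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label} removed]", text)
        if count:
            findings.append(label)
    return text, findings


response = "You can reach her at jane@example.com or +65 6123 4567."
print(filter_output(response))
```

The filter operates on the response, not on the model, which matches the distinction drawn in the paragraph above.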

4. How should we use LLMs to process personal data?

LLMs are not designed or intended to store and retrieve personal data in the same way that traditional information storage and retrieval systems are; but they can be used to process personal data. In AI systems, LLMs provide fluency during the generation of responses. LLMs can incorporate personal data in their responses when personal data is provided, e.g., personal data provided as part of user prompts, or when user prompts cause the LLM to reproduce personal data as part of the generated response.

When LLMs are provided with user prompts that include reference documents that provide grounding for the generated response, the documents may also contain personal data. For example, a prompt to generate a curriculum vitae (CV) in a certain format may contain a copy of an outdated resume, a link to a more recent online bio and a template the LLM is to follow when generating the CV. The LLM can be constrained by well-written prompts to generate an updated CV using the personal information provided and formatted in accordance with the template. In this example, the personal data that the LLM uses will likely be from the sources that have been provided by the user and not from the LLM’s vocabulary. 

Further, the LLM will paraphrase the information in the CV that it generates. The randomness of the predicted text is controlled by adjusting the temperature of the LLM. A higher temperature setting will increase the chance that a lower probability token will be selected as the prediction, thereby increasing the creativity (or randomness) of the generated response. Even at its lowest temperature setting, the LLM may introduce mistakes by paraphrasing job titles and appointments or combining information from different work experiences. These errors occur because the LLM generates text based on learned probabilities rather than factual accuracy. For this reason, it is important to vet and correct generated responses, even if proper grounding has been provided.
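The effect of temperature can be seen in a few lines of arithmetic. In the sketch below, the scores (logits) for three candidate tokens are invented for illustration; dividing them by the temperature before applying the softmax function is the standard way temperature scaling is described, though individual LLM services may combine it with other sampling controls.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw token scores into probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lower temperature, most of the probability sits on the highest-scoring token; at the higher temperature, the lower-scoring tokens gain a larger share and are more likely to be selected.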

A more systematic way of providing grounding is through Retrieval Augmented Generation (RAG) whereby the LLM is deployed in an AI system that includes a trusted source, such as a knowledge management repository. When a query is provided, it is processed by the AI system’s embedding model which converts the entire query into an embedding vector that captures its semantic meaning. This embedding vector is used to conduct a semantic search. This works by identifying embeddings in the vector database (i.e. a database containing document embeddings precomputed from the trusted source) that have the closest proximity (e.g. via Euclidean or cosine distance).19 These distance metrics measure how similar the semantic meanings are. Embeddings that are close together (e.g. nearest neighbour) are said to be semantically similar.20 Semantically similar passages are retrieved from the repository and appended to the prompt that is sent to the LLM for the generation of a response. The AI system may generate multiple responses and select the most relevant one based on either semantic similarity to the query or in accordance with a re-ranking mechanism (e.g. heuristics to improve alignment with intended task).
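The retrieval step can be sketched as follows. This is a minimal illustration, not a production design: the three-dimensional vectors are hand-written stand-ins for embeddings that an embedding model would normally compute, the passages are invented, and `retrieve` and `build_prompt` are hypothetical helper names. It ranks stored passages by cosine similarity to the query vector and appends the closest ones to the prompt.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, k=2):
    """Return the k passages whose embeddings are closest to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [passage for passage, _ in ranked[:k]]


def build_prompt(question, passages):
    """Append the retrieved passages to the user's question as grounding."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n{context}\n\nQuestion: {question}"


# Toy vector store: (passage, precomputed embedding). Values are invented.
STORE = [
    ("Annual leave is 14 days per year.", [0.9, 0.1, 0.0]),
    ("Expense claims are due monthly.", [0.1, 0.9, 0.0]),
    ("The office opens at 9am.", [0.0, 0.2, 0.9]),
]
QUERY_VEC = [0.8, 0.2, 0.1]  # stand-in for the embedded user query

top = retrieve(QUERY_VEC, STORE, k=1)
print(build_prompt("How much leave do I get?", top))
```

The query and retrieval work is done entirely by the vector store and the similarity search; the LLM only receives the resulting prompt and generates the response.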

5. Concluding remarks

LLMs are not designed to store and retrieve information (including personal data). From the foregoing discussion, it may be said that LLMs do not store personal data in the same manner as information storage and retrieval systems. Data stored in the LLM’s vocabulary do not retain the relationships necessary for the retrieval of personal data completely or accurately. The contextual information embedded in the token vectors and neural network reflects patterns in the training corpus. Given how tokens are stored and re-used, the contextual embeddings are not intended to provide the ability to store the relationships between datapoints such that the collection of datapoints is able to describe an identifiable individual.

By acquiring a better understanding of how LLMs store and process data, we are able to design better trust and safety guardrails in the AI systems that they are deployed in. LLMs play an important role in providing fluency during inference, but they are not intended to perform query and retrieval functions. These functions are performed by other components of the AI system, such as the vector database or knowledge management repository in a RAG implementation. 

Knowing this, we can focus our attention on those areas that are most efficacious in preventing the unintended reproduction of personal data in generated responses. During model development, steps may be taken to address the risk of the reproduction of personal data. These are steps for developers who undertake post-training, such as fine-tuning and reinforcement learning.

(a) First, technical measures may be applied to the training corpus to remove, minimise, or obfuscate personal data. This reduces the risk of the LLM memorising personal data. 

(b) Second, new techniques like machine unlearning may be applied to reduce the influence of specific data points when the trained model generates a response.

When deploying LLMs in AI systems, steps may also be taken to protect personal data. The measures are very dependent on intended use cases of the AI system and the assessed risks. Crucially, these are measures that are within the ken of most deployers of LLMs (by contrast, a very small number of deployers will have the technical wherewithal to modify LLMs directly through post-training). 

(a) First, remove or reduce personal data from trusted sources if personal data is unnecessary for the intended use case. Good data privacy practices such as pseudonymisation and data minimisation should be observed.

(b) Second, if personal data is necessary, store and retrieve it from trusted sources. Use information storage and retrieval systems that are designed to preserve the confidentiality, integrity and accuracy of stored information. Personal data from trusted sources can thus be provided as grounding for prompts to the LLM. 

(c) Third, consider implementing data loss prevention measures in the AI system. For example, prompt filtering reduces the risk of including unauthorised personal data in user prompts. Likewise, output filtering reduces the risk of unintended reproduction of personal data in responses generated by the AI system.

Taking a holistic approach enables deployers to introduce appropriate levels of safeguards to reduce the risks of unintended reproduction of personal data.21

  1. Memorisation is often also known as partial regurgitation, which does not require verbatim reproduction; regurgitation, on the other hand, refers to the phenomenon of LLMs reproducing verbatim excerpts of text from their training data.
  2. The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work (27 Dec 2023) New York Times; see also, Audrey Hope “NYT v. OpenAI: The Times’s About-Face” (10 April 2024) Harvard Law Review.
  3. This paper deals with the processing of text for training LLMs. It does not deal with other types of foundational models, such as multimodal models that can handle text as well as images and audio.
  4. See, e.g., van Eijk R, Gray S and Smith M, ‘Technologist Roundtable: Key Issues in AI and Data Protection’ (2024) https://fpf.org/wp-content/uploads/2024/12/Post-Event-Summary-and-Takeaways-_-FPF-Roundtable-on-AI-and-Privacy-1-2.pdf (accessed 26 June 2025).
  5. Christopher Samiullah, “The Technical User’s Introduction to Large Language Models (LLMs)” https://christophergs.com/blog/intro-to-large-language-models-llms (accessed 3 July 2025).
  6. LLM model packages contain different components depending on their intended use. Inference models like ChatGPT are optimized for real-time conversation and typically share only the trained weights, tokenizer, and basic configuration files—while keeping proprietary training data, fine-tuning processes, system prompts, and foundation models private. In contrast, open source research models like LLaMA 2 often include comprehensive documentation about training datasets, evaluation metrics, reproducibility details, complete model weights, architecture specifications, and may release their foundation models for further development, though the raw training data itself is rarely distributed due to size and licensing constraints. See, e.g., https://huggingface.co/docs/hub/en/model-cards (accessed 26 June 2025).
  7. Configuration files are usually stored as readable text files, while parameter files are stored in compressed binary formats to save space and improve processing speed.
  8. Weights influence the connections between nodes, while biases influence the nodes themselves: “Neural Network Weights: A Comprehensive Guide” https://www.coursera.org/articles/neural-network-weights (accessed 4 July 2025).
  9. An LLM that is ready for developers to use for inference is referred to as pre-trained. Developers may deploy the pre-trained LLM as is, or they may undertake further training using their private datasets. An example of such post-training is fine-tuning.
  10. LLMs are made up of the parameter file, runtime script and configuration files which together form a neural network: supra, fn 5 and the discussion in the accompanying main text.
  11. While it could pick the token with the highest probability score, this would produce repetitive, deterministic outputs. Instead, modern LLMs typically use techniques like temperature scaling or top-p sampling to introduce controlled randomness, resulting in more diverse and natural responses.
  12. Yekun Chai, et al, “Tokenization Falling Short: On Subword Robustness in Large Language Models” arXiv:2406.11687, section 2.1.
  13. Word-level tokenisation results in a large vocabulary as every word stemming from a root word is treated as a separate word (e.g. consider, considering, consideration). It also has difficulties handling languages that do not use white spaces to establish word boundaries (e.g. Chinese, Korean, Japanese) or languages that use compound words (e.g. German).
  14. WordPiece and Byte Pair Encoding are two common techniques used for subword tokenisation.
  15. To be clear, the LLM learns relationships and not explicit semantics or syntax.
  16. Definition of personal data in Singapore’s Personal Data Protection Act 2012, s 2 and UK GDPR, art 4(1).
  17. Depending on the information storage and retrieval system used, common data points could be stored as multiple copies (e.g. XML database) or in a code list (e.g. spreadsheet or relational database).
  18. Note from the editor: This statement should be read primarily within the framework of Singapore’s Personal Data Protection Act.
  19. Masked language models (e.g. BERT) are used for this, as these models are optimised to capture the semantic meaning of words and sentences better (but not textual generation). Masked language models enable semantic searches.
  20. The choice of distance metric can affect the results of the search.
  21. This paper benefited from reviewers who commented on earlier drafts. I wish to thank Pavandip Singh Wasan, Prof Lam Kwok Yan, Dr Ong Chen Hui and Rob van Eijk for their technical insight and very instructive comments; and Ms Chua Ying Hong, Jeffrey Lim and Dr Gabriela Zanfir-Fortuna for their very helpful suggestions.

Malaysia Charts Its Digital Course: A Guide to the New Frameworks for Data Protection and AI Ethics

The digital landscape in Malaysia is undergoing a significant transformation. With major amendments to its Personal Data Protection Act (PDPA) taking effect in June 2025, the country is decisively updating its data protection standards to meet the demands of the global digital economy. This modernization effort is complemented by a forward-looking approach to artificial intelligence (AI), marked by the introduction of the National Guidelines on AI Governance & Ethics in September 2024. Together, these initiatives represent a robust attempt to build a trusted and innovative digital ecosystem.

This post will unpack these landmark initiatives. First, we will examine the key amendments to Malaysia’s PDPA, focusing on the new obligations for businesses and how they compare with the European Union (EU)’s General Data Protection Regulation (GDPR) and other regional laws. We will then delve into the National AI Ethics Guidelines, analyzing their core principles and their place within the Association of Southeast Asian Nations (ASEAN) AI governance landscape. Exploring both makes clear that strong data protection serves as a critical foundation for trustworthy AI, a central theme in Malaysia’s digital strategy.

Key takeaways include:

A. Personal Data Protection (Amendment) Act 2024

1. Background

Malaysia was the first ASEAN Member State to enact comprehensive data protection legislation. Its PDPA, which was enacted in June 2010 and came into force in November 2013, set a precedent in the region.

However, for nearly a decade, the PDPA remained largely unchanged. Recognizing the need to keep up with rapid technological advancements and evolving global privacy standards (such as the 2016 enactment of the GDPR), then-Minister for Communications and Multimedia (now Digital Minister) Gobind Singh Deo revealed plans to review the PDPA in October 2018.

In February 2020, Malaysia’s Personal Data Protection Department (PDPD) took the first step by issuing a consultation paper proposing to amend the PDPA in 22 areas. Due to delays from the COVID-19 pandemic and subsequent changes in the Malaysian government, a draft bill was only finalized in August 2022, narrowing the focus to five key amendments:

  1. Requiring the appointment of a DPO.
  2. Introducing mandatory data breach notification requirements.
  3. Extending the Security Principle to data processors.
  4. Introducing a right to data portability.
  5. Revising the PDPA’s cross-border data transfer regime.

The amendment process regained momentum following the establishment of a new Digital Ministry in December 2023 as part of a broader cabinet reshuffle.

The resulting Personal Data Protection (Amendment) Act 2024 (Amendment Act) was passed by both houses of Malaysia’s Parliament in July 2024 and was enacted in October 2024. The amendments came into effect in stages:

During this transition period, the PDPD began consultations on seven new guidelines to provide greater clarity on new obligations under the updated PDPA. To date, the PDPD has released guidelines on: (1) appointing DPOs; (2) data breach notifications; and (3) cross-border data transfers. It is also developing guidelines on: (1) data portability; (2) data protection impact assessments (DPIAs); (3) Data Protection-by-Design (DPbD); and (4) profiling and automated decision-making (ADM).

2. The amendments align the PDPA more closely with both international and regional data protection standards

The Amendment Act brings the PDPA closer to other influential global frameworks, such as the GDPR. This carries similarities with regulatory efforts by some other ASEAN Member States, including the enactment of GDPR-like laws in Thailand (2019), Indonesia (2022) and to a lesser extent, Vietnam (2023).

It also follows a broader trend of initiatives in the Asia-Pacific (APAC) region to bring longer-established data protection laws closer to international norms. These include extensive amendments to data protection laws in New Zealand (2020), Singapore (2021), and Australia (2024), as well as an ongoing review of Hong Kong’s law, which began in 2020.

One example of how the Amendment Act brings the PDPA closer to globally recognized norms is the replacement of the term “data user” with “data controller.” While this update is primarily cosmetic and does not change the entity’s substantive obligations, it aligns the PDPA’s terminology more closely with that of the GDPR and other similar laws. 

The following subsections discuss in detail the key amendments introduced by the Amendment Act, illustrating their implications and alignment with both regional and international standards.

2.1. Like the GDPR, the amendments define biometric data as sensitive

The Amendment Act classifies “biometric data” as “sensitive personal data.” The Amendment Act’s definition of “biometric data” is, in fact, potentially broader than its counterpart in the GDPR, as the former does not require that the data must allow or confirm the unique identification of that person.

Organizations processing biometric data may need to revise their compliance practices to comply with the more stringent requirements for processing sensitive personal data (such as obtaining express consent prior to processing), unless one of a narrow list of exceptions applies. However, this is unlikely to pose major challenges to organizations whose compliance strategies take the GDPR as the starting point.

2.2. Like other ASEAN data protection laws, the amendments introduce a new requirement to appoint a DPO

The Amendment Act requires data controllers to appoint a DPO, and register the appointment within 21 days of the appointment. If the DPO changes, controllers must also update registration information within 14 days of the change.

Both controllers and processors must also publish the business contact information of their DPO on official websites, in privacy notices, and in security policies and guidelines. This should include a dedicated official business email account, separate from the DPO’s personal and regular business email.

To provide guidance on this new requirement, the PDPD published a Guideline and Circular on the appointment of DPOs (DPO Guideline) in May 2025 that clarifies and in some cases substantially augments the DPO requirements under the amended PDPA.

The DPO Guideline introduces a quantitative threshold for appointing a DPO. Controllers and processors are only required to appoint a DPO if they:

The DPO Guideline also outlines DPOs’ duties. These duties include serving as the primary point of contact for authorities and data subjects, providing compliance advice, conducting impact assessments, and managing data breach incidents. DPOs do not need to be resident in Malaysia but must be easily contactable and proficient in English and the national language (i.e., Bahasa Melayu). A single DPO may be appointed to serve multiple controllers or processors, provided that the DPO is given sufficient resources and is contactable by the organization, the Commissioner, and data subjects.

The DPO Guideline also prescribes skill requirements. A DPO must have knowledge of data protection law and technology, an understanding of the business’s data processing operations, and the ability to promote a data protection culture with integrity. The required skill level depends on the complexity, scale, sensitivity and level of protection required for the data being processed. 

In this regard, the amendment aligns Malaysia’s PDPA more closely with data protection laws in the Philippines and Singapore than with the GDPR. Specifically, the Philippines and Singapore both require organizations to appoint at least one DPO. Conversely, Indonesia and Thailand adopt the GDPR’s approach, requiring DPO appointments only for: (1) public authorities; (2) organizations conducting large-scale systematic monitoring; and (3) those processing sensitive data.

2.3. The amendments significantly increase penalties for PDPA breaches but do not introduce revenue-based fines

The Amendment Act allows the Personal Data Protection Commissioner (Commissioner) to impose:

Notably, the increase in the PDPA’s penalties was not one of the proposals raised in the PDPD’s initial consultation paper released in 2020. Nevertheless, these enhanced penalties are consistent with (albeit still lower than) those seen in other ASEAN data protection laws that have been enacted or amended since the GDPR came into effect. Those laws also follow the GDPR’s example in setting the maximum penalty at either a substantial fine (under the GDPR, 20,000,000 EUR) or a percentage of the organization’s revenue (under the GDPR, up to 4% of its total worldwide annual turnover of the preceding financial year). In ASEAN, data protection laws that have been similarly drafted include:

2.4. The amendments extend security obligations to data processors

Though the PDPA has always drawn a distinction between controllers (previously termed “data users”) and processors, prior to the 2024 amendments, it did not subject data processors to the PDPA’s Security Principle. This Principle requires organizations to take practical steps to protect personal data from any loss, misuse, modification, unauthorized or accidental access or disclosure, alteration or destruction.

As amended, the PDPA now requires data processors to comply with the Security Principle and provide sufficient guarantees to data controllers that data processors have implemented technical and organizational security measures to ensure compliance with the Principle. 

This amendment aligns the PDPA with the GDPR and the majority of other ASEAN data protection laws, which all impose security obligations on data processors.

Following the amendments, the PDPD began consulting on new guidelines outlining security controls to comply with the Security Principle. However, to date, these guidelines do not appear to have been finalized.

2.5. The amendments establish a significant new data portability right for data subjects in Malaysia

The Amendment Act introduces a new Section 43A into the PDPA, which provides data subjects with the right to request that a data controller transmit their personal data to another controller of their choice. The introduction of this data portability right makes Malaysia the fourth ASEAN jurisdiction to include such a right in its data protection law (after the Philippines, Singapore and Thailand).

However, this right is not absolute: it is “subject to technical feasibility and data format compatibility.” The PDPD has indicated that it regards this caveat as an exception that recognizes the practical challenges that controllers may face in transferring data between different systems.

This apparent exception risks undermining the right if interpreted too broadly. The flexibility in Malaysia’s data portability regime also stands in contrast with the regime under the GDPR, which requires controllers to provide the data in a “structured, commonly used, and machine-readable format.”

To implement this new right, the PDPD has initiated consultations on proposals for subordinate regulations and a new set of guidelines. Key proposals under consideration focus on establishing technical standards, defining the scope of applicable data through “whitelists,” setting timelines for compliance, and determining rules for allowable fees.

The introduction of a data portability right into Malaysia’s PDPA carries potentially significant implications for individuals and businesses in Malaysia. For data subjects, this right enhances control over personal data in an increasingly digital environment. From a market perspective, it has the potential to foster competition and innovation by making it easier for individuals to switch service providers. While there are “success stories” of implementation of data portability rights in select sectors in jurisdictions like the United Kingdom and Australia, challenges remain in rolling out these rights across various sectors of the economy. In the APAC region, both Australia and South Korea have faced significant hurdles in this regard. 

As Malaysia embarks on implementing data portability, it may encounter challenges due to the broad scope of its data portability rights (which are at present not limited to specific sectors). This means that businesses in all industries may need to develop effective processes and technologies to manage portability requests securely – a requirement that could lead to increased costs, especially for smaller enterprises.

2.6. The amendments introduce notifiable data breach requirements to the PDPA

Though the PDPA has imposed positive security obligations on controllers since its enactment, it notably lacked requirements for controllers to notify authorities or affected individuals of data breaches. This legislative void has been addressed through the 2024 amendments and the release of the guidelines on data breach notifications (DBN Guideline) in May 2025.

The new Section 12B in the PDPA requires controllers who have reason to believe that a data breach has occurred to notify the PDPD “as soon as practicable” and in any case, within 72 hours. Written reasons must be provided if the notification is not made within the prescribed timeframe.

Additionally, if the breach is likely to result in significant harm to data subjects, controllers must also notify affected data subjects “without unnecessary delay” and no later than 7 days after the initial notification to the PDPD. Failure to comply with the new notification requirements may result in penalties of up to RM 250,000 (approximately US$53,540) and/or up to two years’ imprisonment.
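For compliance teams that track these timelines in internal tooling, the two notification windows described above can be sketched in a few lines of Python. This is an illustrative sketch only, not an official calculation method: the function name, parameter names, and the assumption that the 72-hour clock starts when the controller first has reason to believe a breach occurred are ours, and the DBN Guideline should be consulted for the precise trigger.

```python
from datetime import datetime, timedelta

# Windows taken from the text above; the clock-start trigger is an assumption.
PDPD_WINDOW = timedelta(hours=72)
DATA_SUBJECT_WINDOW = timedelta(days=7)


def notification_deadlines(breach_identified_at, pdpd_notified_at=None,
                           significant_harm=False):
    """Return the latest permissible notification times (hypothetical helper).

    breach_identified_at: when the controller had reason to believe a
        breach occurred (assumed start of the 72-hour window).
    pdpd_notified_at: when the PDPD was actually notified, if known.
    significant_harm: whether the breach is likely to result in
        significant harm to data subjects.
    """
    pdpd_deadline = breach_identified_at + PDPD_WINDOW
    deadlines = {"pdpd": pdpd_deadline, "data_subjects": None}
    if significant_harm:
        # The 7-day window runs from the initial notification to the PDPD.
        # If that has not happened yet, fall back to the PDPD deadline as
        # the latest possible anchor.
        anchor = pdpd_notified_at if pdpd_notified_at is not None else pdpd_deadline
        deadlines["data_subjects"] = anchor + DATA_SUBJECT_WINDOW
    return deadlines


example = notification_deadlines(
    breach_identified_at=datetime(2025, 6, 2, 9, 0),
    pdpd_notified_at=datetime(2025, 6, 3, 15, 0),
    significant_harm=True,
)
print(example["pdpd"])           # 2025-06-05 09:00:00
print(example["data_subjects"])  # 2025-06-10 15:00:00
```

Both values are outer limits. The “as soon as practicable” and “without unnecessary delay” standards still apply within each window.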

The DBN Guideline clarifies that a breach is likely to result in “significant harm” when there is a risk that the compromised personal data:

The DBN Guideline also states that controllers should maintain records of data breaches in both physical and electronic formats for at least two years; implement adequate data breach management and response plans; and conduct regular training for employees.

Controllers must also contractually obligate processors to promptly notify them if a data breach occurs and to provide all reasonable assistance with data breach obligations.

These requirements, which are not subject to exceptions, will significantly affect organizations processing personal data in Malaysia. Controllers in particular will need to establish effective processes for detecting, investigating, and reporting data breaches. 

Such requirements are already established in most other major ASEAN jurisdictions, including Indonesia, the Philippines, Singapore, Thailand, and Vietnam. While details vary, most jurisdictions require notifications within 72 hours of discovering a breach, with some mandating public disclosure for large-scale incidents.

The PDPA’s provisions on data breach requirements are largely similar to those in the GDPR. In fact, the PDPA’s breach notification provisions are arguably more expansive, as they do not provide an exception (as does the GDPR) for breaches unlikely to result in a risk to the rights and freedoms of natural persons.

2.7. The amendments replace the PDPA’s former restrictive whitelisting approach to data transfers with a more flexible cross-border data transfer regime

Prior to the amendments, the PDPA contained a transfer mechanism permitting transfers of personal data to destinations that had been officially whitelisted by a Minister. However, this provision was never implemented, and no jurisdictions were ever whitelisted.

The amendments replaced this with a new provision allowing controllers to transfer personal data to jurisdictions with laws that: (1) are substantially similar to the PDPA; or (2) ensure an equivalent level of protection to the PDPA. This provision shifts responsibility to controllers to evaluate whether the destination jurisdiction meets the above requirements. 

In May 2025, the PDPD issued a guideline clarifying the requirements under this provision. Specifically, the controller must conduct a Transfer Impact Assessment (TIA), evaluating the destination jurisdiction’s personal data protection law against a series of prescribed factors. The TIA is valid for three years but must be reviewed if there are amendments to the destination’s personal data protection laws.

Notably, in adopting this new mechanism, Malaysia appears to have moved away from the GDPR’s centralized adequacy model, while maintaining other transfer mechanisms interoperable with the GDPR. The former “whitelist” mechanism more closely resembled the “adequacy” mechanism in Article 45 of the GDPR, which makes the European Commission responsible for determining whether a jurisdiction or international organization provides an adequate level of protection and issuing a so-called “adequacy decision.” Malaysia’s new cross-border data transfer provision is more adaptable. However, in the absence of strong enforcement by the PDPD, it may be open to abuse, as the proposed criteria for the TIA are high-level and could easily be satisfied by any jurisdiction that has a data protection law “on the books.”

The Guideline also introduces new guidance on other existing transfer mechanisms under the PDPA, such as the conditions for valid consent and determining when transfers are “necessary.” Additionally, the Guideline allows the use of binding corporate rules (BCRs) for intra-group transfers, standard contractual clauses (SCCs) for transfers between unrelated parties, and certifications from recognized bodies as evidence of adequate safeguards on the part of the receiving data controller or processor.

3. Ongoing consultations show Malaysia is preparing for future technological challenges

In March 2025, the PDPD concluded consultations on its DPIA, DPbD, and ADM guidelines. The adoption of these guidelines, though requiring organizations to take on additional responsibilities, reflects Malaysia’s interest in embracing new standards and addressing emerging technological challenges. 

3.1. Malaysia is aligning with regional peers by proposing detailed DPIA requirements

While the amended PDPA does not explicitly mandate DPIAs, the responsibility to conduct them has been introduced through the new DPO Guidelines. To clarify this obligation, the PDPD has also started consultations on a detailed DPIA framework. This move brings Malaysia closer to APAC jurisdictions like the Philippines, Singapore, and South Korea, which already provide detailed guidance on conducting DPIAs.

Under the proposals, a DPIA would be required whenever data processing is likely to result in a “high risk” to data subjects. The draft guidelines propose a two-tier approach to assess this risk, considering both quantitative factors (like the number of data subjects) and qualitative ones (such as data sensitivity). Notably, if a DPIA reveals a high overall risk, organizations may be required to notify the Commissioner of the risk(s) identified and provide other information as required. If passed in their current form, these rules would give Malaysia some of the most stringent DPIA requirements in the APAC region, as no other major APAC jurisdiction imposes such a proactive notification requirement on all types of controllers.

3.2. Malaysia’s proposed DPbD requirement aligns its laws closer to international standards

To further align with international standards like the GDPR, the PDPD is consulting on draft guidelines on implementing a “Data Protection by Design” (DPbD) approach. While the amended PDPA does not explicitly mandate DPbD, this proposed guideline aims to clarify how organizations can proactively embed the PDPA’s existing Personal Data Protection Principles into their operations.

The proposed approach would require integrating data protection measures throughout the entire lifecycle of a processing activity, from initial design to final decommissioning. Adopting such a guideline would mark a significant shift of Malaysia’s data protection regime from reactive to proactive data protection, helping organizations ensure more effective compliance and better protect the rights of data subjects. However, implementing and encouraging a DPbD approach goes beyond providing guidelines on DPbD. Such guidelines should be complemented by training and educational workshops for DPOs and organizations, as well as incentive schemes such as domestic trust-mark certification, to better familiarize organizations with the notion and benefits of DPbD.

3.3. Proposed guidelines anticipate the impacts of AI and machine learning

Looking ahead to the challenges posed by AI, the PDPD recently concluded a consultation on regulating ADM and profiling. Although the PDPA does not specifically touch on ADM and profiling, the PDPD’s consultation demonstrates an intent to follow in the footsteps of several other major jurisdictions, including the EU, UK, South Korea, and China, that have already implemented requirements in this area.

The Public Consultation Paper highlighted (see, for instance, para 1.2) the growing risk of AI and machine learning being used to infer sensitive information from non-sensitive data for high-impact automated decisions, such as credit scoring. To address this, the PDPD is considering issuing a dedicated ADM and Profiling (ADMP) Guideline. The ADMP Guideline would regulate ADMP if “its use results in legal effects concerning the data subject or significantly affects the data subject,” and would provide a data subject with (subject to exceptions): (a) the right to refuse to be subject to a decision based solely on ADMP which produces legal effects concerning the data subject or significantly affects the data subject; (b) a right to information on the ADMP being undertaken; and (c) a right to request a human review of the ADMP.

As consultation on the ADMP Guideline concluded on 19 May 2025, the ADMP Guideline is not expected to be finalized for several more months. Nonetheless, this presents another instance of an APAC data protection regulator acting as a de facto (albeit partial) regulator of AI-augmented decision-making.

B. National Guidelines on AI Governance & Ethics

1. Background

In parallel with the updates to its data protection law, Malaysia has taken strides in AI governance. On 20 September 2024, the Ministry of Science, Technology, and Innovation (MOSTI) released its “National Guidelines on AI Governance & Ethics” (AI Ethics Guidelines, or Guidelines) – a comprehensive voluntary framework for the responsible development and use of AI technologies in Malaysia.

2. At their core, the Guidelines establish seven fundamental principles of AI

The Guidelines were designed for international alignment, explicitly benchmarking their seven core AI principles against a wide range of global standards. Section 4 details this comparison, referencing frameworks from the OECD, UNESCO, the EU, the US, the World Economic Forum, and Japan.

2.1. The Guidelines establish specific roles, responsibilities, and recommended actions for three key stakeholder groups in the AI ecosystem

The Guidelines assign responsibilities across the AI ecosystem. 

2.2. The Guidelines introduce consumer protection principles for AI that could be a precursor to regulatory requirements

While the AI Ethics Guidelines are voluntary and primarily aimed at encouraging stakeholders to reflect on key AI governance issues, certain provisions in the Guidelines may offer insight into how the Malaysian Government is considering potential future regulation of AI.

The Guidelines encourage businesses in Malaysia to prioritize transparency by clearly informing consumers about how AI uses their data and makes decisions. The Guidelines also encourage such businesses to provide consumers with rights concerning automated decisions, which are comparable to those in data protection laws such as the GDPR. These include the rights to information and explanation about such decisions, to object and request human intervention, and to have one’s data deleted (i.e., a “right to be forgotten”).

Part A.2.3 outlines tentative suggestions for the development of future regulations of AI (whether through existing laws or new regulations), while acknowledging that regulation of AI is at an early stage of development. The suggestions include:

Notably, several of these suggestions (such as enhancing user consent and introducing disclosure and accuracy requirements) align with similar proposals in Singapore’s Model AI Governance Framework for Generative AI, released in 2024, and ASEAN’s generative AI guidelines, released in January 2025.

3. Malaysia is the latest in a series of APAC jurisdictions that have released voluntary AI ethics and governance frameworks

Other APAC jurisdictions that have released voluntary AI governance guidelines in recent years include Indonesia (December 2023), Singapore (in 2019, 2020, and 2024), Hong Kong (June 2024), and Australia (October 2024).

Regionally, ASEAN has also issued regional-level guidance for organizations and national governments. These are, specifically, a “Guide on AI Ethics and Governance” (ASEAN AI Guide) in February 2024, and an expanded Guide focusing on generative AI in January 2025.

Malaysia’s AI Ethics Guidelines align with regional trends toward voluntary, principle-based AI governance, yet differ in focus and approach when compared to those of its neighbors and the broader ASEAN framework. To understand Malaysia’s position within ASEAN, a brief comparison is provided between Malaysia’s Guidelines and: (1) Singapore’s Model AI Governance Framework (Second Edition); (2) Indonesia’s Circular on AI Ethics (Circular); and (3) ASEAN’s AI Guide.

Table 1. Comparison of voluntary AI ethics/governance frameworks in Southeast Asia

C. Looking ahead

Malaysia’s recent developments in data protection and AI governance represent a concerted effort to build a modern and trusted digital regulatory framework. The comprehensive amendments to the PDPA bring the nation’s data protection standards into closer alignment with global benchmarks like the GDPR, while the AI Ethics Guidelines establish a foundation for responsible AI innovation nationally. Viewed together, these are not separate initiatives but two pillars of a cohesive national strategy designed to foster a trusted digital ecosystem and position Malaysia as a competitive player in the region.

For businesses operating in Malaysia, these developments have significant and immediate implications. Organizations should aim to move beyond basic compliance and adopt a strategic approach to data governance. Key actions include:

In closing, two observations may be made. First, these developments – especially the amendments to Malaysia’s PDPA – come as Malaysia sits as ASEAN’s Chair in 2025. They come as the country hopes to position itself as a mature leader in digital innovation and governance in the region, and potentially, to provide a boost just as Malaysia is hoping to conclude negotiations on the ASEAN Digital Economy Framework Agreement under its watch this year. 

Second, it should be recalled that prior to the Amendment Act, regulatory activity on data protection in Malaysia was at a low ebb. Additionally, the PDPD has thus far not been highly active in regional and international data protection and digital regulation fora. Nevertheless, with the reconstitution of the Ministry of Communications and Multimedia into the Digital Ministry, and the re-formulation of the PDPD into an independent Commissioner’s Office (as shared by Commissioner Nazri at FPF’s Second Japan Privacy Symposium in Tokyo last year), more engagement can be expected from Malaysia on data protection and AI regulation in the years to come.

Note: The information provided above should not be considered legal advice. For specific legal guidance, kindly consult a qualified lawyer practicing in Malaysia.

Understanding Japan’s AI Promotion Act: An “Innovation-First” Blueprint for AI Regulation

The global landscape of artificial intelligence (AI) is being reshaped not only by rapid technological advancement but also by a worldwide push to establish new regulatory regimes. In a landmark move, on May 28, 2025, Japan’s Parliament approved the “Act on the Promotion of Research and Development and the Utilization of AI-Related Technologies” (人工知能関連技術の研究開発及び活用の推進に関する法律案要綱) (AI Promotion Act, or Act), making Japan the second major economy in the Asia-Pacific (APAC) region to enact comprehensive AI legislation. Most provisions of the Act (except Chapters 3 and 4, and Articles 3 and 4 of its Supplementary Provisions) took effect on June 4, 2025, marking a significant transition from Japan’s soft-law, guideline-based approach to AI governance to a formal legislative framework.

This blog post provides an in-depth analysis of Japan’s AI Promotion Act, its strategic objectives, and unique regulatory philosophy. It builds on our earlier analysis of the Act (during its draft stage), available exclusively for FPF Members in our FPF Members Portal. The post begins by exploring the Act’s core provisions in detail, before placing the Act in a global context by drawing detailed comparisons between the Act and two other pioneering omnibus AI regulations: (1) the European Union (EU)’s AI Act, and (2) South Korea’s Framework Act on AI Development and Establishment of a Foundation for Trustworthiness (AI Framework Act). This comparative analysis of these three regulations reveals three distinct models for AI governance, creating a complex compliance matrix that companies operating in the APAC region will need to navigate going forward.

Part 1: Key Provisions and Structure of the AI Promotion Act

The AI Promotion Act establishes policy drivers to make Japan the world’s “most AI-friendly country” 

The Act’s primary purpose is to establish foundational principles for policies that promote the research, development, and utilization of AI in Japan to foster socio-economic growth.

The Act implements the Japanese government’s ambition, outlined in a 2024 whitepaper, to make Japan the world’s “most AI-friendly country.” The Act is specifically designed to create an environment that encourages investment and experimentation by deliberately avoiding the imposition of stringent rules or penalties that could stifle development.

This initiative is a direct response to low rates of AI adoption and investment in Japan. A summary of the AI Promotion Act from Japan’s Cabinet Office highlights that from 2023 to 2024, private AI investment in Japan was a fraction of that seen in other major markets globally (such as the United States, China, and the United Kingdom), with Stanford University’s AI Index Report 2024 putting Japan in 12th place globally for this metric. The Act is, therefore, a strategic intervention intended to reverse these trends by signaling strong government support and creating a predictable, pro-innovation legal environment.

The AI Promotion Act is structured as a “fundamental law” (基本法), establishing high-level principles and national policy direction rather than detailed, prescriptive rules for private actors.

While introducing a basis for binding AI regulation, the Act also builds on Japan’s longstanding “soft law” approach to AI governance, relying on non-binding government guidelines (such as the 2022 Governance Guidelines for the Implementation of AI Principles and 2024 AI Business Operator Guidelines), multi-stakeholder cooperation, and the promotion of voluntary business initiatives over “hard law” regulation. The Act’s architecture therefore embodies the Japanese Government’s broader philosophy of “agile governance” in digital regulation, which posits that in rapidly evolving fields like AI, rigid, ex-ante regulations are likely to quickly become obsolete and may hinder innovation. 

The AI Promotion Act adopts a broad, functional definition of “AI-related technologies.”

The primary goal of the AI Promotion Act (Article 1) is to establish the foundational principles for policies that promote the research, development, and utilization of “AI-related technologies” in Japan. This term refers to technologies that replicate human intellectual capabilities like cognition, inference, and judgment through artificial means, as well as the systems that use them. This non-technical definition appears to be designed for flexibility and longevity. Notably, the law takes its own approach to defining the scope of covered AI technologies and does not adopt the OECD definition of an AI system, which served as inspiration for the definition in the EU AI Act.

The Act provides a legal basis for five fundamental principles to guide AI governance in Japan

Under Article 3 of the Act, these principles include:

  1. Alignment: AI development and use should align with existing national frameworks, including the Basic Act on Science, Technology and Innovation (科学技術・イノベーション基本法), and the Basic Act on Forming a Digital Society (デジタル社会形成基本法).
  2. Promotion: AI should be promoted as a foundational technology for Japan’s economic and social development, with consideration for national security.
  3. Comprehensive advancement: AI promotion should be systematic and interconnected across all stages, from basic research to practical application.
  4. Transparency: Transparency in AI development and use is necessary to prevent misuse and the infringement of citizens’ rights and interests.
  5. International leadership: Japan should actively participate in and lead the formulation of international AI norms and promote international cooperation.

The AI Promotion Act adopts a whole-of-society approach to promoting AI-related technologies

Broadly, the Act assigns high-level responsibilities to five groups of stakeholders:

To fulfill its responsibilities, the National Government is mandated to take several Basic Measures, including:

The Act adopts a cooperative approach to governance and enforcement

The Act’s approach to governance and enforcement diverges significantly from overseas legislative frameworks.

The centerpiece of the new governance structure under the Act is a centralized AI Strategy Headquarters within Japan’s Cabinet. Chaired by the Prime Minister and including all other Cabinet ministers as members, this body ensures a whole-of-government, coordinated approach to AI policy.

The AI Strategy Headquarters’ primary mandate is to formulate and drive the implementation of a comprehensive national Basic AI Plan, which will provide more substantive details on the government’s AI strategy.

The AI Promotion Act contains no explicit penalties, financial or otherwise, for non-compliance with its requirements or, more broadly, for misusing AI. Instead, its enforcement power rests on a unique cooperative and reputational model.

Part 2: A Tale of Three AI Laws – Comparative Analysis of Japan’s AI Promotion Act, the EU’s AI Act, and South Korea’s AI Framework Act

To fully appreciate Japan’s approach, it is useful to compare it with the other two prominent global AI hard law frameworks, the EU AI Act and South Korea’s AI Framework Act.

The EU AI Act is a comprehensive legal framework for AI systems. Officially published on July 12, 2024, it entered into force on August 1, 2024, but it is becoming applicable in multiple stages, beginning in February 2025 and continuing until 2030. Its primary aim is to regulate AI systems placed on the EU market, balancing innovation with ethical considerations and safety. The Act adopts a risk-based approach whereby a few uses of AI systems are prohibited as they are considered to pose an unacceptable risk to health, safety and fundamental rights; some AI systems are considered “high-risk” and carry most of the compliance obligations for their deployers and providers; while others are either low risk, facing mainly transparency obligations, or simply outside the scope of the regulation. The AI Act also has a separate set of rules applying only to General Purpose AI models, with enhanced obligations linked to those that have “systemic risk.” See here for a Primer on the EU AI Act.

South Korea’s “Framework Act on Artificial Intelligence Development and Establishment of a Foundation for Trustworthiness” (인공지능 발전과 신뢰 기반 조성 등에 관한 기본법), also known as the “AI Framework Act,” was passed on December 26, 2024, and is currently scheduled to take effect on January 22, 2026.

The stated purpose of the AI Framework Act is to protect citizens’ rights and dignity, improve quality of life, and strengthen national competitiveness. The Act aims to promote the AI industry and technology while simultaneously preventing associated risks, reflecting a balancing act between innovation and regulation. For a more detailed analysis of South Korea’s AI Framework Act, you may read FPF’s earlier blog post here.

Like the EU’s AI Act, South Korea’s AI Framework Act adopts a risk-based approach, introducing specific obligations for “high-impact” AI systems utilized in critical sectors such as healthcare, energy, and public services. However, a key difference between the two laws is that South Korea does not include any prohibition of practices or AI systems. It also includes specific provisions for generative AI. Notably, AI systems used solely for national defense or security are expressly excluded from its scope, and most AI systems not classified as “high-impact” are not subject to regulation under the AI Framework Act.

AI Business Operators, encompassing both developers and deployers, are subject to several specific obligations. These include establishing and operating a risk management plan, providing explanations for AI-generated results (within technical limits), implementing user protection measures, and ensuring human oversight for high-impact AI systems. For generative AI, providers are specifically required to notify users that they are interacting with an AI system.

The AI Framework Act establishes a comprehensive governance framework, including a National AI Committee, chaired by the President of the country, which is tasked with deliberating on policy, investment, infrastructure, and regulations. The AI Framework Act also establishes other governance institutions, such as the AI Policy Center and AI Safety Research Institute. The Ministry of Science and ICT (MSIT) holds the responsibility for establishing and implementing a Basic AI Plan every three years. The MSIT is also granted significant investigative and enforcement powers, with enforcement measures including corrective orders and fines. The AI Framework Act also includes extraterritorial provisions, extending its reach beyond South Korea.

Commonalities and divergences across jurisdictions

The regulatory philosophies across Japan, South Korea, and the EU present a spectrum of approaches.

Differences are also evident in scope, risk classification, and enforcement severity. Japan’s AI Promotion Act and South Korea’s AI Framework Act are both foundational laws that allocate responsibilities for AI governance within the government and establish a legal basis for future regulation of AI. However, Japan’s AI Promotion Act does not impose any direct obligations on private actors and does not include a “risk” or “high-impact” classification of AI technologies. By contrast, South Korea’s AI Framework Act imposes a range of obligations on “high-impact” and generative AI, without going so far as to prohibit AI practices. The latter also has specific carve-outs for national defense, similar to how the EU AI Act excludes AI systems for military and national security purposes from its scope.

The EU AI Act has the broadest and most detailed scope, categorizing all AI systems into four risk levels, with strict requirements for high-risk and outright prohibitions for unacceptable risk systems, in addition to specific obligations for General Purpose AI (GPAI) models.

In terms of enforcement powers, Japan’s AI Promotion Act notably lacks any penalties for noncompliance or misuse of AI more broadly. South Korea’s AI Framework Act, by contrast, has enforcement powers, including fines and corrective orders, but its financial penalties are comparatively lower than those in the EU’s AI Act. For instance, the maximum fine under South Korea’s AI Framework Act is set at KRW 30 million (approximately USD 21,000), whereas, under the EU AI Act, fines can range from EUR 7.5 million to EUR 35 million (approximately USD 7.8 million to USD 36.5 million), or 1% to 7% of the company’s global turnover.

Despite these divergences, there are some commonalities. All three laws establish central governmental bodies (Japan’s AI Strategy Headquarters, South Korea’s National AI Committee, and the EU’s AI Office/NCAs) to coordinate AI policy and strategy. All three also emphasize international cooperation and participation in norm-setting. Notably, all three frameworks explicitly or implicitly reference the core tenets of transparency, fairness, accountability, safety, and human-centricity, which have been developed in international forums like the OECD and the G7 Hiroshima AI Process.

The divergence is not in the “what” – ensuring the responsible development and deployment of AI – but in the “how.” The EU chooses comprehensive, prescriptive regulation; Japan opts for softer regulation building on existing voluntary guidelines; and South Korea applies targeted regulation to specific high-risk areas. This indicates a global consensus on the desired ethical outcomes for AI, but a deep and consequential divergence on the most effective legal and administrative tools to achieve them.

Access here a detailed Comparative Table of the three AI laws in the EU, South Korea and Japan, comparing them on 11 criteria, from definitions and scope, to risk categorization, enforcement model and support for innovation.

The future of AI regulation: A new regional and global landscape

The distinctly “light-touch” approach to AI regulation in Japan suggests a minimal compliance burden for organizations in the immediate term. However, the AI Promotion Act is arguably the beginning, not the end, of the conversation, as the forthcoming Basic AI Plan has the potential to introduce a wide range of possible initiatives.

Regionally, Japan’s “innovation-first” strategy likely aims to draw investment by offering a less burdensome regulatory environment. The EU, conversely, is attempting to set a high standard for ethical and safe AI, aiming to foster sustainable and trustworthy innovation. South Korea’s middle-ground approach attempts to capture benefits from both strategies.

The availability of a full spectrum of regulatory models on a global scale aimed at the same technology could lead to regulatory arbitrage. It remains to be seen whether companies prioritize development in less regulated jurisdictions to minimize compliance costs, or, conversely, whether there will be a global demand for “EU-compliant” AI as a mark of trustworthiness. This dynamic implies that the future of AI development might be shaped not just by technological breakthroughs but by the attractiveness of regulatory environments as well.

Nevertheless, it is also worth noting that a jurisdiction’s regulatory model alone does not determine its ultimate success in attracting investments or deploying AI effectively. Many other factors, such as the availability of data, compute and talent, as well as the ease of doing business generally, will also be critical.

With two significant jurisdictions in the APAC region having now adopted innovation-oriented AI laws, it appears that the region is starting a trend in innovation-first AI regulation and offering a contrasting model to the EU AI Act. At the same time, it is notable that both Japan and South Korea have comprehensive national data protection laws, which offer safeguards to people’s rights in all contexts where personal data is being processed, including through AI systems.

Note: The summary of the AI Promotion Act above is based on an English machine translation, which may contain inaccuracies. Additionally, the information should not be considered legal advice. For specific legal guidance, kindly consult a qualified lawyer practicing in Japan.

The author acknowledges the valuable contributions of the APAC team’s interns, Darren Ang and James Jerin Akash, in assisting with the initial draft of this blog post.

Meet Bianca-Ioana Marcu, FPF Europe Managing Director

FPF is pleased to welcome our colleague Bianca-Ioana Marcu to her new role as Managing Director of FPF Europe. With extensive experience in privacy and data protection, she takes on this responsibility at a pivotal moment for digital regulation in Europe. In this blog, we will explore her perspectives on the evolving privacy landscape, her approach to advancing discussions on data protection in Europe and Africa, and her vision for strengthening FPF’s leadership in addressing emerging challenges. Her insights will be key in navigating the complex intersection of privacy, innovation, and regulatory development in the years ahead.

You’ve been part of FPF for some time now, but this new role brings fresh responsibilities. What are you most excited to lead as Managing Director of the European office, and how do you see your work promoting the privacy dialogue in the region?

Stepping into this new role at FPF has given me a renewed sense of energy and opportunity that I hope to bring to the brilliant team on the ground. We are at a crossroads in Europe where existential questions are being asked with regard to the effectiveness and malleability of the existing digital regulatory framework. The privacy question is and will remain essential in this ongoing dialogue, as the GDPR is recognized as both the foundation and the cornerstone of the broader EU digital rulebook.

Within the FPF Europe office we will continue to contribute actively to this dialogue, acting as a source of expert, practical, and measured analysis and ideas for identifying ways in which respect for fundamental rights can coexist alongside technological development.

As you step into the role of Managing Director, you will also continue coordinating FPF’s growing presence in Africa. What are your top three priorities for the coming year?  

With the expert knowledge and support of our Policy Manager for Africa, Mercy King’ori, this year we successfully launched FPF’s Africa Council. The basis for our work in the region is to advance data protection through collaboration, innovation, and regional expertise, focusing on thought leadership and regionally grounded research. We were delighted to be an official partner of the Network of African Data Protection Authorities (NADPA) Conference hosted in Abuja, Nigeria, with an event on securing safe and trustworthy cross-border data flows.

Over the coming years, FPF Africa will sustain its support for data practices that drive innovation, protect privacy, and uphold fundamental rights while being rooted in the diverse legal, social, and economic contexts of the continent.

FPF is known as a trusted platform where senior leaders come to test ideas, share solutions, and learn from one another. As Managing Director, how do you plan to strengthen these connections further while supporting members navigating emerging challenges?

Now in my third year of bringing to life FPF’s flagship event in Europe – the Brussels Privacy Symposium – I am continually inspired by the openness and commitment of the senior leaders in our community in ensuring strong data protection practices globally.

Our dedication to delivering high-quality legal research and policy analysis to our members remains strong, as does our commitment to creating opportunities to come together with intellectual curiosity.

Innovation and data protection are often seen at odds. In your view, what are the most promising opportunities for advancing privacy and innovation in the EU?

As the regulatory dialogue in Europe evolves, there is certainly an opportunity for advancing privacy protection as well as for supporting the region’s ambitions for economic growth. The current momentum for European legislators to streamline the EU’s digital rulebook brings promising opportunities for gathering all stakeholders around the same table, with a focus on clarifying legal uncertainties or points of tension between the rulebook’s different elements, and with an eye on the type of future we want to co-design. 

On a more personal note, what inspires your commitment to privacy, and how has your perspective evolved through your work at FPF and beyond?

My commitment to privacy is fueled not only by the belief that the fulfillment of this right is conducive to the enjoyment of other fundamental rights, including non-discrimination, but also by the support and dedication I have found within a privacy community that extends far beyond Brussels. My work at FPF, particularly on Gabriela Zanfir-Fortuna’s brilliant Global Privacy team, has exposed me to the rich and diverse practices and understandings of privacy and data protection around the world. My ambition is to bring this valuable global perspective to FPF Europe’s work, finding ways for continued cooperation and alignment rather than distance and isolationism. 

Brazil’s ANPD Preliminary Study on Generative AI highlights the dual nature of data protection law: balancing rights with technological innovation

Brazil’s Autoridade Nacional de Proteção de Dados (“ANPD”) Technology and Research Unit (“CGTP”) released the preliminary study Inteligência Artificial Generativa (“Preliminary Study on GenAI”, in Portuguese) as part of its Technological Radar series, on November 29, 2024.1 A short English version of the study was also released by the agency in December 2024. This analysis provides information for developers, processing agents, and data subjects on the potential benefits and challenges of generative AI in relation to the processing of personal information under existing data protection rules. 

Although this study does not offer formal legal guidance, it provides important insight into how the ANPD may approach future interpretation of the Lei Geral de Proteção de Dados (“LGPD”), Brazil’s national data protection law. As such, it aligns with a global trend of data protection regulators examining the impact of generative AI on privacy and data protection.2 The study sets up the framework for analyzing data protection legal requirements for Generative AI in the Brazilian context by acknowledging that balancing rights with technological innovation is a foundational principle of the LGPD. 

The analysis further takes into account that processing of personal data occurs during multiple stages in the life cycle of generative AI systems, from development to refinement of models. It addresses the legality of web scraping under the LGPD at the training stage, specifically considering that publicly available personal data falls under the scope of the law. The study proposes “thoughtful pre-processing practices”, such as anonymisation or collecting only necessary data for training. It then emphasizes “transparency” and “necessity” as two core principles of the LGPD that need enhanced attention and tailoring to the unique nature of Generative AI systems, before concluding that this technology should be developed from an “ethical, legal, and socio-technical” perspective if society is going to effectively harness its benefits.   

Balancing Rights with Technological Innovation: An LGPD Commitment 

The study acknowledges the relevance of balancing rights with technological innovation under the Brazilian framework. Article 1 of the LGPD identifies the objective of the law as ensuring the processing of personal data protects the fundamental rights of freedom, privacy, and the free development of personality.3 At the same time, Article 2 of the LGPD recognizes data protection is “grounded” on economic and technological development and innovation. 

The study recognizes that advances in machine learning enable generative AI systems beneficial to key fields, including healthcare, banking, and commerce, and highlights three use cases likely to produce valuable benefits for Brazilian society. For instance, the Federal Court of Accounts is implementing “ChatTCU”, a generative model to assist the Court’s legal team in producing, translating, and examining legal texts more efficiently. Munai, a local health tech enterprise, is also developing a virtual assistant that will automate the evaluation, interpretation, and application of hospital protocols and support decision-making in the healthcare sector. Finally, Banco do Brasil is developing a Large Language Model (LLM) to assist employees in providing better customer service experiences. The study also highlights the increasing popularity of commercially available generative AI systems such as OpenAI’s ChatGPT and Google’s Gemini among Brazilian users. 

In this context, the study emphasizes that while generative AI systems can produce multiple benefits, it is necessary to assess their potential for creating new privacy risks and exacerbating existing ones. For the ANPD, “the generative approach is distinct from other artificial intelligence as it possesses the ability to generate content (data) […] which allows the system to learn how to make decisions according to the data uses.”4 The CGTP identifies three fundamental characteristics of generative AI systems that are relevant to personal data processing:

  1. The need for large volumes of personal and non-personal data for system training purposes;
  2. The capability of inference that allows the generation of new data similar to the training data; and
  3. The adoption of a diverse set of computational techniques, such as the architecture of transformers for natural language processing systems.5 

For instance, the study mentions LLMs as examples of models trained on large volumes of data. LLMs capture semantic and syntactic relationships and are effective at understanding and generating text across different domains. However, they can also generate misleading answers and invent inaccurate “hallucinations.” Another example is foundational models, which are trained on diverse datasets and can perform tasks in multiple domains, often including some for which the model was not explicitly trained. 

The document underscores that the technical characteristics and possibilities of generative AI significantly impact the collection, storage, processing, sharing, and deletion of personal data. Therefore, the study holds, LGPD principles and obligations are relevant for data subjects and processing agents using generative AI systems.

Legality of web scraping, impacted by the fact the LGPD covers publicly accessible personal data 

The study notes that generative AI systems are typically trained with data collected through web scraping. Data scraped from publicly available sources may include identifiable information such as names, addresses, videos, opinions, user preferences, images, or other personal identifiers. Additionally, if there is an absence of thoughtful pre-processing practices in the collection phase (i.e. anonymizing or collecting only necessary data), it can increase the likelihood of including more personal data for training purposes, including sensitive and children’s data.
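To make the idea of pre-processing at the collection phase more concrete, the following is a minimal, illustrative sketch of data minimisation applied to scraped records. It is not drawn from the ANPD study: the field names, the `NEEDED_FIELDS` allow-list, and the `minimise` function are all hypothetical, and the two regular expressions are deliberately simple assumptions. Pattern-based masking of this kind is a much weaker measure than anonymisation and would miss many identifiers, so it should be read only as an example of "collecting only necessary data" in practice.

```python
import re

# Hypothetical allow-list: the only fields assumed necessary for training.
NEEDED_FIELDS = {"title", "body"}

# Deliberately simple, illustrative patterns; real identifiers vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def minimise(record: dict) -> dict:
    """Keep only allow-listed fields and mask obvious identifiers in text."""
    cleaned = {}
    for key, value in record.items():
        if key not in NEEDED_FIELDS:
            continue  # drop fields that are not needed for the stated purpose
        if isinstance(value, str):
            value = EMAIL_RE.sub("[EMAIL]", value)
            value = PHONE_RE.sub("[PHONE]", value)
        cleaned[key] = value
    return cleaned


scraped = {
    "title": "Hello",
    "body": "Contact ana@example.com or +55 11 91234-5678.",
    "author_name": "Ana",
    "home_address": "Rua Exemplo, 100",
}
print(minimise(scraped))
```

In this sketch, the author name and address are discarded before storage, and the e-mail address and phone number in the free text are replaced with placeholders.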

The document emphasizes that the LGPD covers publicly accessible personal data, and consequently, processors and AI developers must ensure compliance with personal data principles and obligations. Scraping operations that capture personal data must be based on one of the LGPD’s lawful bases for processing (Articles 7 and 11) and comply with data protection principles of good faith, purpose limitation, adequacy, and necessity (Article 7, par. 3). 

Moreover, the study warns that web scraping reduces data subjects’ control over their personal information. According to the CGTP, users generally remain unaware of web scraping involving their information and how developers may use their data to train generative AI systems. In some cases, scraping can result in a data subject’s loss of control over personal information after the user deletes or requests deletion of their data from a website, as prior scraping and data aggregation may have captured the data and made it available in open repositories. 

Allocation of responsibility depends on patterns of data sharing and hallucinations 

The ANPD also takes note of the processing of personal data during several stages in the life cycle of generative AI systems, from development to refinement of models. The study explains that generative AI’s ability to generate synthetic content extends beyond basic processing and encompasses continuous learning and modeling based on the ingested training data. Although the training data may be hidden through mathematical processes during training, the CGTP warns that vulnerabilities to the system, such as model inversion or membership inference attacks, could expose individuals included in training datasets. 

Furthermore, generative AI systems allow users to interact with models using natural language. Depending on the prompt, context, and information provided by the user, these interactions may generate outputs containing personal data about the user or other individuals. A notable challenge, according to the study, is to allocate responsibility in scenarios where i) personal data is generated and shared with third parties, even if a model was not specifically trained for that purpose; and ii) where a model creates a hallucination – false, harmful, or erroneous assumptions about a person’s life, dignity, or reputation, harming the subject’s right to free development of personality.

The study identifies three example scenarios in which personal data sharing can occur in the context of generative AI systems:

  1. Users sharing personal data through prompts 

This type of sharing occurs through the input of prompts by users, which can allow users to share information in diverse formats such as text, audio, and images, all of which may contain personal, confidential, and sensitive data. In some instances, users may not be aware of the risks involved in sharing personal information or, if aware, they might choose to “trust the system” to get the answers and assistance they need. In this scenario, the CGTP points out that safeguards should be developed to create privacy-friendly systems. One way to achieve this is to provide users with clear and easily accessible information about the use of prompts and the processing of personal data by generative AI tools. 

The study highlights that users sharing the personal data of other individuals through prompts may be considered processing agents under the LGPD and consequently be subject to its obligations and sanctioning regime. Nonetheless, the CGTP cautions that transferring responsibility exclusively to users is not enough to safeguard personal data protection or privacy in the context of generative AI.

  2. Sharing AI-generated outputs containing personal data with third parties 

Under this scenario, output or AI-generated content can contain personal data, which could be shared with third parties. The CGTP notes this presents the risk of the personal data being used for secondary purposes unknown to the initial user that the AI developer is unlikely to control. Similar to the previous scenario and data processing activities in general, the study notes the relevance of establishing a “chain of responsibility” among the different agents involved to ensure compliance with the LGPD. 

  3. Sharing pre-trained models containing personal data 

A third scenario is sharing a pre-trained model itself, and consequently, any personal data present in the model. According to the CGTP, “since pre-trained models can be considered a reflection of the database used for training, the popularization of the creation of APIs (Application Programming Interfaces) that adopt foundational models such as pre-trained LLMs, brings a new challenge. Sharing models tends to involve the data that is mathematically present in them”6 (translated from the Portuguese study). Pre-trained models, which contain a reflection of the training data, make it possible to adjust the foundational model for a specific use or domain. 

The CGTP cautions that the possibility of refining a model via the results obtained through prompt interaction may allow for a “continuous cycle of processing” of personal data.7 According to the technical Unit, “the sharing of foundational models that have been trained with personal data, as well as the use of this data for refinement, may involve risks related to data protection depending on the purpose.”8

Relatedly, the document highlights the relevance of the right to delete personal data in the context of generative AI systems. The study emphasizes that the processing of personal data can be present through diverse stages of the AI’s lifecycle, including the generation of synthetic content, through prompt interaction – which allows new data to be shared – and the continuous refinement of the model. In this context, the study points out that this continuous processing of personal data presents significant challenges in (i) delimiting the end of the processing period; (ii) determining whether the purpose of the intended processing was achieved, and (iii) the implications of revoking consent, if the processing relied on this basis. 

Transparency and Necessity Principles: Essential for Responsible Gen-AI under the LGPD

Some LGPD principles have special relevance for the development and use of generative AI systems. The report takes the view that these systems typically lack detailed technical and non-technical information about the processing of personal data. The CGTP warns that this absence of transparency begins in the pre-training phase and extends to the training and refinement of models. The study suggests developers may fail to inform users about how their personal information could be shared under the three scenarios identified above (prompt use, outputs, or foundational models). As a result, individuals are usually unaware their information is used for generative AI training purposes and are not provided with adequate, clear, and accessible information about other processing operations such as sharing their personal information with third parties. 

In this context, the ANPD emphasizes that the transparency principle is especially relevant in the context of the responsible use and development of AI systems. Under the LGPD, this principle requires clear, precise, and easily accessible information about the data processing. The CGTP proposes that the existence and availability of detailed documentation can be a starting point for compliance and can help monitor the development and improvement of generative AI systems. 

Similarly, the necessity principle limits data processing to what is strictly required for developing generative AI systems. Under the LGPD, this principle requires the processing to be the minimum required for the accomplishment of its purposes, encompassing relevant, proportional, and non-excessive data. According to the ANPD, AI developers should be thoughtful about the data to be included in their training datasets and make reasonable efforts to limit the amount and type of information necessary for the purposes to be achieved by the system. Determining how to apply this principle to the creation of multipurpose or general-purpose “foundation models” is an ongoing challenge in the broader data protection space.

Looking Into the Future 

The study concludes that generative AI must be developed from an “ethical, legal, and socio-technical” perspective if society is going to effectively harness its benefits while limiting the risks it poses. The CGTP acknowledges that generative AI may offer solutions in multiple fields and applications; however, society and regulators must be aware that generative AI may also entail new risks or exacerbate existing ones concerning privacy, data protection, and other freedoms. The CGTP highlights that this first report includes preliminary analysis and that further studies in the field are necessary to guarantee adequate protection of personal data, as well as the trustworthiness of the outputs generated by this technology. 


  1. The ANPD’s “Technological Radar” series addresses “emerging technologies that will impact or are already impacting the national and international scenario of personal data protection” with an emphasis on the Brazilian context. “The purpose of the series is to aggregate relevant information to the debate on data protection in the country, with educational texts accessible to the general public”.  ↩︎
  2. See, for example, Infocomm Media Development Authority, “Model AI Governance Framework for Generative AI” (May 2024); European Data Protection Supervisor, “First EDPS Orientations for ensuring data protection compliance when using Generative AI systems” (June 2024); Commission nationale de l’informatique et des libertés (CNIL), “AI how-to sheets” (June 2024); UK’s Information Commissioner’s Office, “Information Commissioner’s Office response to the consultation series on generative AI” (December 2024); European Data Protection Board, “Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models” (December 2024). ↩︎
  3. LGPD Article 1, available at http://www.planalto.gov.br/ccivil_03/_ato2015-2018/2018/lei/L13709compilado.htm. ↩︎
  4. ANPD, Technology Radar, “Generative Artificial Intelligence”, 2024, p. 7. ↩︎
  5. ANPD, Radar Tecnologico, “Inteligência Artificial Generativa”, 2024, pp. 16-17. ↩︎
  6. ANPD, Radar Tecnologico, “Inteligência Artificial Generativa”, 2024, pp. 24-25. ↩︎
  7. Id. ↩︎
  8. Id. ↩︎

Cross-Border Data Flows in Africa: Examining Policy Approaches and Pathways to Regulatory Interoperability

Cross-border data flows are critical to Africa’s digital economy, enabling trade, innovation, and access to continental and global markets. As the drive towards data-driven technologies among businesses and governments grows, the ability to transfer personal data across borders efficiently and securely has become a key policy concern on the continent, a position echoed by the African Union (AU) and its Member States. This Issue Brief provides an overview of the current policy landscape for inter-African cross-border data flows, and proposes possible paths toward regulatory cooperation. 

The Issue Brief begins by highlighting ongoing regional and sub-regional efforts to shape frameworks for cross-border data flows, including the work of the African Union, the Economic Community of West African States (ECOWAS), the East African Community (EAC), and the Southern African Development Community (SADC). These efforts show early alignment toward shared standards, but also underline the diversity of legal frameworks and enforcement capacity across jurisdictions.

The Brief introduces a taxonomy of cross-border data regimes in Africa, identifying two common approaches: the first encompasses countries with no cross-border data flow provisions, either because such provisions are omitted from the law or because the countries lack comprehensive data protection laws entirely; the second includes countries with restrictions on transferring personal data to other African countries.

To operationalize inter-African cross-border data flows, legal frameworks on the continent increasingly reference data transfer tools. The Issue Brief explores the use and implementation of mechanisms such as adequacy decisions, certification mechanisms, standard contractual clauses (SCCs), binding corporate rules (BCRs), and derogations, currently in use across Kenya, Nigeria, South Africa, Rwanda, and Ivory Coast. This comparative analysis highlights that the practical implementation of transfer tools remains uneven across the continent, and many countries lack clear guidance or infrastructure to support their use. 

In the final section of the Issue Brief, we outline policy considerations and opportunities for convergence on cross-border data flows across the continent, encouraging African countries to work toward interoperable data transfer frameworks that reflect shared values.

Consent for Processing Personal Data in the Age of AI: Key Updates Across Asia-Pacific

This Issue Brief summarizes key developments in data protection laws across the Asia-Pacific region since 2022, when the Future of Privacy Forum (FPF) and the Asian Business Law Institute (ABLI) published a series of reports examining 14 jurisdictions in the region. We found that while many offer alternative legal bases for data processing, consent remains the most widely used, often due to its familiarity, despite known limitations.

This Issue Brief provides an updated view of evolving consent requirements and alternative legal bases for data processing across key APAC jurisdictions: India, Vietnam, Indonesia, the Philippines, South Korea, and Malaysia.

In August 2023, India passed the Digital Personal Data Protection Act (DPDPA). Once in force, the DPDPA will provide a comprehensive framework for processing personal data. It affirms consent as the primary basis for processing but introduces structured obligations around notice, purpose limitation, and consent withdrawal, while enabling future flexibility for alternative legal bases.

Vietnam’s Decree on Personal Data Protection took effect in July 2023. It sets clearer standards for consent while formally recognizing alternative legal bases, including for contractual necessity and legal obligations. This marks a key step in broadening lawful processing options for businesses.

Indonesia’s Personal Data Protection Law (PDPL), enacted in October 2022, introduces a unified national privacy law with an extended transition period. It affirms consent but also allows processing based on legitimate interest, public duties, and contract performance, bringing Indonesia closer to global privacy frameworks.

In November 2023, the Philippines’ National Privacy Commission issued a Circular on Consent, clarifying valid consent standards and promoting transparency. The guidance aims to reduce consent fatigue by encouraging layered, contextual consent interfaces and outlines when consent may not be strictly necessary.

South Korea’s amended PIPA (in force since September 2023) and related guidelines promote easy-to-understand consent practices and recognize additional legal grounds, especially in the context of AI. A 2025 bill is under consideration to expand the use of non-consent bases for AI-related processing.

The Personal Data Protection (Amendment) Act 2024, published in October 2024, introduces stronger enforcement tools and administrative penalties in Malaysia. While the amendments do not change the legal bases for processing, they enhance the compliance environment and signal stricter oversight.

The Issue Brief also explores how the rise of AI is driving shifts in lawmaking and policymaking across the region when it comes to lawful grounds for processing personal data. 

As the APAC region shifts from fragmented, sector-specific rules to unified legal frameworks, understanding the evolving role of consent and the growing adoption of alternative legal bases is essential. From improving user-friendly consent mechanisms to strengthening enforcement and expanding lawful processing grounds, these changes highlight a more flexible and accountable approach to data protection across the region.

FPF and OneTrust publish the Updated Guide on Conformity Assessments under the EU AI Act

The Future of Privacy Forum (FPF) and OneTrust have published an updated version of their Conformity Assessments under the EU AI Act: A Step-by-Step Guide, along with an accompanying Infographic. This updated Guide reflects the text of the EU Artificial Intelligence Act (EU AIA), adopted in 2024.  

Conformity Assessments (CAs) play a significant role in the EU AIA’s accountability and compliance framework for high-risk AI systems. The updated Guide and Infographic provide a step-by-step roadmap for organizations seeking to understand whether they must conduct a CA. Both resources are designed to support organizations as they navigate their obligations under the AIA and build internal processes that reflect the Act’s overarching accountability. However, they do not constitute legal advice for any specific compliance situation. 

Key highlights from the Updated Guide and Infographic:

You can also view the previous version of the Conformity Assessment Guide here.

South Korea’s New AI Framework Act: A Balancing Act Between Innovation and Regulation

On 21 January 2025, South Korea became the first jurisdiction in the Asia-Pacific (APAC) region to adopt comprehensive artificial intelligence (AI) legislation. Taking effect on 22 January 2026, the Framework Act on Artificial Intelligence Development and Establishment of a Foundation for Trustworthiness (AI Framework Act or simply, Act) introduces specific obligations for “high-impact” AI systems in critical sectors, including healthcare, energy, and public services, and mandatory labeling requirements for certain applications of generative AI. The Act also includes substantial public support for private sector AI development and innovation through its support for AI data centers, as well as projects that create and provide access to training data, and encouragement of technological standardization to support SMEs and start-ups in fostering AI innovation. 

In the broader context of public policies in South Korea that are designed to allow the advancement of AI, the Act is notable for its layered, transparency-focused approach to regulation, moderate enforcement approach compared to the EU AI Act, and significant public support intended to foster AI innovation and development. We cover these in Parts 2 to 4 below. 

Key features of the law include:

In Part 5, we compare the Act with the European Union (EU)’s AI Act (EU AI Act). We note that while the AI Framework Act shares some common elements with the EU AI Act, including tiered classification and transparency mandates, South Korea’s regulatory approach differs in its simplified risk categorization, including absence of prohibited AI practices, comparatively lower financial penalties, and the establishment of initiatives and government bodies aimed at promoting the development and use of AI technologies. The intent of this comparison is to assist practitioners in understanding and analyzing key commonalities and differences between both laws.

Finally, Part 6 of this article places the Act within South Korea’s broader AI innovation strategy and discusses the challenges of regulatory alignment between the Ministry of Science and ICT (MSIT) and South Korea’s data protection authority, the Personal Information Protection Commission (PIPC), in South Korea’s evolving AI governance landscape.

1. Background 

On 26 December 2024, South Korea’s National Assembly passed the Framework Act on Artificial Intelligence Development and Establishment of a Foundation for Trustworthiness (AI Framework Act or Act). 

The AI Framework Act was officially promulgated on 21 January 2025 and will take effect on 22 January 2026, following a one-year transition period to prepare for compliance. During this period, MSIT will assist with the issuance of Presidential Decrees and other sub-regulations and guidelines to clarify implementation details.

South Korea was the first country in the Asia-Pacific region to introduce a comprehensive AI bill, doing so in 2021 with the Bill on Fostering Artificial Intelligence and Creating a Foundation of Trust. However, the legislative process faced significant hurdles, including political uncertainty surrounding the April 2024 general elections, raising concerns that the bill could be scrapped entirely.

By November 2024, South Korea’s AI policy landscape had grown increasingly complex, with 20 separate AI governance bills introduced since the National Assembly began its new term in June 2024, each independently proposed by different members. In November 2024, the Information and Communication Broadcasting Bill Review Subcommittee conducted a comprehensive review of these AI-related bills and consolidated them into a single framework, leading to the passage of the AI Framework Act.

At its core, the AI Framework Act adopts a risk-based approach to AI regulation. In particular, it introduces specific obligations for high-impact AI systems and generative AI applications. The AI Framework Act also has extraterritorial reach: it applies to AI activities that impact South Korea’s domestic market or users.

This blog post examines the key provisions of the Act, including its scope, regulatory requirements, and implications for organizations developing or deploying AI systems.

2. The Act establishes a layered approach to AI regulation

2.1 Definitions lay the foundation for how different AI systems will be regulated under the Act

Article 2 of the Act provides three AI-related definitions. 

At the core of the Act’s layered approach is its definition of “high-impact AI” (which is subject to more stringent requirements). “High-impact AI” refers to AI systems “that may have a significant impact on or pose a risk to human life, physical safety, and basic rights,” and that are utilized in critical sectors identified under the AI Framework Act, including energy, healthcare, nuclear operations, biometric data analysis, public decision-making, education, or other areas that have a significant impact on the safety of human life and body and the protection of basic rights as prescribed by Presidential Decree.

The Act also introduces specific provisions for “generative AI.” The Act defines generative AI as AI systems that create text, sounds, images, videos, or other outputs by imitating the structure and characteristics of the input data. 

The Act also defines an “AI Business Operator” as corporations, organizations, government agencies, or individuals conducting business related to the AI industry. The Act subdivides AI Business Operators into two sub-categories (which effectively reflect a developer-deployer distinction): 

Currently, as will be covered in more detail below, the obligations under the Act apply to both categories of AI Business Operators, regardless of their specific roles in the AI lifecycle. For example, transparency-related obligations apply to all AI Business Operators, regardless of whether they are involved in the development and/or deployment phases of AI systems. It remains to be seen if forthcoming Presidential Decrees to implement the Act will introduce more differentiated obligations for each type of entity.

While the Act expressly excludes AI used solely for national defense and security from its scope, the Act applies to both government agencies and public bodies when they are involved in the development, provision, or use of AI technology in a business-related context. More broadly, the Act also assigns the government a significant role in shaping AI policy, providing support, and overseeing the development and use of AI.

2.2 The AI Framework Act has broad extraterritorial reach 

Under Article 4(1), the Act applies not only to acts conducted within South Korea but also to those conducted abroad that impact South Korea’s domestic market, or users in South Korea. This means that foreign companies providing AI systems or services to users in South Korea will be subject to the Act’s requirements, even if they lack a physical presence in the country. 

However, Article 4(2) of the Act introduces a notable exemption for AI systems developed and deployed exclusively for national defense or security purposes. These systems, which will be designated by Presidential Decree, fall outside the Act’s regulatory framework.
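As a rough illustration of how the scope rules in Article 4 fit together, the sketch below encodes them as a first-pass screening check. It is a simplification for discussion, not a legal test: the class `AIActivity`, its field names, and the function `may_fall_under_act` are hypothetical, and the actual boundaries (for example, which defense or security systems are exempt) will depend on the Presidential Decree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIActivity:
    """Hypothetical description of an AI-related activity (illustrative only)."""
    conducted_in_korea: bool
    affects_korean_market_or_users: bool
    # Assumption: the system has been designated as exclusively for
    # national defense or security purposes (Article 4(2)).
    solely_defense_or_security: bool = False


def may_fall_under_act(activity: AIActivity) -> bool:
    """First-pass screen based on the summary of Article 4 above."""
    # Article 4(2): defense/security-only systems fall outside the Act.
    if activity.solely_defense_or_security:
        return False
    # Article 4(1): domestic acts, plus acts abroad that impact
    # South Korea's domestic market or users in South Korea.
    return activity.conducted_in_korea or activity.affects_korean_market_or_users


# A foreign provider with no local presence but with users in South Korea.
foreign_provider = AIActivity(conducted_in_korea=False,
                              affects_korean_market_or_users=True)
print(may_fall_under_act(foreign_provider))
```

A `True` result here only signals that a closer legal review may be warranted; it does not establish that the Act applies.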

For global organizations, the Act’s jurisdictional scope raises key compliance considerations. Companies will likely need to assess whether their AI activities fall under South Korea’s regulatory reach, particularly if they:

This last criterion appears to be a novel policy proposition and differentiates the AI Framework Act from the EU AI Act, potentially making it broader in reach. This is because it does not seem necessary for an AI system to be placed on the South Korean market for the condition to be triggered, but simply for the AI-related activity of a covered entity to “indirectly impact” the South Korean market. 

2.3 The Act establishes a multi-layered approach to AI safety and trustworthiness requirements

(i) The Act emphasizes oversight of high-impact AI but does not prohibit particular AI uses 

For most AI Business Operators, compliance obligations under the AI Framework Act are minimal. There are, however, noteworthy obligations – relating to transparency, safety, risk management and accountability – that apply to AI Business Operators deploying high-impact AI systems. 

Under Article 33, AI Business Operators providing AI products and services must “review in advance” (this presumably means before the relevant product or service is released into a live environment or goes to market) whether their AI system is considered “high-impact AI.” Businesses may request confirmation from the MSIT on whether their AI system is to be considered “high-impact AI.”

Under Article 34, organizations that offer high-impact AI, or products or services using high-impact AI, must meet much stricter requirements, including:

1. Establishing and operating a risk management plan.

2. Establishing and operating a plan to provide explanation for AI-generated results within technical limits, including key decision criteria and an overview of training data.

3. Establishing and operating “user protection measures.”

4. Ensuring human oversight and supervision of high-impact AI.

5. Preserving and storing documents that demonstrate measures taken to ensure AI safety and reliability.

6. Following any additional requirements imposed by the National AI Committee (established under the Act) to enhance AI safety and reliability.

Under Article 35, AI Business Operators are also encouraged to conduct impact assessments for high-impact AI systems to evaluate their potential effects on fundamental rights. While the language of the Act (i.e., “shall endeavor to conduct an impact assessment”) suggests that these assessments are not mandatory, the Act introduces an incentive: where a government agency intends to use a product or service using high-impact AI, the agency is to prioritize AI products or services that have undergone impact assessments in public procurement decisions. Legislatively stipulating the use of public procurement processes to incentivize businesses to conduct impact assessments appears to be a relatively novel move and arguably reflects the innovation-risk duality seen across the Act.
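For teams tracking their readiness, the Article 34 measures and the Article 35 impact assessment could be kept as a simple internal checklist. The sketch below is one hypothetical way to do so; the names `Article34Measure`, `outstanding_measures`, and `impact_assessment_recommended` are illustrative and do not come from the Act, and the measure labels are paraphrases of the list above.

```python
from enum import Enum


class Article34Measure(Enum):
    """Paraphrased measures for high-impact AI, in the order listed above."""
    RISK_MANAGEMENT_PLAN = "risk management plan"
    EXPLANATION_PLAN = "plan to explain AI-generated results"
    USER_PROTECTION = "user protection measures"
    HUMAN_OVERSIGHT = "human oversight and supervision"
    DOCUMENTATION = "records of safety and reliability measures"
    COMMITTEE_REQUIREMENTS = "additional National AI Committee requirements"


def outstanding_measures(is_high_impact: bool,
                         completed: set[Article34Measure]) -> list[Article34Measure]:
    """Return Article 34 measures not yet in place (empty if not high-impact)."""
    if not is_high_impact:
        return []
    return [m for m in Article34Measure if m not in completed]


def impact_assessment_recommended(is_high_impact: bool, assessment_done: bool) -> bool:
    """Article 35 is framed as an effort obligation, so this is a prompt, not a mandate."""
    return is_high_impact and not assessment_done


done = {Article34Measure.RISK_MANAGEMENT_PLAN, Article34Measure.HUMAN_OVERSIGHT}
for measure in outstanding_measures(True, done):
    print("Outstanding:", measure.value)
```

Keeping the impact assessment as a separate flag reflects the distinction drawn above between the mandatory Article 34 measures and the encouraged Article 35 assessment.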

(ii) The Act prioritizes user awareness and transparency for generative AI products and services 

The AI Framework Act introduces specific transparency obligations for generative AI providers. Under Article 31(1), AI Business Operators offering high-impact or generative AI-powered products or services must notify users in advance that the product or service utilizes AI. Further, under Article 31(2), AI Business Operators providing generative AI as a product or service must also indicate that the output was generated by generative AI. 

Beyond general disclosure, Article 31(3) of the Act mandates that where an AI Business Operator uses an AI system to provide virtual sounds, images, video or other content that are “difficult to distinguish from reality,” the AI Business Operator must “notify or display the fact that the result was generated by an (AI) system in a manner that allows users to clearly recognize it.” 

However, the provision also provides flexibility for artistic and creative expressions. It permits notifications or labelling to be displayed in ways intended to not hinder creative expression or appreciation. This approach appears aimed at balancing the creative utility of generative AI with transparency requirements. Technical details, such as how notification or labelling should be implemented, will be prescribed by Presidential Decree.
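The three layers of Article 31 described above can be read as a small decision procedure. The sketch below is a simplified, hypothetical encoding of that reading: the class `AIService`, its fields, and the function `article_31_notices` are illustrative names, and the precise form of each notice remains to be set by Presidential Decree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIService:
    """Hypothetical profile of an AI product or service (illustrative only)."""
    is_high_impact: bool
    is_generative: bool
    output_hard_to_distinguish_from_reality: bool
    is_artistic_or_creative: bool = False


def article_31_notices(service: AIService) -> list[str]:
    """List the transparency notices suggested by the summary of Article 31."""
    notices = []
    # Article 31(1): advance notice that the product or service uses AI.
    if service.is_high_impact or service.is_generative:
        notices.append("Art. 31(1): notify users in advance that the service uses AI")
    # Article 31(2): label generative AI output as such.
    if service.is_generative:
        notices.append("Art. 31(2): indicate that output was generated by generative AI")
    # Article 31(3): realistic virtual content needs a clearly recognizable notice.
    if service.output_hard_to_distinguish_from_reality:
        notice = "Art. 31(3): clearly notify or display that the result was AI-generated"
        if service.is_artistic_or_creative:
            notice += " (manner of display may avoid hindering creative expression)"
        notices.append(notice)
    return notices


image_generator = AIService(is_high_impact=False, is_generative=True,
                            output_hard_to_distinguish_from_reality=True)
for line in article_31_notices(image_generator):
    print(line)
```

Note that the creative-expression flag only changes how the Article 31(3) notice may be displayed in this sketch; it does not remove the notice.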

(iii) The Act establishes other requirements that apply when certain thresholds are met

The following requirements focus on safety measures and operational oversight, including specific provisions for foreign AI providers.

Under Article 32, AI Business Operators that operate AI systems whose computational learning capacity exceeds prescribed thresholds are required to identify, assess, and mitigate risks throughout the AI lifecycle, and establish a risk management system to monitor and respond to AI-related safety incidents. AI Business Operators must document and submit their findings to the MSIT. 

For accountability, Article 36 provides that AI Business Operators that lack a domestic address or place of business and that cross certain user-number or revenue thresholds (to be prescribed) must appoint a “domestic representative” with an address or place of business in South Korea. The details of the domestic representative must be provided to the MSIT. 

These domestic representatives take on significant responsibilities, including:

3. The Act grants the MSIT significant investigative and enforcement powers

3.1 The legislation empowers the MSIT with broad authority to investigate potential violations of the Act 

Under Article 40 of the Act, the MSIT is empowered to investigate businesses that it suspects of breaching any of the following requirements under the Act:

When potential breaches are identified, the MSIT may carry out necessary investigations, including conducting on-site investigations and compelling AI Business Operators to submit relevant data. During these inspections, authorized officials can examine business records, operational documents, and other critical materials, following established administrative investigation protocols.

If violations are confirmed, the MSIT can issue corrective orders, requiring businesses to immediately halt non-compliant practices and implement necessary remediation measures. 

3.2 The Act takes a relatively moderate approach to penalties compared to other global AI regulations 

Under Article 43 of the Act, administrative fines of up to KRW 30 million (approximately USD 20,707) may be imposed for:

This enforcement structure caps fines at lower amounts than those available under other AI regulations, such as the EU AI Act. 

4. The Act promotes the development of AI technologies through strategic support for data infrastructure and learning resources

The MSIT is responsible for developing comprehensive policies to support the entire lifecycle of AI training data, ensuring that businesses have access to high-quality datasets essential for AI development. To achieve this, the Act mandates government-led initiatives to:

A key initiative under the Act can be found in Article 25, which provides for the promotion of policies to establish and operate AI Data Centers. Under Article 25(2), the South Korean government may provide administrative and financial support to facilitate the construction and operation of data centers. These centers will provide infrastructure for AI model training and development, ensuring that businesses of all sizes – including small and medium-sized enterprises (SMEs) – have access to these resources.

The Act also promotes the advancement and safe use of AI by encouraging technological standardization (Articles 13 and 14), supporting SMEs and start-ups, and fostering AI-driven innovation. It also facilitates international collaboration and market expansion while establishing a framework for AI testing and verification (Articles 13 and 14). Together, these measures aim to strengthen South Korea’s broader AI ecosystem and ensure its responsible development and deployment.

5. Comparing the approaches of South Korea’s AI Framework Act and the EU’s AI Act reveals both convergences and divergences

As South Korea is only the second jurisdiction globally to enact comprehensive national AI regulation, comparing its AI Framework Act with the EU AI Act helps illuminate both its distinctive features and its place in the emerging landscape of global AI governance. As many companies will need to navigate both frameworks, understanding their similarities and differences is essential for global compliance strategies.

Table 1. Comparison of Key Aspects of the South Korea AI Framework Act and EU AI Act

6. Looking ahead

South Korea’s AI Framework Act is the first omnibus AI regulation in the APAC region. The South Korean model is notable for establishing an alternative approach to AI regulation: one that seeks to balance the promotion of AI innovation, development, and use with safeguards for high-impact aspects.

6.1 Though the Act establishes a framework for direct regulation of AI, several critical areas require further definition through Presidential Decree

The areas that are expected to be clarified through Presidential Decree include:

The interpretation and implementation of these provisions will significantly shape compliance expectations, influencing how AI businesses—both domestic and international—navigate the regulatory landscape.

6.2 The Act must also be considered in the context of South Korea’s broader efforts to position the country as a leader in AI innovation 

The first – and arguably most significant – of these efforts is a bill recently introduced by members of the National Assembly, which seeks to amend the Personal Information Protection Act (PIPA) by creating a new legal basis for the processing of personal information specifically for the development and use of AI. The bill introduces a new Article 28-12, which would permit the use of personal information beyond its original purpose of collection, specifically for the development and improvement of AI systems. This amendment would allow such processing provided that:

Second, South Korea’s government is also reportedly exploring other legal reforms to its data protection law to facilitate the development of AI. According to PIPC Chairman Haksoo Ko’s recent interview with a global regulatory news outlet, these reforms could potentially include reforming the “legitimate interests” basis for processing personal information under the PIPA.

South Korea’s Minister for Science and ICT Yoo Sang-im has also reportedly urged the National Assembly to swiftly pass a law on the management and use of government-funded research data to advance scientific and technological development in the AI era.

Third, while creating these pathways for innovation, the PIPC has simultaneously been developing mechanisms to provide oversight over AI systems. For instance, the PIPC’s comprehensive policy roadmap for 2025 (Policy Roadmap) announced in January 2025 outlines an ambitious regulatory framework for AI governance and data protection. In particular, the Policy Roadmap envisions the implementation of specialized regulatory and oversight provisions for the use of unmodified personal data in AI development. 

The Policy Roadmap is supplemented by the PIPC’s Work Direction for Investigations in 2025 (Work Direction). Published in January 2025, the Work Direction includes measures intended to provide additional oversight over AI services, including conducting preliminary onsite inspections of AI-powered services, such as AI agents, and reviewing the use of personal information in AI-based legal and human resources services.

A possible instance of this additional emphasis on providing oversight arose in February 2025, when the PIPC announced a temporary suspension of new downloads of the Chinese generative AI application DeepSeek over concerns about potential breaches of the PIPA.

Fourth, South Korea is seeking to strengthen the accountability of foreign organizations. The PIPC has expressed its support for a bill amending the PIPA’s domestic representative system for foreign organizations, which was subsequently amended and became effective from April 1, 2025. This amendment bill addresses a significant gap in the current system, which has allowed foreign companies to designate unrelated third parties as their domestic agents in South Korea, often resulting in what one lawmaker described as “formal” compliance without meaningful accountability.

The new requirements would mandate that foreign companies with established business units in South Korea designate those local entities as their representatives, while imposing explicit obligations on foreign headquarters to properly manage and supervise these domestic agents. The bill also establishes sanctions for violations of these requirements, including fines of up to KRW 20 million (approximately USD 14,000). 

Fifth, South Korea is seeking to position itself as a global leader in privacy and AI governance through international cooperation and thought leadership. As South Korea prepares to host the annual Global Privacy Assembly in September 2025 – an event involving participants from 95 countries – the PIPC is positioning itself as a bridge between different regional approaches to data protection and AI governance.

6.3 However, these efforts highlight a persistent challenge to ensure clear alignment between key regulatory authorities in South Korea’s AI governance landscape 

Whilst the MSIT was working to finalize the AI Framework Act, the PIPC, like its counterparts in many other jurisdictions globally, has been assuming a de facto regulatory role for AI applications involving personal data.

However, while the AI Framework Act assigns primary responsibility for AI governance to the MSIT, it does not appear to address or acknowledge the PIPC’s role in the regulatory landscape. This creates a potential situation where two parallel AI regulators – one de jure and the other de facto – will likely continue to operate: the MSIT overseeing general AI system safety and trustworthiness under the AI Framework Act, and the PIPC maintaining its oversight of personal data processing in AI systems under the PIPA.

As a result, organizations developing or deploying AI systems in South Korea may need to navigate compliance requirements from both authorities, particularly when their AI systems process personal data. How this dual regulatory structure evolves and whether a more unified governance approach emerges will be a critical factor in determining the success of South Korea’s ambitious AI strategy in the coming years.

Despite these practical challenges, South Korea’s approach to AI regulation offers a potential governance model for other APAC jurisdictions. The success of the Act will ultimately depend on how effectively it balances its dual objectives – fostering AI innovation while ensuring responsible deployment. As AI governance evolves globally, the South Korean experience will provide valuable insights for policymakers, regulators, and industry stakeholders worldwide.

Note: The summary of the AI Framework Act above is based on an English machine translation, which may contain inaccuracies. Additionally, the information should not be considered legal advice. For specific legal guidance, kindly consult a qualified lawyer practicing in South Korea.

The authors would like to thank Josh Lee Kok Thong, Dominic Paulger, and Vincenzo Tiani for their contributions to this post.