OAIC’s Dual AI Guidelines Set New Standards for Privacy Protection in Australia
On 21 October 2024, the Office of the Australian Information Commissioner (OAIC) released two sets of guidelines (collectively, “Guidelines”), one for developing and training generative AI systems and the other for deploying commercially available “AI products”. This marks a shift in the OAIC’s regulatory approach from enforcement-focused oversight to proactive guidance.
The Guidelines establish rigorous requirements under the Privacy Act and its 13 Australian Privacy Principles (APPs), particularly emphasizing accuracy, transparency, and heightened scrutiny of data collection and secondary use. Notably, the Guidelines detail conditions that must be met for lawfully collecting personal information publicly available online for purposes of training generative AI, including through a detailed definition of what “fair” collection means.
This regulatory development aligns with Australia’s broader approach to AI governance, which prioritizes technology-neutral existing laws and voluntary frameworks while reserving mandatory regulations for high-risk applications. However, it may signal increased regulatory scrutiny of AI systems processing personal information going forward.
This blog post summarizes the key aspects of these Guidelines, their relationship to Australia’s existing privacy law, and their implications for organizations developing or deploying AI systems in Australia.
- Background: AI Regulation in Australia and the Role of the OAIC
Australia, like many jurisdictions globally, is currently in the process of developing its approach to AI regulation. Following a public consultation on “Safe and Responsible AI in Australia” in 2023, the Australian Government issued an “Interim Response” outlining an approach that seeks to regulate AI primarily through existing, technology-neutral laws and regulations, prioritizing voluntary frameworks and soft law mechanisms, and potentially reserving future mandatory regulations for high-risk areas. This stands in contrast to the European Union’s AI Act, which introduces a comprehensive regulatory framework covering a broader range of AI systems.
While the Australian Government has been giving shape to the country’s overall approach to AI regulation, several Australian regulators, as part of the Digital Platform Regulators Forum (DP-REG), have been closely following developments in AI technology, co-authoring working papers on large language models (2023) and, more recently, multimodal foundation models (2024).
The OAIC issued its first ever guidance on complying with the Privacy Act in the context of AI in a DP-REG working paper on multimodal foundation models released in September 2024. It followed up the next month with two sets of more detailed guidelines that provide practical advice for organizations on complying with the Privacy Act and the APPs in two important contexts:
- The “Guidance on Developing and Training Generative AI Models” (AI Development Guidelines) targets developers and focuses specifically on privacy considerations that may arise from training generative AI models on datasets containing personal information. It identifies obligations regarding the collection and processing of such datasets and highlights specific challenges that may arise from practices like data scraping and obtaining datasets from third parties.
- The “Guidance on Privacy and the Use of Commercially Available AI Products” (AI Product Guidelines) is directed at organizations deploying commercially available AI systems that process personal information, in order to offer products or services internally or externally. It also covers the use of freely accessible AI products, such as AI chatbots.
The two sets of Guidelines are complementary, each acknowledging and referring to the other, while addressing distinct phases of the AI lifecycle and different stakeholders within the broader AI ecosystem. However, they are not intended to be comprehensive. Instead, they aim to highlight the key privacy considerations that may arise under the Privacy Act when developing or deploying generative AI systems.
- The Guidelines Recognize Both AI’s Benefits and Significant Privacy Risks
Both Guidelines acknowledge AI’s potential to benefit the Australian economy through improved efficiency and enhanced services. However, they also emphasize that AI technologies’ data-driven nature creates substantial privacy risks that must be managed carefully. Key risks highlighted include:
- Loss of control: Individuals may lose control over how their personal information is used in AI training datasets.
- Bias and discrimination: Inherent biases in training data can be amplified, leading to discriminatory outcomes.
- Inaccuracies: Outputs of AI systems may be inaccurate and are not always easily explainable, impacting trust and decision-making.
- Re-identification: Aggregation of data from multiple sources increases the risk of individual re-identification.
- Potential for misuse: Generative AI in particular can be misused for malicious purposes, including disinformation, fraud, and creation of harmful content.
- Data breaches: Vast datasets used in training increase the risk and potential impact of data breaches.
To address these risks, both Guidelines emphasize the importance of adopting a “Privacy by Design” approach when developing or deploying AI and of conducting Privacy Impact Assessments to identify and mitigate potential privacy impacts throughout the AI product lifecycle.
- The Guidelines Establish Rigorous Accuracy Requirements
Organizations are required under APP 10 to take reasonable steps to ensure personal information is accurate, up-to-date, and complete when collected, and also relevant when used or disclosed.
Both Guidelines emphasize that the accuracy obligation in APP 10 is vital to avoid the risks that may arise when AI systems handle inaccurate personal information, which range from incorrect or unfair decisions to reputational or even psychological harm.
For AI systems, identifying “reasonable steps” under APP 10 requires organizations to consider:
- the sensitivity of the personal information being processed;
- the organization’s size, resources, and expertise – factors that affect its capacity to implement accuracy measures; and
- the potential consequences of inaccurate processing for individuals, as higher risks of harm necessitate more robust safeguards.
The Guidelines emphasize that generative AI models in particular present distinct challenges under APP 10 because they are trained on massive internet-sourced datasets that may contain inaccuracies, biases, and outdated information, which can be perpetuated in their outputs. The probabilistic nature of these models also makes them prone to generating plausible but factually incorrect information, and their accuracy can deteriorate over time as they encounter new data or their training data becomes outdated.
To address these challenges, the Guidelines recommend that organizations implement comprehensive measures, including thorough testing with diverse datasets, robust data quality management, human oversight of AI outputs, and regular monitoring and auditing. The key theme is that organizations must take proactive steps to ensure accuracy throughout the AI system’s lifecycle, with the stringency of measures proportional to the system’s intended use and potential risks.
- The Guidelines Make Transparency a Core Obligation Throughout the AI System Lifecycle
The OAIC’s Guidelines also establish transparency as a fundamental obligation throughout the lifecycle of an AI system. Notably, the Guidelines treat transparency as an obligation that operates on multiple levels.
The transparency obligation is rooted in APP 1, which requires organizations to manage personal information openly and transparently (including by publishing a privacy policy), and APP 5, which requires organizations to notify individuals about how their personal information is collected, used, and disclosed.
The Guidelines emphasize that in an AI context, privacy policies must provide clear explanations of how AI systems process personal information and make decisions. When AI systems collect or generate personal information, organizations must give timely and specific notifications that provide individuals genuine insight into how their information is processed and empower them to understand AI-related decisions that affect them.
To support this transparency framework, organizations must invest in comprehensive staff training to ensure employees understand both the technical aspects and privacy implications of their AI systems, enabling them to serve as knowledgeable intermediaries between complex AI technologies and affected individuals. This human oversight is to be complemented by regular audits and monitoring, which help organizations maintain visibility into their AI systems’ performance, address privacy issues proactively, and generate the information needed to maintain meaningful transparency with individuals.
- The Guidelines Place Heightened Scrutiny on Data Collection and Secondary Use
The Guidelines underscore the need for heightened scrutiny of data collection practices under APP 3 and of the secondary use of personal information under APP 6 in the AI context. The Guidelines also emphasize that organizations may face distinct challenges across different collection methods.
With regard to challenges in data collection methods, the AI Development Guidelines highlight that collecting training datasets that may contain personal information through web scraping – defined as “the automated extraction of data from the web” – raises several concerns under APP 3.
Notably, the Guidelines caution that developers should not automatically assume that information posted publicly can be used to train AI models. Rather, developers must ensure that they comply with APP 3 by demonstrating that:
- It would be unreasonable or impracticable to collect the personal information directly from the individuals concerned;
- The collection of personal information through web scraping is lawful and fair. Noting that collection of personal information via web scraping is often done without the direct knowledge of data subjects, the Guidelines identify six factors to consider in determining whether such collection is fair:
- Individuals’ reasonable expectations;
- The sensitivity of the information;
- The intended purpose of the collection, including the intended operation of the AI model;
- The risk of harm to individuals;
- Whether the individuals concerned intentionally made the information public; and
- The steps the developer will take to prevent privacy impacts, including deletion, de-identification, and mechanisms to increase individuals’ control over how their information is processed; and
- Insofar as the dataset contains “sensitive information” (as defined under Australia’s Privacy Act), individuals have provided express consent for this information to be used to train an AI model.
The Guidelines therefore do not prohibit the collection of training data through web scraping, but they lay out detailed requirements that must be fulfilled to do so lawfully. Notably, the Guidelines define what “fair” collection of personal data through web scraping requires, identifying several dimensions to consider: individuals’ expectations and whether they intentionally made the information public, the sensitivity of the information collected, the risk of harm to individuals, and the privacy-enhancing technical and organizational measures the developer will put in place. The Guidelines acknowledge that organizations may face significant challenges in meeting many of these requirements.
Further, the Guidelines note that many of the above considerations under APP 3 also apply to third-party datasets. The Guidelines therefore recommend that organizations seeking to rely on such datasets conduct thorough due diligence regarding data provenance and the original circumstances in which the information was collected.
By contrast, when organizations seek to use their existing datasets to train AI models, the main consideration under the Guidelines is complying with APP 6, which governs secondary use of personal information. This principle requires organizations to either obtain informed consent or carefully evaluate whether AI training aligns with individuals’ reasonable expectations based on the original collection purpose.
Across all collection methods, organizations must adhere to the principle of data minimization, limiting the collection of personal information to what is strictly necessary, and should also consider techniques such as de-identification or the use of synthetic data to further reduce risks to individuals.
- The AI Product Guidelines Require Organizations to Pay Attention to Privacy Throughout the Deployment Lifecycle
The AI Product Guidelines advocate for a “privacy by design” approach that integrates privacy considerations throughout the AI product lifecycle.
They specifically call on organizations to conduct thorough due diligence before adopting AI products. Recommended steps include assessing the appropriateness of these products for their intended use, evaluating the quality of training data, understanding security risks, and analyzing data flows to identify parties that can access inputted information.
In the deployment and use phase, organizations must exercise strict caution when inputting personal information into AI systems, particularly systems that are provided to the public for free, such as AI chatbots. The Guidelines emphasize the need to comply with APP 6 for any secondary use of personal information, to minimize the personal information entered into such systems, and to maintain transparency with individuals about how their information will be used.
While the AI Product Guidelines primarily focus on APPs 1, 3, 5, 6, and 10, they also emphasize that several other APPs may play crucial roles, depending on how the AI product is being used. These APPs include:
- APP 8, which governs cross-border data transfers when AI systems process information on overseas servers;
- APP 11, which requires reasonable security measures to protect personal information in AI systems from unauthorized access and misuse; and
- APPs 12 and 13, which ensure individuals can access and correct their personal information, respectively.
- Looking Ahead: The Guidelines Signal Increased Privacy Scrutiny for AI
The OAIC’s Guidelines represent a significant step in regulating AI use in Australia that not only aligns with broader Australian government initiatives, such as the Voluntary AI Safety Standard, but also reflects a global trend of data protection authorities issuing rules and guidance on AI governance through existing privacy laws.
The Guidelines establish a foundation for privacy-protective AI development and deployment, but organizations must remain vigilant as both the technology and regulatory requirements continue to evolve. The release of the Guidelines may hint at increased regulatory scrutiny of AI systems that process personal information, meaning that organizations that develop or deploy such systems will need to carefully consider their obligations under the Privacy Act and implement appropriate safeguards.