Unveiling China’s Generative AI Regulation
Authors: Yirong Sun and Jingxian Zeng
The following is a guest post to the FPF blog by Yirong Sun, research fellow at the NYU School of Law Guarini Institute for Global Legal Studies: Global Law & Tech, and Jingxian Zeng, research fellow at the University of Hong Kong Philip K. H. Wong Centre for Chinese Law. The guest blog reflects the opinions of the authors only. Guest blog posts do not necessarily reflect the views of FPF.
The Draft Measures for the Management of Generative AI Services (the “Draft Measures”) were released on April 11, 2023, and the comment period closed on May 10. Public statements by industry participants and legal experts provide insight into the likely content of the comments submitted. It is now the turn of China’s cyber super-regulator, the Cyberspace Administration of China (“CAC”), to consider these comments and likely produce a revised text.
This blog analyzes the provisions and implications of the Draft Measures. It covers the Draft Measures’ scope of application, how they apply to the development and deployment lifecycle of generative AI systems, and how they deal with the ability of generative AI systems to “hallucinate” (that is, produce inaccurate or baseless output). It also highlights potential developments and contextual points about the Draft Measures that industry and observers should pay attention to.
The Draft Measures aim to protect the “collective” interests of “the public” within the territory of the People’s Republic of China (PRC) in relation to the management of generative AI services. The primary risk foreseen by the CAC is the potential use of this novel technology to manipulate public opinion and fuel social mobilization by spreading sensitive or false information. The Draft Measures also seek to tackle issues arising from high-profile societal events, such as data leaks, fraud, privacy breaches, and intellectual property infringements, as well as overseas incidents widely reported in Chinese media, including defamation and extreme cases of suicide following interactions with AI chatbots. Notably, the Draft Measures set high standards for data authenticity and impose safeguards for personal information and user input. They also mandate the disclosure of information that may impact users’ trust and the provision of guidance on using the service rationally.
Meanwhile, concerns have arisen that the Draft Measures may slow down the development of generative AI-based products and services by Chinese tech giants. Companies providing services based on generative AI, including services provided through application programming interfaces (“APIs”), are all subject to stringent requirements under the Draft Measures. The Draft Measures thus concern not only those who have the means to train their own models, but also smaller businesses that want to leverage open-source pre-trained models to deliver services. In this regard, the Draft Measures are likely to present compliance challenges in the open-source context.
While this blog focuses on the Draft Measures, it is important to note that industrial policies from both central and local governments in China also exert substantial influence over the sector. Critically, the task of promoting AI advancement amid escalating concerns is overseen by authorities other than the CAC, such as the Ministry of Science and Technology (“MST”) and the Ministry of Industry and Information Technology (“MIIT”). Recently, the China Academy of Information and Communications Technology (“CAICT”), a research institute affiliated with the MIIT, introduced China’s first-ever industry standards1 for assessing generative AI products. These agencies, through both competition and coordination among themselves, will play a significant role alongside the CAC in the regulation of generative AI.
1. Notable aspects of the Draft Measures’ scope of application: Definition of “public” and extraterritorial application
Ambiguity in the definition of “public”
The Draft Measures regulate all generative AI-based services offered to “the public within the PRC territory.”2 This scope of application diverges from existing Chinese laws and regulations, which do not usually turn on who the intended service recipients are. For instance, the regulations targeting deep synthesis and recommendation algorithms both apply to the provision of services using these technologies regardless of whether the recipients are individuals, businesses, or “the public.” Looking at its context, Article 6 of the Draft Measures suggests that generative AI-based services have the potential to shape public opinion or stimulate social mobilization, essentially highlighting their impact on “the public.” This new development thus likely signals the CAC’s goal of prioritizing the protection of wider societal interests over individual ones, such as privacy or intellectual property, which are already protected under existing regulations.
However, the Draft Measures leave “the public (公众)” undefined, giving rise to ambiguity about their scope of application. For example, would a service licensed exclusively to a Chinese private entity for in-house use fall within scope? What about a service accessible only to certain public institutions but not to the unaffiliated, one customized for individual clients who each receive a unique product derived from a common foundation model, or simply an open-source model that is ready to download and install?
Extraterritorial application
The new approach also suggests a more extensive extraterritorial reach: regardless of where a service is provided, the Draft Measures apply as long as the public within the PRC territory has access to it. To avoid being subject to Chinese law, OpenAI, for example, has reportedly begun blocking users based in mainland China. This development could further restrict Chinese users’ access to overseas generative AI services, especially since, even before the Draft Measures were released, most Chinese users’ access to such services was already geo-blocked, whether by the service providers themselves (e.g., by requiring a foreign telephone number for registration) or by the Chinese government through enforcement measures. At the same time, the scale of China’s user market and its involvement in AI development render it a “vital” jurisdiction for AI regulation. OpenAI’s CEO has recently called for collaboration with China to counter AI risks, a trend we may see more of in the future.
2. The Draft Measures adopt a compliance approach based on the lifecycle of generative AI systems
The Draft Measures are targeted at “providers” of generative AI-based services
The Draft Measures take the approach of regulating generative AI-based service providers. As per Article 5, “providers (提供者)” are those “using generative AI to offer services such as chat, text, image, audio generation; including providing programmable interface and other means which support others to themselves generate text, images, audio, etc.” The obligations are as follows:
- Model Training
- Pretraining and optimization:3 Providers must ensure the legality of the sources of data used for pretraining and optimization of generative AI products (Article 7). Existing laws and regulations, such as China’s intellectual property laws and the Personal Information Protection Law (PIPL), are thus extended to cover this new field.
- Human annotation (if any): Providers must establish necessary annotation rules, provide training for annotation personnel, and conduct spot checks to verify the validity of annotation content (Article 8).
- Pre-Launch
- Security assessment and filing: Providers must submit a security assessment to the CAC and file the algorithms they use (Article 6). The CAC has been developing a similar filing system for recommendation algorithms and is likely to draw upon those established practices for generative AI.
- Disclosure requirement: Providers shall provide essential information that may impact user trust or decision-making, including descriptions of pre-training and optimization training data, human annotation, as well as foundational algorithms and technological systems (Article 17).
- Service Delivery
- Traceability: Providers must label generated images, videos, and other content in accordance with regulations on deep synthesis (Article 16).
- User guidance: Providers shall guide users to scientifically understand generative AI services and to use generated content rationally and legally (Article 18).
- User accountability: Providers shall take necessary measures against users who misuse generative AI products in ways that violate laws, regulations, ethics, or social norms (Article 19). They also need to require users to provide real identity information in accordance with the Cybersecurity Law (Article 9).
- Report mechanism: Providers shall establish a mechanism for receiving and handling user complaints (Article 13). Users also have the right to report directly to the authorities if they discover noncompliant generated content (Article 18).
- Post-Launch
- Non-compliant content: Providers must take down noncompliant content using methods like filtering and must, within three months, prevent its repeated generation through techniques such as optimization training (Article 15); a minimal sketch of this filter-and-label workflow follows this list.
- Content producer: Providers bear responsibility as the producer of the content generated by the product (Article 5).
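To make the filtering (Article 15) and labeling (Article 16) duties above more concrete, here is a minimal, purely illustrative Python sketch of a provider-side wrapper around a generation call. The Draft Measures mandate outcomes, not techniques; the blocklist, label text, and function names below are our own hypothetical stand-ins for the far more sophisticated moderation pipelines a provider would actually need.

```python
# Illustrative only: the Draft Measures prescribe no implementation.
# Everything here (blocklist, label text, function names) is hypothetical.

BLOCKLIST = {"example banned phrase"}  # stand-in for a real moderation policy

def is_compliant(text: str) -> bool:
    """Crude keyword filter standing in for a production moderation model."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

def label_output(text: str) -> str:
    """Attach a provenance label, echoing the deep synthesis labeling rules."""
    return f"{text}\n[AI-generated content]"

def serve(prompt: str, generate) -> str:
    """Wrap a model call with filtering (Art. 15) and labeling (Art. 16)."""
    candidate = generate(prompt)  # call into the underlying model
    if not is_compliant(candidate):
        # Article 15: withhold noncompliant content; a real system would also
        # log the case to drive the mandated optimization training.
        return "[content withheld pending review]"
    return label_output(candidate)

# Usage with a dummy generator:
print(serve("hello", lambda p: f"echo: {p}"))
```

Even this toy version shows why the obligations fall most naturally on whoever controls the serving layer, a point the next subsection develops.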
Incentivizing providers to allocate risk upstream to developers
By imposing lifecycle compliance obligations on end-providers, the Draft Measures create incentives for those providers to allocate risks to upstream developers through mechanisms like contracts. Whether the parties can distribute their rights and obligations fairly and efficiently depends on various factors, such as the resources available to them and the information asymmetries between them. To better direct this “private ordering,” which has significant social implications, the EU plans to create non-binding standard contractual clauses based on each party’s level of control in the AI value chain. The CAC’s stance in this new and fast-moving area remains to be seen.
The Draft Measures pose potential challenges for deploying open-source generative AI systems
Open-source models raise a related but distinct issue. Open-source communities are currently developing highly capable large language models (“LLMs”), and businesses have compelling commercial incentives to adopt them, as training a model from scratch is costly and technically demanding. However, many open-source models are released without full disclosure of their training datasets, for reasons such as the extensive effort required for data cleaning and privacy concerns, especially when user data is involved. Adding to this complexity, open-source LLMs are not typically trained in isolation. Rather, they form a modification chain in which models build on top of one another through changes made by different contributors. Consequently, for those using open-source models, several obligations in the Draft Measures become difficult or even impossible to fulfill, including pre-launch assessment, post-launch retraining, and information disclosure.
3. The Draft Measures target the “hallucination” of generative AI systems
The Draft Measures describe generative AI as “technologies generating text, image, audio, video, code, or other such content based on algorithms, models, or rules.” In contrast to the EU’s new compromise text on rules for generative AI, which adopts a technical definition of “foundation models,” the Draft Measures focus on the technology’s function, regardless of its underlying mechanisms. Moreover, according to Article 6 of the Draft Measures, generative AI-based services automatically fall under the scope of the Regulations for the Security Assessment of Internet Information Services Having Public Opinion Properties or Social Mobilization Capacity, which mandate a security assessment. A group of seven Chinese scholars has proposed removing this provision and applying the security assessment only to services that actually possess these properties.
The Draft Measures contain provisions aimed at ensuring accuracy throughout the developmental lifecycle of generative AI systems. These echo the CAC’s primary concern that the technology could be misused to generate and disseminate misinformation. Article 7(4) of the Draft Measures stipulates that providers must guarantee the “veracity, accuracy, objectivity, and diversity” of the training data. Article 4(4) requires that all generated content be “true and accurate,” and that providers of generative AI-based products and services adopt measures to “prevent the generation of false information.” Such providers are responsible for filtering out any non-compliant material and preventing its regeneration within three months (Article 15). However, industry representatives and legal practitioners in China have raised concerns about both the benchmark for and the technical feasibility of ensuring data authenticity, given the use of open internet information and synthetic data in the development of generative AI.
4. Looking Ahead
The CAC is expected to refine the Draft Measures after gathering public feedback. The final version and subsequent promulgation may be influenced by a broader set of contextual factors. We believe the following aspects also warrant consideration:
- Risk-specific digital regulation framework: The Draft Measures cannot be fully understood on their own or by their text alone. They take their shape from existing laws and regulations addressing risk-specific concerns in the context of mainland China. As mentioned, the CAC has already targeted recommendation algorithms and deep synthesis, regulations that likewise owe their existence to high-profile societal events: algorithmic abuses, adolescent Internet addiction, and the deepfake-related fraud, fake news, and data misuse that sparked widespread consternation. The Draft Measures also rest on the upper-level Cybersecurity Law, Data Security Law, and PIPL, and on the measures that directly implement them.
- Dynamic interplay of political, economic, and social factors: The implementation and enforcement of the Draft Measures will be deeply influenced by strategies, plans, and policies in a broader context, most of which are dedicated to promoting the AI industry. Even though China carried out an 18-month crackdown on its Big Tech companies, we should not forget that these very same “national champions” were encouraged to grow and flourish in the first place: a supportive and nurturing regulatory environment was provided domestically to boost their global competitiveness. Besides, it may be more accurate to view the crackdown as re-steering, rather than barring, the growth of China’s technology sector. It redirects the industry towards a path that the country’s policymakers view as healthier and more sustainable, one emphasizing independent and secure supply chains, fostering startups, and encouraging significant breakthroughs in areas such as foundational AI frameworks and models.
- Multifaceted interaction between different jurisdictions: The regulation of generative AI is a global issue, with many concerns and demands shared across countries. China interacts with other major jurisdictions, and China’s policy discussions on AI regulation often draw comparisons with regulations in jurisdictions such as the EU and the US. However, the degree to which learning actually occurs remains unclear, as China’s approach is also molded by its own contextual considerations, as well as by the dual forces of competition and coordination between nations. For these reasons, the relationships among AI regulations in different jurisdictions defy simplistic categorization.
1Major Chinese players in the AI industry are forming interest groups to channel their influence on policy makers. For example, China’s industry standards for generative AI were drafted by over 40 entities, including tech companies such as Baidu, SenseTime, Xiaomi, and NetEase. SenseTime also launched an open platform for AI safety governance to shape practices around AI regulatory issues such as cybersecurity, traceability, and IP protection.
2A widely circulated translation of Article 2 states: “These Measures apply to the research, development, and use of products with generative AI functions, and to the provision of services to the public within the territory of the People’s Republic of China.” However, we believe this is misleading. A more accurate reading of the original Chinese text and its context suggests that “the provision of services to the public” is a cumulative requirement rather than a separate one.
3The Draft Measures seem to exhibit technical sophistication in their terminology. In Articles 7 and 17, the data compliance obligation is split into two phases – pre-training and optimization. However, the choice of terminology is peculiar, as the prevailing terms in machine learning are pre-training and fine-tuning. Optimization is typically employed to describe a stage within the training process, often used in conjunction with forward and backward propagation.
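For readers less familiar with the vocabulary, the following generic PyTorch-style sketch locates “optimization” where machine learning practice usually puts it: the per-batch parameter update that follows forward and backward propagation, inside training (whether pre-training or fine-tuning), rather than a separate data-related phase. The model and data below are dummies chosen purely for illustration.

```python
# A generic training loop, sketched only to show where "optimization" sits
# in ML vocabulary. The model and data are dummies, not any real LLM setup.
import torch

model = torch.nn.Linear(10, 2)                     # stand-in for a real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(8, 10)                         # dummy input batch
    y = torch.randint(0, 2, (8,))                  # dummy labels
    logits = model(x)                              # forward propagation
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()                                # backward propagation
    optimizer.step()                               # the "optimization" step
```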