Let’s Look at LLMs: Understanding Data Flows and Risks in the Workplace
Over the last few months, we have seen generative AI systems and Large Language Models (LLMs), like OpenAI’s ChatGPT, Google Bard, Stable Diffusion, and DALL-E, send shockwaves throughout society. Companies are racing to bake AI features into existing products and roll out new services. Many Americans worry that generative AI and LLMs will replace them in the workforce, and teachers are downloading ChatGPT-specific detection software to ensure their students are not plagiarizing homework assignments. Some have called for a pause on AI development, but organizations and individuals are only adopting LLMs more quickly, and the trend shows no signs of abating.
Organizations are already seeing employees use generative AI and LLM tools in their workstreams. Few workers are waiting for permission to use these technologies to speed up complex tasks, get tailored answers to full-sentence questions, or draft content like marketing emails. However, the growing use of LLMs creates risks, such as privacy concerns, content inaccuracies, and potential discrimination. Use of LLMs can also be deemed inappropriate in certain contexts and create discontent: students recently criticized their university for lacking empathy when the school used ChatGPT to draft an email notice about a nearby mass shooting.
As organizations navigate these uncertainties, they are asking whether, or when, employees should be permitted to use LLMs for their work activities. Many organizations are establishing or considering internal policies and guidelines for when employees should be encouraged, permitted, discouraged, or prohibited from using such tools. As organizations create new policies, they should be aware that:
1. When workers share personal information with LLMs, it can create legal obligations for data protection and privacy, including regulatory compliance;
2. Many organizations will need to establish norms for originality and ownership, including when it is appropriate for employees to use LLMs or other generative AI systems to create novel content;
3. Organizations need to carefully evaluate any use of LLMs for potential bias, discrimination, and misinformation, while also considering other ethical concerns.
What are LLMs?
In late 2022, OpenAI released its AI chatbot, ChatGPT, which has since gone through several versions and is now available with GPT-4. ChatGPT is both a generative AI system and a large language model (LLM), two related but distinct terms. A “generative AI system” is a type of AI that has been trained on data so that it can produce, or generate, content similar to what it was trained on, such as new text, images, video, music, and audio. For example, a growing number of AI tools are available for generating artwork, and some even create videos based on text. A “large language model” (LLM) is a type of generative AI that generates text. LLMs can perform a variety of language-related tasks, including translation, text summarization, question answering, sentiment analysis, and more. They can also generate text that mimics human writing and speech, a capability used in applications such as chatbots and virtual assistants. To produce human-like responses, LLMs are trained on vast quantities of text data; in ChatGPT’s case, that data came largely from the internet.
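As a concrete illustration of these language tasks, the sketch below asks an LLM to summarize a short passage through OpenAI’s Python client. It is a minimal example, not a recommendation: it assumes the openai package (version 1 or later) is installed and an API key is configured in the environment, and the model name, prompt, and passage are illustrative.

```python
# Minimal sketch: an LLM performing a language task (summarization) via OpenAI's
# Python client. Assumes the `openai` package (v1+) and an OPENAI_API_KEY set in
# the environment; the model name and text below are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

passage = (
    "Large language models are trained on vast quantities of text and can "
    "translate, summarize, answer questions, and draft new content on demand."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{"role": "user", "content": f"Summarize in one sentence: {passage}"}],
)

print(response.choices[0].message.content)
```

Changing only the prompt turns the same call into translation, question answering, or sentiment analysis, which is part of why these tools spread so quickly through ordinary workstreams.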
#1. Legal Obligations.
a. Data Protection and Privacy.
In general, users of generative AI and LLMs should avoid inputting personal or sensitive information into ChatGPT or similar AI tools. ChatGPT uses the data input by many users to generate individual responses and to further train its model for future responses to all users. If an employee inputs data that contains confidential information, such as trade secrets, medical records, or financial data, then that data may be at risk, especially if a data breach of the AI system results in unauthorized access or disclosure. Similarly, if an individual puts personally identifiable information or sensitive data into the model, and that data is not properly protected, it could be improperly accessed or used by unauthorized individuals.
Furthermore, personal information disclosed to LLMs could be used in additional ways that violate the expectations of the people to whom the information relates. ChatGPT, for example, continues to use and analyze this data, so sensitive information that an employee inputs about a customer or patient could potentially be revealed to another user who poses a similar question or prompt. Further, every engagement with ChatGPT has a unique identifier: there is a login trail of the people who are using it. An individual’s use of ChatGPT is therefore not truly anonymous, raising questions about OpenAI’s retention of sensitive data.
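Given these risks, one practical safeguard some organizations adopt is to strip obvious identifiers from prompts before they leave the organization’s systems. The sketch below is a hypothetical illustration only: the redact_pii helper and its regular expressions are stand-ins for a properly vetted data loss prevention tool, and pattern matching alone will not catch all personal or sensitive information.

```python
import re

# Illustrative patterns only; real PII detection requires a vetted tool and review.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace obvious identifiers with placeholders before a prompt leaves the organization."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

prompt = "Draft a follow-up email to jane.doe@example.com, phone 555-123-4567, about her claim."
print(redact_pii(prompt))
# Draft a follow-up email to [EMAIL REDACTED], phone [PHONE REDACTED], about her claim.
```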
b. Regulatory Compliance.
LLMs are subject to the same regulatory and compliance frameworks as other AI technologies, but as they become more common, they raise novel questions about how such tools can be used in compliance with the General Data Protection Regulation (GDPR) and other regulations. Since ChatGPT processes user data to generate responses, OpenAI or the entities relying on ChatGPT for their own purposes may be considered data controllers under the GDPR, which means they must secure a lawful ground to process users’ personal data (such as users’ consent), and users must be informed about the controller’s ChatGPT-powered data processing activities.
OpenAI and companies relying on ChatGPT’s capabilities also need to consider how overarching GDPR principles like data minimization and fairness curtail certain data processing activities, including decisions made on the basis of algorithmic recommendations. Additionally, under the GDPR, data subjects have certain rights regarding their personal data, including the rights to access, rectify, and delete their data. But can users really exercise these rights in practice? With respect to erasure, OpenAI offers users the ability to delete their accounts, but OpenAI has stated that conversations with ChatGPT can be used for AI training. This presents challenges: while the original input can apparently be deleted, it may already have been used to shape and improve ChatGPT. Removing a user’s complete digital footprint and its effects on ChatGPT may be unfeasible, which risks conflicting with the GDPR’s “right to be forgotten.” Moreover, the repurposing of prompts to train OpenAI’s algorithm may raise issues under the GDPR’s purpose limitation principle, as well as questions about the applicable lawful ground for service improvement, following a recent restrictive binding decision from the European Data Protection Board (EDPB) on the lawfulness of processing for service improvement purposes.
There are also questions about the future regulation of ChatGPT and similar technologies under the European Union (EU) Artificial Intelligence Act (AI Act), which is currently under review by the European Parliament. Proposed in 2021, the regulation is designed to ban certain AI uses such as social scoring, manipulation, and some instances of facial recognition. However, recent developments in LLMs and related AI services have led the European Parliament to reassess “high-risk” use cases and how to implement safeguards for risks not previously accounted for in today’s rapidly developing tech environment. EU lawmakers have proposed different ways of regulating general purpose and generative AI systems during their discussions on the text of the AI Act. The consensus at the EU Council is that the European Commission should regulate such systems at a later stage via so-called ‘implementing acts’; at the European Parliament, lawmakers may include such systems in the AI Act’s high-risk list, thereby subjecting them to strict conformity assessment procedures before they are placed on the market.
#2. Ownership and Originality.
Depending on context, organizations should determine when it is appropriate or ethical for individuals or organizations to use, or take credit for, work generated by LLMs. In education, for example, some teachers are adapting their approaches (e.g., shifting from written work to oral presentations) to avoid plagiarism. Some schools have banned ChatGPT outright over concerns that the technology will lead students to take shortcuts in their writing or forgo doing their own research.
In many cases, these issues raise novel questions about legal rights and liability. For example, software developers have used ChatGPT to write new code and improve existing code. Yet LLMs, including ChatGPT, have been shown to regularly produce inaccurate content. If an employee uses code generated by ChatGPT in a product that interacts with the public, the organization may face liability if something goes wrong. In a related issue, if employees input copyrighted, patented, or confidential information (e.g., trade secrets) into a generative AI tool, the resulting output could infringe on intellectual property rights or breach confidentiality obligations.
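Where employees do use model-generated code, a lightweight discipline is to treat it like any other untrusted contribution: a human reviews it, and tests encode the behavior the organization is willing to ship. The Python sketch below is purely illustrative; parse_price stands in for a helper an LLM might draft, and the edge cases are the kind a reviewer would add before the code reaches the public.

```python
# Illustrative only: treat model-generated code like an untrusted contribution.
# parse_price stands in for a helper an LLM might draft; the checks below encode
# the behavior a human reviewer decides is acceptable before the code ships.

def parse_price(value: str) -> float:
    """Hypothetical LLM-drafted helper: convert a string like '$1,299.99' to 1299.99."""
    cleaned = value.replace("$", "").replace(",", "").strip()
    if not cleaned:
        raise ValueError("empty price string")
    return float(cleaned)

def run_checks() -> None:
    assert parse_price("$1,299.99") == 1299.99
    assert parse_price("0.50") == 0.50
    for bad in ("", "N/A"):  # edge cases a first draft often misses
        try:
            parse_price(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")

if __name__ == "__main__":
    run_checks()
    print("all checks passed")
```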
#3. Ethical Concerns, Bias, Discrimination, and Misinformation.
Finally, organizations must carefully consider all uses of AI, including LLMs, for possible discriminatory outcomes and effects. In general, LLMs reflect the underlying data that they are trained on, which is often incomplete, biased, or outdated. For example, AI training datasets often exclude information from marginalized and minority communities, who have historically had less access to technologies such as the internet, or had fewer opportunities to have their writings, songs, or culture digitized. ChatGPT was trained on internet data, and as a result is likely to reflect and perpetuate societal biases that exist in websites, online books, and articles.
For example, a Berkeley professor asked ChatGPT to create code to determine which air travelers pose a safety risk. ChatGPT assigned a higher “risk score” to travelers who were Syrian, Iraqi, Afghan, or North Korean. In another case, a predictive algorithm used for medical decision-making was biased against black patients because it was trained on data reflecting historical bias. Even though the deployers excluded race as an input when running the system, the algorithm still disadvantaged black patients because it took economic factors and healthcare costs into account.
Furthermore, generative AI and LLMs are potentially disruptive and can change the way we consume and create information. ChatGPT has demonstrated its ability to write news articles, essays, and television scripts. Supplied with a prompt loaded with disinformation or misinformation, LLMs can produce convincing content that could mislead even thoughtful readers. Audiences risk consuming vast amounts of misinformation if they are unable to fact-check the information given to them or do not know that content was generated by an LLM or AI. Organizations that use LLMs should be aware that these models can generate inaccurate or misleading information even when prompts are not intended to mislead, and should be especially vigilant when giving clients, customers, or users information based solely on an LLM’s output.
To address ethical concerns, bias, discrimination, and misinformation, organizations have a responsibility to scrutinize their use of generative AI and LLMs. Progress is being made on transparency in generative AI models, though complete solutions remain elusive, and ethical considerations are especially important when the AI is used in an outcome-determinative way, such as in hiring or healthcare. In some cases, such uses risk running afoul of employment or other civil rights laws. Organizations must identify the contexts in which this type of AI use is particularly susceptible to bias and discrimination, and avoid those situations. They should also engage the communities most affected and seek stakeholder input when drafting internal policies.
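One concrete form of that scrutiny is a simple disparity check on the outputs a system produces for different groups before those outputs reach customers or drive decisions. The sketch below is a hypothetical illustration: the records, group labels, and threshold are invented for the example, and a genuine audit would involve domain experts, affected communities, and legal review.

```python
# Hypothetical disparity check: compare a model's average scores across groups.
# The records and the 0.10 threshold are synthetic, illustrative values only.
from collections import defaultdict
from statistics import mean

# (group, model_score) pairs: synthetic stand-ins for audited model outputs
records = [
    ("group_a", 0.20), ("group_a", 0.25), ("group_a", 0.22),
    ("group_b", 0.55), ("group_b", 0.60), ("group_b", 0.58),
]

scores_by_group = defaultdict(list)
for group, score in records:
    scores_by_group[group].append(score)

averages = {group: mean(scores) for group, scores in scores_by_group.items()}
gap = max(averages.values()) - min(averages.values())

print(averages)  # roughly {'group_a': 0.22, 'group_b': 0.58}
if gap > 0.10:   # threshold chosen for illustration; set it with stakeholders
    print(f"flag for human review: average score gap of {gap:.2f} across groups")
```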
Conclusion
Recent developments concerning LLMs and generative AI demonstrate substantial technological advancement while also presenting many uncertainties. There are many unanswered questions and yet-to-be-discovered risks that may result from use of AI in the workplace. However, these harms can be mitigated if organizations take the time to address these issues internally and develop best practices. We encourage organizations to be inclusive and cross-collaborative when engaging in these conversations with lawyers, engineers, customers, and the public.