In Short
The Situation: Workforces are increasingly using generative artificial intelligence ("AI") platforms to generate diverse content, including marketing materials, translations, and source code.
The Result: Various copyright issues arise from the use of generative AI platforms, including ownership of the output, infringement considerations (both in the use of the output and in the training of the underlying models), and the copyrightability of content created using generative AI.
Looking Ahead: Generative AI is expected to further proliferate and increasingly be a tool used by employees in workforces across diverse industries. Companies must be mindful when adopting these platforms or allowing employees to use these tools on behalf of the company—especially with regard to how the outputs are being used or incorporated into other company information.
Over the past few years, AI has dominated headlines, increasing awareness and intrigue about its promises and perils. Most recently, the buzz around generative AI "chatbots" has reached a fever pitch, as these AI systems seemingly engage with users on a human-like conversational level and generate complex content from even the simplest user inputs. Following up on a recent Alert on the Copyright Office's Artificial Intelligence Initiative, this Commentary provides a closer look at the intersection of U.S. copyright law and generative AI.
Ownership of Generated Content: Who owns the output of a generative AI model—if the output can be owned at all—might be set out by the terms of use for the AI tool (which may be available on the website associated with the tool), or by an implied license if there are no terms.
Infringement: The output of a generative AI model could implicate several rights protected by copyright, but ultimate liability is uncertain. Training a generative AI model using copyrighted works could result in allegations of copyright infringement, depending on whether and to what extent original material was in fact copied. These scenarios will likely be highly fact-intensive and will present novel challenges to the fair use doctrine; for example, a user may argue that any such use of copyrighted material was de minimis and resulted in a transformative work.
Protectability: The Copyright Office's recent guidance confirms its position that AI-generated material is unprotectable where a human solely provided a prompt that resulted in generated content. On the other hand, the Office recognizes that a human's selection, arrangement, or modification of AI-generated material may result in sufficiently original human expression to warrant copyright protection. The devil, of course, will be in the details, which will vary with the facts of the specific cases that wind up in litigation.
Overview of Generative Artificial Intelligence
Generative AI refers to artificial intelligence algorithms (such as large language models) that can create new content based on the data on which the AI has been trained. At a high level, much of AI research centers on models: algorithms that take certain inputs (for example, a text prompt) and, based on those inputs and their internal parameters, generate certain outputs (for example, an image embodying the text prompt, or a text response that reads as a human-like answer to the input query). The learning process involves giving the algorithm many sample input-output pairs (for example, a picture of a growling dog paired with the text "dog growling") so that the internal parameters can be adjusted based on the relationships between inputs and outputs that the algorithm infers. When the algorithm is later given a new input from an end user, it uses its already-tuned internal parameters to generate an output that reflects the training data.
How did generative AI suddenly become so much better than what came before? Part of the reason is that vastly greater amounts of data are being used to train AI models. For example, it has been reported that certain computer vision-based generative AI platforms were trained on more than two billion captioned images. AI researchers have also created new methods and techniques for more rapidly building AI systems, often using AI to train AI. Technical innovations that have helped generative AI reach its current level include generative adversarial networks, in which two neural networks, one generating content and one trying to determine whether content is real or fake, are pitted against each other. Another technique is reinforcement learning from human feedback, in which an already-trained AI model generates several outputs, humans rank those outputs, a reward model is trained on the ranked outputs to estimate how much a human would like a given output, and the reward model is then used to refine the already-trained AI model.
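To make the train-then-generate pattern described above more concrete, here is a deliberately tiny Python sketch. It is not a generative AI model. It assumes a toy "model" with only two internal parameters (a weight and a bias), adjusts them by gradient descent so they fit hypothetical input-output pairs, and then applies the tuned parameters to an input the model never saw during training. Large language models follow the same broad pattern, but with billions of parameters and far richer inputs and outputs.

```python
# Toy illustration only: a "model" with two internal parameters (w, b).
# Training adjusts the parameters to fit sample input-output pairs;
# generation then applies the tuned parameters to a new input.

def train(pairs, epochs=2000, learning_rate=0.05):
    """Adjust (w, b) so that w * x + b approximates y for each (x, y) pair."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        # Nudge each parameter in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def generate(params, x):
    """Apply the already-tuned parameters to a new input."""
    w, b = params
    return w * x + b


# Hypothetical training data that happens to follow the relationship y = 2x + 1.
training_pairs = [(0, 1), (1, 3), (2, 5), (3, 7)]
params = train(training_pairs)

# An input that never appeared in the training data.
print(round(generate(params, 10), 2))  # prints 21.0
```

The "relationship" the toy model infers (y = 2x + 1) is never written into the code; it exists only in the two learned parameter values, which is a miniature version of how a trained model's behavior lives in its parameters.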
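The reward-model step of reinforcement learning from human feedback can be sketched in a few lines of Python. This is a simplified illustration under loudly stated assumptions: each candidate output is reduced to two hypothetical numeric features, the reward model is a simple linear scorer trained with a pairwise logistic (Bradley-Terry style) loss, and the final step, in which the reward model is used to refine the generative model itself (commonly with a reinforcement learning algorithm), is omitted. Real systems use neural networks operating on the full text of each output.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn a human ranking (best first) into (preferred, rejected) pairs."""
    return list(combinations(ranked_outputs, 2))


def reward(weights, features):
    """Linear reward model: a weighted sum of an output's features."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(pairs, num_features, epochs=500, learning_rate=0.1):
    """Fit weights so that human-preferred outputs receive higher rewards."""
    weights = [0.0] * num_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Probability the reward model assigns to the human's choice.
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient descent step on the loss -log(p).
            for i in range(num_features):
                weights[i] += learning_rate * (1.0 - p) * (preferred[i] - rejected[i])
    return weights


# Hypothetical features for four outputs to one prompt: (relevance, factual_errors).
# A human ranked them from best to worst.
ranked = [(0.9, 0.0), (0.7, 1.0), (0.4, 1.0), (0.2, 3.0)]
weights = train_reward_model(ranking_to_pairs(ranked), num_features=2)

# The trained reward model can now score outputs it has never seen.
candidates = {"A": (0.8, 0.0), "B": (0.8, 2.0)}
best = max(candidates, key=lambda name: reward(weights, candidates[name]))
print(best)  # prints A
```

Note that the humans never state a rule such as "factual errors are bad"; the reward model infers that preference from the rankings alone, which is what allows it to stand in for human judgment when the generative model is later refined.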
Who Owns the Output?
Imagine a scenario where a company employee uses generative AI to generate content that accomplishes a work assignment. For example, perhaps a programmer asks a large language model for code that performs a critical function, then integrates the result into the company's codebase. This raises an immediate concern: could the company now lack ownership rights in portions of its software? Will the large language model company or some other third party be able to assert a copyright interest in the company's codebase? Alternatively, could there be an argument that no one has a copyright interest in the AI-generated material—i.e., that it is in the public domain?
There are various possible answers to the fundamental question of "who owns" the AI-generated material. Depending on the specific facts, one might argue that ownership vests in the human user who provided the prompt to the generative AI tool (provided that the human did something more than simply provide the prompt and can identify sufficiently original, human-authored expression that warrants copyright protection), the generative AI company that provided the model (assuming that the generative AI company can identify underlying original expression authored by a human and can meet the legal requirements to establish that it owns that expression, not the human actor), or nobody (on the theory that the content is not copyrightable because no human actor authored any original expression that would warrant copyright protection). Moreover, a third party could argue that the AI-generated material infringes that party's copyrighted material—for example, that the AI-generated output uses or is a derivative of that third party's original work. See 17 U.S.C. § 106 (granting a copyright owner the "exclusive right" to copy the copyrighted work and create derivative works based on it).
What Rights Might the Output Infringe?
Independent of ownership of the output of an AI system (as between the platform operator and the user), there may still be infringement risks in using that output. For example, generated software code could infringe the rights of others, as discussed below.
At the outset, it is worth noting that copyright, broad though its scope may be, is ultimately a set of specific things one cannot do to a protected work without authorization or other legal justification. These rights are:
- To reproduce the copyrighted work in copies or phonorecords;
- To prepare derivative works based upon the copyrighted work;
- To distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending;
- In the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly;
- In the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and
- In the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission.
Of these rights, software, which is considered a literary work, often implicates the reproduction and derivative work rights. The specific contents of the code are an important consideration: the more that the generated code comes directly from a pre-existing copyrighted work, the easier it will be for that work's copyright holder to prove substantial similarity and, thus, liability for infringement. But even if the generated code is meaningfully different from the pre-existing work, an aggressive copyright holder who can prove that its work was used to train the large language model that generated the code could argue that any output of the model is "based upon" its pre-existing work and is therefore a derivative work under 17 U.S.C. § 101. Indeed, a similar argument has already been made in federal court. See Complaint ¶ 4, Andersen v. Stability AI Ltd., ECF No. 1, No. 3:23-cv-00201 (N.D. Cal. Jan. 13, 2023). In its defense, the company might raise the scènes à faire doctrine if the code is simple and standard, the idea/expression dichotomy to argue that only unprotectable ideas were copied, and fair use. While there is ample case law on these common copyright infringement defenses, they are largely untested in the context of AI.
Can the Output be Protected?
Despite the potential legal issues that could arise, the company weighs the pros and cons of using generative AI tools and decides to allow its programmers to use AI-generated code and incorporate it into company software. But then somebody raises a question: "What if a competitor decides to just copy the code? Is there anything we can do to stop that?"
The answer to this question is likely to depend largely on the extent of a human author's selection, arrangement, and/or modification of the code. The United States Copyright Office recently issued a statement of policy on registration of works containing AI-generated material. 88 Fed. Reg. 16,190 (Mar. 16, 2023). The Office's view is that "[i]f a work's traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it." Id. at 16,192. When a human gives an AI "solely a prompt" and the AI generates "complex written, visual, or musical works in response," the situation is akin to the human giving "instructions to a commissioned artist." Id. The AI is the one that "determines the expressive elements of its output," so the resulting work "is not protected by copyright and must be disclaimed in a registration application." Id.
Despite this, the company need not lose hope. The question of whether an AI-generated work without human authorship can be registered is currently being litigated. See Thaler v. Perlmutter, No. 1:22-cv-01564 (D.D.C.). Others are trying to work around the Office's guidance by using their own images as input to generative AI as an "assisting instrument." See Tiffany Hu, Artist Seeks Copyright Of AI Artwork That Uses Own Drawing, Law360 (Mar. 23, 2023, 10:19 p.m.). Even without a registration certificate, the company may still sue alleged infringers once it has the Office's denial of registration in hand. 17 U.S.C. § 411(a). In that circumstance, however, the Register of Copyrights is entitled to intervene on the issue of registrability. Id. If copyright protection fails, the company could also try other approaches, such as requiring anyone who wishes to view the AI-generated material to first agree not to copy it, but it will need to be wary of copyright preemption.
What About Training Generative AI on Copyrighted Material?
The company has decided that, instead of relying on an outside service, it will train its own generative AI model for its programmers to use. This scenario can raise various issues, including: (i) whether the company complied with copyright law in obtaining the files used for training; and (ii) whether the act of training a generative AI model on copyrighted subject matter raises infringement risk. First, maintaining a full and exact copy of a copyrighted work exposes the company to risk of infringing the reproduction right. The company might nevertheless argue fair use, relying on transformativeness and arguing that the actual training files are (ideally) not being distributed to end users and are being used for the further purpose of discovering inferred relationships between data, not for their creative expression. Second, the trained model's internal parameters will presumably be stored in a file on a computer, like the .ckpt checkpoint files Stable Diffusion uses. That file could be a fixation of an original work of authorship, so training a model might implicate the reproduction and derivative work rights. A prima facie infringement case against a model might, however, face additional obstacles compared to an infringement case against its outputs. A list of parameters is less likely to be substantially similar to the copyrighted training works, and the model arguably copies only unprotected ideas in the form of inferred relationships, not protected expression. The fair use case will also likely be stronger because the model is of a different character and is less likely to directly supplant the copyrighted training works.
Three Key Takeaways
- Individuals and organizations seeking to create content using a generative AI tool must understand the applicable terms and conditions of use for that tool, which may be available on the website associated with the tool. In particular, the terms may have implications for organizations whose employees make use of such tools on behalf of their employer and/or in the course of their employment.
- Users of generative AI tools do not necessarily own the output and could potentially face allegations that the output infringes a third party's copyright. Training a generative AI model using copyrighted materials likewise raises potential copyright infringement concerns.
- Users of generative AI tools should be aware of the Copyright Office's recently issued guidance on works containing material generated by AI. In particular, a user will not be able to copyright the output solely based on the fact that they supplied the prompt. In order for the output to be eligible for copyright protection, there must be human-authored aspects of the work—for example, by sufficiently modifying the output in a manner that reflects original human authorship.