Comedian Sarah Silverman and two authors have filed copyright infringement lawsuits against Meta Platforms and OpenAI, alleging that the companies used their content without permission to train artificial intelligence language models.
The lawsuits, filed in federal court in San Francisco on Friday, claim that Meta and OpenAI used the plaintiffs' books to train their large language models. One example is OpenAI's GPT model, which underpins the popular ChatGPT chatbot. Meta's model is known as LLaMA and will become a major part of the company's services.
The plaintiffs allege that the companies used their content without their knowledge or consent, and that they did not properly attribute the source of the content. They are seeking damages and an injunction preventing the companies from using their content in the future.
The authors allege that the data used to train the chatbots came from illegal sources, such as “shadow library” websites that offer pirated books for download. The authors name Bibliotik, Library Genesis, Z-Library, and others as examples of such websites. The authors say that their books were available on these websites and were downloaded in bulk by the companies or their affiliates.
In the lawsuits, the plaintiffs provide evidence that the chatbots can summarize the authors' books when prompted. For instance, ChatGPT can summarize Silverman's The Bedwetter, Christopher Golden's Ararat, and Richard Kadrey's Sandman Slim. The lawsuits also show that the chatbots do not include any information about the authors or their copyrights when summarizing their books.
The AI Content Problem Tech Companies are Ignoring
The suits against OpenAI and Meta each contain six counts, covering various types of copyright violations, negligence, unjust enrichment, and unfair competition. According to the filings, the authors are seeking damages, restitution of profits, and more.
All plaintiffs are represented by Joseph Saveri and Matthew Butterick. The lawyers say on their website that they have heard from “writers, authors, and publishers who are concerned about [ChatGPT's] uncanny ability to generate text similar to that found in copyrighted textual materials, including thousands of books.”
Both lawyers have taken an active stance against AI copyright infringement. They believe that AI models that scrape vast amounts of data online are essentially taking other people's work, combining it, and passing it off as original. This happens without the original content creators receiving any citation or credit.
Butterick has already led a similar lawsuit against GitHub Copilot, Microsoft's tool that helps people code by filling in gaps. The code that fills those gaps comes from other people's code that Copilot has scraped. And this is the problem with generative AI and large language models. For example, Microsoft's Bing Chat has a “creative” mode that suggests the search chatbot can generate unique content.
However, that is not really what is happening, as everything the AI produces comes from data it has consumed. It may look original, but in reality it is a strange composite of other people's work and content that Bing has been trained on.