In a critical misstep, OpenAI engineers inadvertently erased key evidence in an ongoing copyright lawsuit with The New York Times and Daily News.
The incident, which occurred on November 14, 2024, disrupted the plaintiffs’ efforts to investigate claims that OpenAI improperly used their articles to train its AI models.
OpenAI had previously provided access to virtual machines for the plaintiffs’ legal teams to search for copyrighted material in the company’s datasets. The deletion has set back the investigation, forcing about a week of expert and legal work to be redone at significant technical and financial cost.
In a filing with the U.S. District Court for the Southern District of New York, the plaintiffs emphasized the time and resources required to redo their work. While they stopped short of accusing OpenAI of intentional sabotage, they highlighted that the company is best positioned to search its datasets efficiently.
“On November 14, all of News Plaintiffs’ programs and search result data stored on one of the dedicated virtual machines was erased by OpenAI engineers. While OpenAI was able to recover much of the data that it erased, the folder structure and file names of the News Plaintiffs’ work product have been irretrievably lost. Unfortunately, without the folder structure and original file names, the recovered data is unreliable and cannot be used to determine where the News Plaintiffs’ copied articles were used to build Defendants’ models.
“Therefore, News Plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time. The News Plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of its experts’ and lawyers’ work must be re-done, which is why this supplemental letter is being filed today.”
$7.6 Million Already Spent on Legal Efforts
The lawsuit, filed in December 2023 in the U.S. District Court for the Southern District of New York in Manhattan, accuses OpenAI of violating copyright law by using content from the Times without permission.
Microsoft is also named in the suit for integrating OpenAI’s AI tools, such as ChatGPT, into products like Bing Chat, now branded as Copilot. The Times claims these tools summarize or paraphrase articles without attribution, bypassing its paywalls and diverting traffic from its affiliate-based platform, Wirecutter.
The financial toll of the lawsuit on the Times is steep. Reports indicate the media outlet has invested $7.6 million in its legal campaign this year alone, including $4.6 million in the last quarter. These costs reflect the broader stakes for the Times, which is demanding billions in damages and the destruction of AI models trained on its content.
The plaintiffs argue that these AI tools harm their subscription and affiliate revenue models, with Wirecutter particularly impacted. Microsoft and OpenAI deny wrongdoing, maintaining that their AI systems operate within the bounds of fair use, a legal principle that permits limited use of copyrighted materials for purposes such as commentary or research.
Licensing Deals and Industry Divide
Unlike the New York Times, some publishers have opted to collaborate with AI companies. OpenAI has struck licensing agreements with major outlets such as TIME and Dotdash Meredith. These deals, reportedly worth millions annually, allow OpenAI to legally use their archives for training its models. For example, Dotdash Meredith is said to receive $16 million per year under its contract with OpenAI.
The divide between litigation and collaboration underscores the media industry’s struggle to navigate the rise of generative AI. While partnerships offer a way to monetize content in an AI-driven world, litigation seeks to establish clearer legal boundaries for AI training practices.
Microsoft’s Role and Broader Implications
Microsoft’s integration of OpenAI’s GPT models into products such as Copilot (formerly Bing Chat) has drawn additional scrutiny. The Times claims these tools summarize its articles while omitting links back to its platforms, further eroding revenue. Microsoft defends its practices, arguing that AI-generated summaries transform content in ways consistent with fair use.
The legal battle reflects larger tensions between AI developers and content creators. Other lawsuits, including actions by the Authors Guild and high-profile writers such as George R.R. Martin and Sarah Silverman, accuse OpenAI and other companies of using copyrighted works without permission.
Internal Challenges Compound OpenAI’s Troubles
OpenAI has faced internal turmoil in recent months. In May 2024, co-founder and chief scientist Ilya Sutskever stepped down following disagreements over transparency and strategy.
Other prominent figures, like Jan Leike, have also left the company, joining rival firms focused on ethical AI development. These leadership changes coincide with mounting legal challenges, adding pressure to OpenAI’s operational focus.