French Publishers and Authors Sue Meta for Copyright Breach over AI Model Training

French publishers and authors have filed a lawsuit against Meta, alleging the unauthorized use of copyrighted works to train its AI models.

Meta Platforms Inc. is confronting a lawsuit filed by French publishers and authors, who allege the company utilized their copyrighted works without authorization to train its artificial intelligence (AI) models.

The complaint, lodged in a Paris court specializing in intellectual property, was initiated by prominent associations including the Syndicat National de l’Édition (SNE), the Syndicat National des Auteurs et des Compositeurs (SNAC), and the Société des Gens de Lettres (SGDL).

According to Le Monde, these groups accuse Meta of massive copyright violations and economic parasitism, asserting that the company’s actions amount to a “monumental looting” of their intellectual property.

SNE President Vincent Montagne stated that they had gathered evidence of extensive copyright breaches and had previously reached out to Meta without success.

He also indicated that the European Commission had been informed, asserting that Meta’s practices violate European Union regulations on artificial intelligence.

Internal Documents Reveal Use of Pirated Materials

Internal documents recently disclosed in a U.S. court case suggest that Meta’s CEO, Mark Zuckerberg, approved the use of pirated books to train the company’s AI models.

The documents indicate that Meta’s AI team was authorized to utilize datasets from Library Genesis (LibGen), a platform known for providing unauthorized access to copyrighted works.

Employees reportedly expressed concerns about the legality of using such datasets, warning that it might undermine the company’s standing with regulators.

These objections were escalated to Zuckerberg, who ultimately approved the dataset’s use. An internal memo confirmed, “After escalation to MZ [Mark Zuckerberg], Meta’s AI team was approved to use LibGen.”

Additionally, internal communications revealed that Meta employees downloaded approximately 82 terabytes of data from shadow libraries, including LibGen, Z-Library, and Anna’s Archive, to train their AI systems. Despite internal ethical concerns, the decision to proceed with using these datasets was approved at the highest levels of the company.

Global Implications and Ongoing Legal Battles

Revelations that books and online material were used to train AI models have sparked a series of lawsuits worldwide, with content creators arguing that using their work to train these models constitutes copyright infringement.

AI companies have generally been reluctant to disclose their data sources, maintaining that the practice falls under “fair use” as defined by U.S. copyright law.

In the United States, Meta is facing a lawsuit filed by authors, including Ta-Nehisi Coates and Sarah Silverman, who allege that the company used pirated versions of their books to train its AI models.

Meta’s Ethical and Legal Approach Under Scrutiny

One of the core concerns raised in the lawsuits involves Meta’s decision to strip copyright management information (CMI) from training datasets. Under the Digital Millennium Copyright Act (DMCA), this practice is prohibited if done to conceal infringement.

Court documents also indicate that Meta’s AI team deliberately removed CMI to obscure the origins of copyrighted content, aiming to prevent the Llama models from outputting identifiable copyrighted data.

This approach has been described as a calculated effort to reduce public awareness of potential copyright violations.

Internal objections to the dataset’s use were raised, with one engineer expressing discomfort about sourcing materials via torrents, noting, “Torrenting from a [Meta-owned] corporate laptop doesn’t feel right.”

Nevertheless, Zuckerberg’s approval cleared the way for the team to proceed with these datasets despite the known ethical risks. The decision underscores Meta’s aggressive stance in the AI race, prioritizing rapid development even at the risk of legal repercussions.

Industry-Wide Debate on Fair Use and AI Training

Meta’s defense, likely to mirror arguments made by Microsoft, centers on the assertion that using publicly accessible materials to train AI models falls within the boundaries of fair use.

In March 2024, Microsoft defended its practices after being accused by The New York Times of improperly using its articles for AI training. Microsoft argued that such usage did not harm the market for the original works and aligned with fair use provisions under U.S. law.

These legal defenses highlight a broader industry debate about how copyright applies to AI development. While tech companies argue that AI-generated content is transformative and does not replicate original works, creators and publishers argue that their rights are being infringed without compensation.

European Regulations and Future Implications

The European Union’s AI Act has introduced stringent regulations, mandating that AI systems comply with copyright rules, including respecting the rights of content creators. If Meta is found in violation of these regulations, the company could face substantial penalties.

The lawsuit in France represents one of the first major tests of the EU’s regulatory framework for AI.

Industry analysts suggest that the outcomes of these lawsuits could reshape how AI models are trained globally. Companies may be forced to rely on licensed datasets or develop new strategies to comply with copyright regulations.

Meta’s Position and the Road Ahead

Meta has yet to publicly comment on the French lawsuit. However, its defenses in earlier cases suggest that the company will argue its AI practices adhere to fair use standards.

This defense will likely face rigorous legal examination, especially given the explicit allegations of pirated content usage. Whether the courts side with Meta or with the creators could shape the broader legal boundaries governing AI development.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master’s degree in International Economics and is the founder and managing editor of Winbuzzer.com.
