Meta Reuploaded 30% of Pirated Books It Downloaded for AI Training, Participating in Digital Piracy

Meta has reportedly reuploaded 30% of the pirated books it downloaded for AI training, raising new legal and ethical concerns over copyright violations.

New expert findings suggest that Meta may have played a larger role in digital book piracy than previously understood. While it has long been reported that the company used pirated books from shadow libraries to train its AI models, a recent analysis indicates that roughly 30% of the books Meta obtained via BitTorrent were subsequently reuploaded.

This raises fresh legal and ethical concerns about the company’s role in sustaining digital piracy.

BitTorrent’s peer-to-peer structure means that when a file is downloaded, it is also often uploaded in small portions for other users.

However, experts say that Meta’s upload rate was unusually high, suggesting that the company may have inadvertently contributed to the ongoing distribution of pirated books.
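
To make the mechanism concrete, here is a minimal Python sketch of a toy peer that, like a BitTorrent client, can hand any piece it already holds to other peers. The `Peer` class, its method names, and the piece counts are hypothetical and chosen only for illustration; they are not taken from the court filings or from any real BitTorrent client.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy BitTorrent-style peer: it can serve any piece it already holds."""
    have: set = field(default_factory=set)
    downloaded: int = 0  # pieces fetched from other peers
    uploaded: int = 0    # pieces handed to other peers

    def receive(self, piece: int) -> None:
        # Count a piece only the first time it arrives.
        if piece not in self.have:
            self.have.add(piece)
            self.downloaded += 1

    def serve(self, piece: int) -> bool:
        # A peer can only upload pieces it has already downloaded.
        if piece in self.have:
            self.uploaded += 1
            return True
        return False

    def share_ratio(self) -> float:
        # Uploaded pieces divided by downloaded pieces (0.0 if nothing fetched).
        return self.uploaded / self.downloaded if self.downloaded else 0.0


# Hypothetical numbers: fetch 10 pieces, then answer 3 requests from others.
peer = Peer()
for piece in range(10):
    peer.receive(piece)
for piece in (0, 4, 7):
    peer.serve(piece)

print(peer.share_ratio())  # 0.3
```

In this toy run the peer passes along 3 of the 10 pieces it fetched, a share ratio of 0.3. Real clients exchange pieces with many peers at once and let users cap upload speed, which this sketch leaves out.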

While AI developers have argued that training on publicly available texts falls under fair use, these findings complicate the discussion—Meta may not have just used pirated materials; it may have also helped spread them.

Meta’s AI Training and Its Connection to Piracy Networks

Meta’s AI models, including its Llama series, were trained on a vast dataset that reportedly included books from shadow libraries such as LibGen and Z-Library.

According to court filings in the U.S. District Court for the Northern District of California, internal company emails confirm that Meta executives were aware of the legal risks associated with these datasets. Despite this, they proceeded with AI training.

One internal Meta discussion revealed employee concerns about how the company was obtaining AI training data. In a message included in the lawsuit, an engineer stated: “Torrenting from a [Meta-owned] corporate laptop doesn’t feel right.” Another employee suggested it would be wiser to seek approval first, rather than deal with potential legal fallout later.

Although BitTorrent automatically uploads parts of downloaded files, experts argue that Meta’s upload volume was atypically large. This raises the possibility that its AI data-gathering process may have prolonged the availability of copyrighted books in piracy networks—even beyond its own training objectives.

Meta’s Fair Use Defense and Legal Risks

As Meta faces increasing scrutiny, the company is leaning heavily on its fair use defense. In court filings, it has argued that copying books for AI training does not constitute copyright infringement under U.S. law.

In a statement to the court, Meta’s legal team contended that the company’s AI development methods are protected because they are “transformative in nature” and do not replicate the books verbatim.

However, copyright experts argue that Meta’s reuploading of files could complicate this defense. While AI firms have previously justified using copyrighted materials under fair use, actively redistributing those materials—whether intentional or not—falls into a different legal category.

Microsoft and OpenAI rely on the fair use argument in the lawsuit brought against them by The New York Times, as do Suno and Udio, known for their AI-driven music technology. Anthropic just won a separate case about reproducing song lyrics with its Claude AI models using the same argument.

U.S. copyright law gives copyright holders the exclusive right to distribute their works, meaning that if Meta is found to have reuploaded copyrighted books at scale, it could face a different set of legal consequences.

Furthermore, the European Commission has been alerted to Meta’s AI training practices, which could lead to further regulatory scrutiny under EU copyright law. If European regulators take action, Meta could face steep penalties under the EU AI Act, which imposes copyright-related obligations on providers of general-purpose AI models.

Authors and Publishers Call for Accountability

The backlash against Meta’s AI training practices is growing. Authors, publishers, and legal experts worldwide have criticized the company’s approach to acquiring training data, with some pursuing legal action.

Australian authors—including former prime ministers Malcolm Turnbull and John Howard—were outraged to discover their works were included in the LibGen dataset allegedly used by Meta.

In an interview with The Guardian, Turnbull called the revelations “deeply concerning” and stated that authors should have the right to control how their work is used.

In the U.S., Pulitzer Prize-winning author Michael Chabon and comedian Sarah Silverman are among those who have taken legal action against Meta, accusing the company of using pirated versions of their books for AI training.

The Society of Authors has also condemned Meta’s methods, calling the use of pirated books “appalling.”

Additionally, a lawsuit filed by French publishers alleges that Meta’s AI training practices amount to “monumental looting” of copyrighted works. The plaintiffs argue that Meta’s actions set a dangerous precedent for authors, publishers, and the intellectual property rights of content creators worldwide.

What’s Next for Meta and AI Copyright Cases?

As legal and regulatory scrutiny intensifies, Meta may be forced to disclose more details about how it acquired training data. Courts are already weighing whether companies like Meta, OpenAI, and Google should be required to secure explicit licenses for copyrighted works used in AI training.

If the expert analysis holds and Meta is found to have actively redistributed copyrighted books, it could face serious legal and financial consequences.

The final legal outcome remains uncertain, but one thing is clear: Meta’s AI training practices are now at the center of one of the most significant copyright disputes in recent history.

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master's degree in International Economics and is the founder and managing editor of Winbuzzer.com.
