HomeWinBuzzer NewsU.S. House Bill Aims to Enforce Transparency in AI Training Datasets

U.S. House Bill Aims to Enforce Transparency in AI Training Datasets

The Generative AI Copyright Disclosure Act is a proposed bill that aims to bring more regulation to AI training.


In a potentially important move towards regulating artificial intelligence (AI) development, Congressman Adam Schiff (D-CA) has introduced the Generative AI Copyright Disclosure Act. The proposed legislation mandates that entities involved in the creation of AI models disclose any copyrighted material used in their training datasets. This requirement is not only applicable to future AI developments but also extends retroactively, affecting existing AI systems.

Comprehensive Disclosure Requirements

The bill specifically targets the construction of generative AI systems, which include large language models and other machine learning frameworks. It stipulates that creators of training datasets must notify the Register of Copyrights by submitting a detailed summary of copyrighted works incorporated into their datasets. Furthermore, any modifications to the dataset necessitate an updated submission. Importantly, the legislation requires that a URL for the training dataset be made publicly accessible through a designated database. Compliance with these requirements must be achieved within 30 days following the public release of an AI system trained on such datasets. AI systems developed prior to the enactment of the bill are also subject to this 30-day compliance window.

Penalties and Endorsements

Noncompliance with the act carries a minimum penalty of $5,000. Congressman Schiff emphasizes that the bill seeks to balance the transformative potential of AI with the imperative of ethical guidelines and protections. The bill has garnered support from several creative trade groups, including the Recording Industry Association of America and the Writers Guild of America, highlighting its significance in protecting creators' rights in the digital age.

Challenges and Support for the Creative Community

The introduction of this legislation comes amid growing concerns over the use of copyrighted content to train systems without authorization. Industry experts and creators have criticized this practice for potentially undermining the value of creative works and compensating creators fairly. The bill aims to introduce greater transparency and establish safeguards around AI to protect the interests of writers, artists, and other content creators.

AI Industry's Stance

While the reaction from AI companies to the proposed bill remains to be fully seen, previous statements from entities like suggest that training effective AI models without relying on copyrighted content is currently challenging. The bill does not prohibit the use of copyrighted materials for but instead focuses on ensuring such use is transparent and on public record.

Luke Jones
Luke Jones
Luke has been writing about all things tech for more than five years. He is following Microsoft closely to bring you the latest news about Windows, Office, Azure, Skype, HoloLens and all the rest of their products.

Recent News