OpenAI Hits Back at Accusations of Unauthorized Data Use By The New York Times

OpenAI has issued a detailed rebuttal to allegations by The New York Times that the company used the newspaper's articles without authorization to train its advanced language models, such as GPT-4. In a dispute that has escalated to a formal lawsuit, The New York Times asserts that OpenAI used its content without proper permission. The AI research firm counters these claims, emphasizing that it was not told about either the lawsuit or the alleged incidents beforehand and learned of them only through the newspaper's own reporting.

Regurgitation Claims and Prompt Manipulation

In its response, OpenAI tackles the specific accusations of “regurgitation,” a phenomenon in which AI models repeat training data verbatim under specific prompting conditions. The New York Times claims that OpenAI's models reproduce its articles in this way, suggesting an intent by OpenAI to capitalize on its content. However, OpenAI suggests that The New York Times might have manipulated prompts to produce these regurgitations, citing their rarity and the selective nature of the examples put forward by the publication. Moreover, OpenAI asserts that the content in question is outdated and has been widely disseminated across various platforms on the internet.

Fair Use and Ethical Standards

As the argument unfolds, OpenAI leans heavily on the principle of fair use, asserting that using internet materials for AI training adheres to established precedents that benefit creators and are vital for innovation. The company also emphasizes its commitment to ethical practices beyond mere legal obligations, pointing to its industry-leading initiative that allows publishers, including The New York Times, to opt out of having their content used for AI model training, an option the newspaper exercised in August 2023.

Furthermore, OpenAI highlights how the fair use doctrine supports its training methods and points to its existing licensing agreements with several news organizations, which it says buttress its position.

Finally, OpenAI notes that while it is at the center of The New York Times' suit, other lawsuits are underway, including suits from authors who claim their published work was used without authorization to train AI models. OpenAI's engagement with these lawsuits signals an ongoing industry debate around data use, copyright law, and the responsibilities of AI developers in society.
