A group of leading news publishers in the US is preparing to take legal action against artificial intelligence (AI) firms that use their content without authorization or payment, according to a report by Forbes.
The publishers allege that the AI firms are infringing on their intellectual property rights and undermining their business model by scraping, summarizing, or rewriting their articles and distributing them on various platforms, such as websites, apps, or social media.
Among the AI firms likely to be targeted by the lawsuit is OpenAI, which offers a natural language generation model. The lawsuit could have significant implications for the future of journalism and innovation, as both the publishers and the AI firms claim to serve the public interest and promote access to information.
However, the publishers argue that the AI firms are exploiting their content without contributing to the production costs or sharing the revenue, and that they are harming the quality and credibility of journalism by creating distorted or inaccurate versions of their articles.
On the other hand, the AI firms contend that they are providing a valuable service to the users by making the news more accessible, personalized, and engaging, and that they are respecting the fair use doctrine, which allows limited use of copyrighted material for purposes such as education, criticism, or commentary.
The lawsuit could also raise some ethical dilemmas for the AI firms, such as how to ensure that their algorithms do not generate harmful or misleading content, how to protect the privacy and security of their users and sources, and how to balance their social responsibility with their commercial interests.
Content Creators Remain Wary of the AI Threat
The publishers' move toward legal action comes after Meta Platforms and OpenAI were hit with lawsuits earlier this month. The lawsuits, which were filed on Friday in a San Francisco federal court, accuse Meta and OpenAI of using the comedy routines of Sarah Silverman and the books of Christopher Golden and Richard Kadrey to train their large language models. These models power the chatbots ChatGPT and LLaMA, which can generate natural language responses to user queries.
The plaintiffs claim that they neither knew of nor consented to the use of their content by the companies, and that they received no credit or compensation for their contribution. They are demanding monetary damages and an order barring the companies from using their content in the future.
The authors also claim that the companies obtained their books from illegal sources, such as websites that offer free downloads of pirated books. They name Bibliotik, Library Genesis, Z-Library, and others as examples of such websites. They say that their books were available on these websites and were downloaded in large quantities by the companies or their partners.
The lawsuits offer evidence that the chatbots can summarize the plaintiffs' books when asked. For example, ChatGPT can summarize Silverman's The Bedwetter, Golden's Ararat, and Kadrey's Sandman Slim. However, the lawsuits also show that the chatbots do not mention the authors' names or copyright information when summarizing their books.
An Unclear Goal as a Technology Expands
Originality.AI, a service that checks content for AI traces, recently reported that Google is blocking some sites from its AdSense service, which allows publishers to earn money from displaying ads. Google claims that these sites are using “automatically generated content”, a vague term that could mean anything from AI to plagiarism.
However, detecting AI content is not an easy task, as the technology is becoming more sophisticated and human-like. There is no foolproof method to distinguish between AI and human-written content, and Google may be wrongly accusing some legitimate publishers of using AI.
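To see why detection is so unreliable, consider a toy sketch of one signal some detectors reportedly use: "burstiness", the idea that human writing varies sentence length more than model output tends to. This is an illustrative simplification of my own, not the method Originality.AI or Google actually uses, and as the code notes, it is trivially fooled in either direction.

```python
import statistics

def burstiness_score(text: str) -> float:
    """Crude stand-in for one signal AI detectors are said to use:
    human writing tends to vary sentence length ("burstiness") more
    than model output. A toy metric, easily fooled either way."""
    # Naive sentence split on terminal punctuation.
    for mark in ("!", "?"):
        text = text.replace(mark, ".")
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    # Ratio of spread to average length: higher = more "human-like" variance.
    return statistics.pstdev(lengths) / statistics.mean(lengths)

human_like = ("Short one. Then a much longer, winding sentence that rambles "
              "on for a while before stopping. Okay.")
uniform = ("This sentence has seven words inside. That sentence has seven "
           "words inside. Every sentence has seven words inside.")
print(burstiness_score(human_like) > burstiness_score(uniform))
```

A single statistic like this produces false positives on any human who happens to write evenly, which is exactly the risk the paragraph above describes for publishers caught by Google's vague "automatically generated content" label.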
Meanwhile, Google is also working on its own AI content creation tool, called Genesis. Genesis is a generative AI tool, meaning it can produce new content from existing data. It uses natural language processing and machine learning techniques to analyze and synthesize information from various sources, such as websites, social media, and databases. It can then generate news articles intended to be relevant, accurate, and engaging.
Is Google being fair and ethical when it blocks some AI content while creating its own? I discussed this issue and the risks of a content exodus when Bing Chat launched in February. Online content is undergoing a major transformation and sorting the good from the bad is the big challenge we all face.
Navigating a Legal Minefield and Ongoing Regulatory Scrutiny
The Federal Trade Commission (FTC) is reportedly investigating OpenAI over whether the company's flagship ChatGPT conversational AI made “false, misleading, disparaging or harmful” statements about people.
According to The Washington Post, a 20-page letter that the FTC sent to OpenAI requests information on complaints about ChatGPT's disparagement and reputational harm. The letter indicates that the FTC is still in the preliminary stage of its investigation and has not made any findings or conclusions.
The FTC did not disclose what triggered its investigation into ChatGPT. The regulator usually does not initiate investigations without receiving a complaint. However, the regulator is interested in finding out whether OpenAI is neglecting to properly monitor and moderate ChatGPT's output, and whether it is breaching its own ethical principles and social responsibility standards.
Who owns the content AI services deliver is a major debate point. It is easy to think that a chatbot such as Bing Chat or ChatGPT generates its content from nothing, and tech companies are vague enough about AI capabilities that casual observers might believe this. However, what these services are really doing is scraping content from the web and repackaging it as something that looks new.
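The "repackaging" point can be made concrete with a deliberately simple sketch. This is not any vendor's actual pipeline; it is a toy extractive "summarizer" whose output looks like fresh prose but contains only words lifted from the source, which is the ownership problem in miniature. The `article` text and the length-based ranking are my own illustrative choices.

```python
import re

# Hypothetical source text standing in for a scraped news article.
article = (
    "Publishers say AI firms scrape their work. "
    "The firms argue the practice is fair use. "
    "Courts have not yet settled the question."
)

def repackage(text: str, keep: int = 2) -> str:
    """Build an 'AI summary' purely by extracting and reordering
    sentences from the source -- nothing here is newly written."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Rank sentences by length as a crude "importance" proxy,
    # then stitch the top ones back together.
    top = sorted(sentences, key=len, reverse=True)[:keep]
    return " ".join(top)

print(repackage(article))
```

Every word in the output exists in the input; only the selection and ordering changed, which is why "it looks new" is a weak defense of originality.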
A similar thing is happening with AI coding tools, which let users have an AI plug in pieces of code to help complete their projects. Services like GitHub Copilot essentially take snippets of other people's code and use them to fill in code for other users. This approach has already drawn backlash, including a lawsuit against GitHub Copilot.