OpenAI and Microsoft Accused of Scraping and Exploiting Private Data for AI Training

OpenAI, the company behind the popular chatbot program ChatGPT, is facing a class action lawsuit from a group of anonymous individuals who claim that the company has stolen and misused their personal data to train its artificial intelligence large language models.

The lawsuit, filed on Wednesday in federal court in San Francisco, accuses OpenAI of secretly scraping 300 billion words from the internet, including books, articles, websites and posts that contain personal information obtained without consent. The plaintiffs allege that OpenAI has violated privacy laws, terms of service agreements, and computer fraud statutes by accessing and exploiting their private data without their permission or knowledge.

The lawsuit also names Microsoft Corp., which is a multi-billion dollar investor in OpenAI, as a co-defendant. The plaintiffs seek $3 billion in potential damages, based on an estimated class size of millions of individuals whose data has been harvested by OpenAI.

In the filing, the plaintiffs describe themselves by their occupations or interests, but use only initials to identify themselves for fear of retaliation from OpenAI or its supporters. They claim that OpenAI's products, such as ChatGPT and other generative AI applications, are trained on their private data and pose a threat to their privacy, security, and dignity.

How to Manage Copyright Infringement and AI?

The lawsuit also warns of the broader risks of OpenAI's activities, such as the potential for civilizational collapse, misinformation, and manipulation. The plaintiffs argue that OpenAI is engaged in an “AI arms race” that disregards ethical and legal standards in pursuit of profits and dominance.

OpenAI is a leading company in the field of artificial intelligence, founded in 2015 by a group of prominent tech entrepreneurs and researchers, including Elon Musk and Sam Altman. The company's stated mission is to ensure that artificial intelligence is aligned with human values and can be used for good. However, the company has also faced criticism for its lack of transparency, accountability, and oversight.

OpenAI did not immediately respond to requests for comment on the lawsuit. Microsoft also did not respond to inquiries. Microsoft has been a long-time investor in OpenAI, including a previous $1 billion investment. Earlier this year, the company reportedly invested a further $10 billion and takes a portion of OpenAI's profits.

In return, Microsoft has been able to leverage OpenAI's models to mainstream AI across its services. Bing Chat, Microsoft 365 Copilot, GitHub Copilot, and Azure OpenAI Service are examples of Microsoft products that include the GPT-4 large language model.

The lawsuit comes at a time when artificial intelligence is becoming more powerful and pervasive, raising questions about its impact on society and the need for regulation. Congress is currently debating the potential and dangers of AI, as well as the role of the government in overseeing its development and use.

Just days after the February launch of Bing Chat, I wrote a feature discussing the potential content minefield AI causes. Large language models do not generate content in the strictest sense. They can do nothing on this own. All models need data and they get that by scraping information online.

This is fine but essentially every piece of AI response you get from a chatbot such as ChatGPT or Bing Chat is a collection of data pieced together. It may have been pieced together in a new way, but it is not especially original. The situations is more concerning when it comes to creative works, where the AI essentially “borrows” from existing content and passes it off as unique or original by changing it.

OpenAI and Microsoft Accused of Scraping and Exploiting Private Data for AI Training

How to Manage Copyright Infringement and AI?

Recent News

Reddit Launches Dynamic Product Ads in Global Public Beta

Google Announces Direct Microsoft 365 App Access on ChromeOS