Reddit CEO Alleges Microsoft Misused Data for AI Training

Following Reddit's decision to paywall its data for other tech companies, the network's CEO says Microsoft was training its AI using the data.

Steve Huffman, CEO of Reddit, has alleged that Microsoft used Reddit’s data for AI training without proper consent. Speaking to The Verge, Huffman accused Microsoft and other AI companies like Anthropic and Perplexity of assuming that online content is freely available for their use without explicit permission.

Microsoft’s Response and Measures

Jordi Ribas, who oversees search at Microsoft, defended the company’s practices by noting that webmaster controls for crawling were provided to Reddit and other sites as of September 2023. Despite these efforts, Reddit decided to block Bing from accessing its data, preferring a different search engine. Ribas expressed concern on X about how this decision affects Bing and its competitiveness.

Huffman underlined the growing challenge of safeguarding Reddit’s data from unauthorized use. He emphasized that the traditional bargain, in which sites allow crawling in exchange for search traffic, is becoming less clear. That raises questions about how AI systems use website data and whether consent and compensation are required.

Context and Recent Measures

In a move announced last week, Reddit restricted data access for search engines that do not pay a fee, targeting Microsoft’s Bing but not Google, which agreed to partner with the social network for data usage. This decision follows Microsoft’s AI CEO Mustafa Suleyman’s controversial statement claiming nearly all web content is available for AI training.

Huffman also accused Microsoft of summarizing Reddit content in Bing search results without disclosure and selling Reddit data through Bing’s API. He referenced Suleyman’s comment that public online data is essentially “freeware,” further fueling the debate.

Recent developments include OpenAI’s announcement of SearchGPT, which will incorporate Reddit results due to a deal reached earlier this year. According to Reddit’s Tim Rathschmidt, none of their licensing deals grant exclusive rights to their data.

Statements from Other Companies

Jennifer Martinez from Anthropic confirmed to The Verge that Reddit has been blocked in the company’s web crawler since mid-May, and emphasized that Anthropic adheres to robots.txt guidelines for web crawling.
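
For readers unfamiliar with robots.txt, the following is a minimal sketch of how a well-behaved crawler might check a site's rules before fetching a page, using Python's standard `urllib.robotparser` module. The rules, the `ExampleAIBot` user agent, the `is_allowed` helper, and the URL are all hypothetical and chosen for illustration; they are not Reddit's actual robots.txt or any company's real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only):
# it blocks one named crawler entirely and allows everyone else.
EXAMPLE_RULES = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/test"
    print(is_allowed(EXAMPLE_RULES, "ExampleAIBot", url))
    print(is_allowed(EXAMPLE_RULES, "OtherBot", url))
```

Note that robots.txt is a voluntary convention: it only has an effect when the crawler chooses to consult it and honor the answer, which is part of why the dispute described here centers on consent rather than on technical enforcement.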

This dispute highlights the complex issues surrounding data usage and underscores the need for precise guidelines governing relationships between content providers and AI developers. Huffman described the blocking process as particularly laborious, reflecting the difficulties in enforcing data protection.

Last Updated on November 7, 2024 3:26 pm CET

Source: The Verge
Luke Jones
Luke has been writing about Microsoft and the wider tech industry for over 10 years. With a degree in creative and professional writing, Luke looks for the interesting spin when covering AI, Windows, Xbox, and more.