Reddit CEO: We’re Not Letting Big Tech Use Our Data for Free

Reddit’s Steve Huffman defends the platform’s data rights in the AI era, partnering with OpenAI and Google while blocking other search engines from scraping content.

Reddit’s CEO Steve Huffman isn’t pulling any punches in the ongoing conflict between Reddit and tech giants. During an appearance at the Wall Street Journal’s Tech Live event, he made it clear Reddit is ramping up efforts to defend its data from being exploited by artificial intelligence (AI) companies without permission.

Huffman explained that Reddit’s user-generated content has become increasingly valuable for AI training but said the platform is no longer willing to give away its data without compensation.

“AI models need human knowledge, and Reddit’s content is full of it,” he remarked, underscoring the growing demand for authentic human discussions to fuel AI systems. He emphasized that Reddit, with its endless stream of real-time conversations, provides one of the richest data sets for training these models.

Exclusive Deals with Google and OpenAI

Reddit’s position has strengthened through strategic partnerships with OpenAI and Google. In May, Reddit confirmed its partnership with OpenAI, which gives OpenAI access to Reddit’s Data API for use in ChatGPT and other AI tools. That access opens Reddit’s vast pool of user-generated content to the AI company, which can use it to better understand everyday conversations and build more accurate models.

Meanwhile, Google inked a $60 million deal with Reddit earlier in the year to access Reddit’s data. In July 2024, Google gained exclusive rights to scrape Reddit content, while other search engines like Bing and DuckDuckGo were blocked following changes to Reddit’s robots.txt file. Reddit has maintained that these restrictions weren’t solely driven by the deal with Google, although they certainly benefit the search giant in the AI arms race.

Blocking Bing and Others from Reddit Data

As Huffman explained, Reddit had to draw the line when it came to data scraping. After years of being “scraped every which way,” the company updated its robots.txt file in July, effectively cutting off access for all web crawlers except Google’s. Platforms like Bing and DuckDuckGo now struggle to provide up-to-date Reddit links and content, severely affecting their search capabilities.
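To illustrate how a robots.txt file can admit one crawler and turn away the rest, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules below are a hypothetical example, not Reddit's actual robots.txt, and the helper name `allowed_crawlers`, the user-agent strings, and the example.com URL are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules (NOT Reddit's real file): allow one named crawler,
# disallow everything for all other user agents.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_crawlers(robots_txt, agents, url):
    """Return a mapping of user agent -> whether the rules permit fetching url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Illustrative user-agent names and a placeholder URL.
result = allowed_crawlers(
    ROBOTS_TXT,
    ["Googlebot", "bingbot", "DuckDuckBot"],
    "https://www.example.com/r/some_thread",
)

for agent, permitted in result.items():
    print(f"{agent}: {'allowed' if permitted else 'blocked'}")
```

Note that robots.txt is a voluntary convention: it tells well-behaved crawlers what the site permits, but it does not technically prevent a crawler from requesting pages.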

The move, however, wasn’t just about partnerships. Huffman took direct aim at Microsoft, accusing the company of using Reddit’s data to train its AI without any proper licensing in place. Microsoft has pushed back, arguing that webmasters have had control over web crawling since 2023, but Reddit chose to block Bing regardless.

The Reddit CEO said that AI companies have been treating the web like a free-for-all, taking whatever content they need for their models. Huffman highlighted this as a growing issue, with AI systems requiring massive datasets to improve and train properly. His comments reflect the tension brewing between content creators and tech companies, as platforms seek fair compensation for the data being used to power increasingly sophisticated AI systems.

OpenAI’s GPT-4o and Reddit’s Role

While Reddit’s relationship with Microsoft remains strained, the platform has had more success collaborating with OpenAI. In May, Reddit’s content became an integral part of training OpenAI’s latest GPT-4o model. OpenAI, led by CEO Sam Altman, a former Reddit board member and significant shareholder, has capitalized on this partnership, which gives it access to Reddit’s treasure trove of discussions and insights.

Altman’s connection to Reddit was crucial in finalizing the deal, which was spearheaded by OpenAI COO Brad Lightcap. The partnership has been framed as a win-win for both companies, with Reddit benefiting from OpenAI’s tools while maintaining control over how its data is used. The deal also puts Reddit in a unique position to influence the direction of AI development.

With the GPT-4o model, OpenAI has made significant strides in multimodal AI, improving its capabilities in text, video, and even audio. The model has been praised for its ability to engage with users in a more natural way, and Reddit’s data has played no small part in that progress.

New AI Features for Reddit

As part of the collaboration, Reddit is also set to develop new AI-powered features for its users and moderators. Built on OpenAI’s platform, these tools are expected to improve user experience and help moderators manage the vast amount of content on Reddit more efficiently.

Despite these advancements, Reddit has been cautious about how its data is being used. Huffman made it clear that the terms of the Data API remain intact, and content accessed through Reddit’s API cannot be used for commercial purposes without explicit approval. Reddit’s goal is to ensure that its content is only used in ways that benefit the platform and its users.

Huffman has been vocal about the need for AI companies to start treating web content creators with more respect, acknowledging the value of the data they are using to train their systems. This debate will likely continue as AI technology advances, with platforms like Reddit at the center of the conversation.

Last Updated on November 7, 2024 2:24 pm CET

Luke Jones
Luke has been writing about Microsoft and the wider tech industry for over 10 years. With a degree in creative and professional writing, Luke looks for the interesting spin when covering AI, Windows, Xbox, and more.