Amazon Investigates Perplexity AI for Web Scraping Violations

Amazon is assessing whether Perplexity AI breached the Robots Exclusion Protocol.

Amazon Web Services (AWS) is investigating allegations that Perplexity AI, the search engine backed by Amazon founder Jeff Bezos and Nvidia, has been conducting unauthorized web scraping using AWS infrastructure. As reported by WIRED, the inquiry is assessing whether Perplexity AI breached the Robots Exclusion Protocol by extracting data from websites that explicitly restrict such activity.

The Role of Robots.txt

The Robots Exclusion Protocol, implemented through a robots.txt file, lets website administrators tell automated bots and crawlers which parts of a site they may access. Administrators place this plaintext file in the site’s root directory and list in it the areas that crawlers should not access.
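As a rough illustration of how a well-behaved crawler might consult such a file, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` and `OtherBot` user agents, and the `example.com` URLs are hypothetical and are not taken from any site or crawler named in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# ExampleBot is blocked from the whole site; other agents only from /private/.
print(is_allowed(parser, "ExampleBot", "https://example.com/news/story"))
print(is_allowed(parser, "OtherBot", "https://example.com/news/story"))
print(is_allowed(parser, "OtherBot", "https://example.com/private/draft"))
```

The check is voluntary: nothing in the file technically prevents a crawler from skipping this step and requesting the page anyway, which is why compliance depends on the crawler's operator.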

Although compliance with the protocol is not legally required, it is widely respected within the industry. AWS’s terms of service require customers to adhere to it and also prohibit unlawful activities.

Scraping Allegations Reported

The AWS investigation follows reports claiming that Perplexity AI replicates existing articles from major news outlets. Forbes recently accused Perplexity AI of replicating its content without due credit. The dispute centers on an article about Eric Schmidt’s drone company that Forbes says Perplexity copied in an AI-generated podcast.

A further investigation by WIRED corroborated these claims and presented evidence of scraping activity associated with Perplexity AI’s search tool. An AWS spokesperson, who was not named, confirmed that the inquiry is ongoing and noted the company’s policy that customers must comply with AWS terms and relevant laws.

Perplexity AI’s Position

Perplexity AI CEO Aravind Srinivas defended the company, saying there was a misunderstanding about how Perplexity operates. According to Srinivas, the IP address found to be scraping Condé Nast content was actually associated with a third-party company specializing in web crawling and indexing.

He cited a nondisclosure agreement as the reason for not naming the third party and said that instructing it to stop crawling WIRED’s content posed challenges.

Sara Platnick, a spokesperson for Perplexity AI, said the company had responded to Amazon’s queries and characterized the investigation as routine. Platnick maintained that PerplexityBot, which runs on AWS, follows the rules set out in robots.txt files and complies with AWS terms of service. She acknowledged, however, that PerplexityBot might occasionally bypass robots.txt restrictions when processing direct URLs entered by users, though such cases are reportedly rare.

Last Updated on November 7, 2024 3:45 pm CET

Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master’s degree in International Economics and is the founder and managing editor of Winbuzzer.com.
