Google to Scrape Entire Public Internet to Train Its AI Tools

Google has announced that it will be scraping the entire public internet to train its AI tools. This means that the company will be collecting data from every website, forum, and social media platform that is accessible to the public.

Google has recently made a significant change to its privacy policy that allows it to use public data to train its artificial intelligence (AI) models. The policy update affects data that users have shared publicly on Google services or third-party platforms, such as photos, videos, reviews, comments and posts. An amendment to the search giant's privacy policy contains an important update relating to AI:

“For example, we use publicly available information to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

According to Google, the use of public data for AI training will help it improve its products and services that rely on AI, such as Google Photos, YouTube, Maps and Assistant. Google claims that it will only use public data for purposes that are consistent with its privacy policy and terms of service, and that it will respect users' choices and settings regarding their data.

Google also states that it will not use sensitive or personal information, such as biometric data, health data or financial data, to train its AI models without users' explicit consent. Google says that it will protect users' privacy and security by applying rigorous safeguards and best practices to its AI development and deployment.

Using Public Data to Train AI

The policy change comes amid growing concerns and regulations around the ethical and social implications of AI, especially in areas such as facial recognition, content moderation and personalization. Google has faced criticism and lawsuits for some of its AI practices, such as using users' photos without permission to create facial recognition databases and algorithms.

Google says that it is committed to developing and using AI responsibly and transparently, and that it will continue to engage with stakeholders and experts to ensure that its AI products and services are beneficial for society. Google also encourages users to review its privacy policy and terms of service regularly to stay informed of how it collects and uses their data.

If you are concerned about the privacy implications of Google's plan, there are a few things you can do. First, you can read the company's privacy policy and understand how your data will be collected and used. Second, you can use privacy-focused browsers and extensions, such as DuckDuckGo and Privacy Badger. Finally, you can be mindful of the information you share online, and only share information that you are comfortable with Google having.

Of course, almost all of Google's revenue comes from advertising. It is likely the company will use the public information it scrapes to shape its advertising. In fact, as I reported in May, the company has already decided to use generative AI for its advertising and customer service operations, according to a leaked internal document obtained by CNBC. The document outlines Google's plans to leverage its large language models, such as LaMDA, to create personalized and engaging ads and chatbots for its users.
