Google has updated its privacy policy to state that it will use publicly available information from across the web to train its AI tools. In effect, the company is reserving the right to collect data from any website, forum, or social media platform that is accessible to the public.
The updated policy reads: “For example, we use publicly available information to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”
However, Google says it will not use sensitive or personal information, such as biometric, health, or financial data, to train its AI models without users' explicit consent. The company adds that it will protect users' privacy and security by applying rigorous safeguards and best practices to its AI development and deployment.
Using Public Data to Train AI
The policy change comes amid growing concerns and regulations around the ethical and social implications of AI, especially in areas such as facial recognition, content moderation and personalization. Google has faced criticism and lawsuits for some of its AI practices, such as using users' photos without permission to create facial recognition databases and algorithms.
Of course, most of Google's revenue comes from advertising, so it is likely the company will use the public information it scrapes to shape its advertising output. As I reported in May, Google has reportedly already decided to use generative AI for its advertising and customer service operations, according to a leaked internal document obtained by CNBC. The document outlines Google's plans to leverage its large language models, such as LaMDA, to create personalized and engaging ads and chatbots for its users.