
Google Enhances AI Accuracy with DataGemma Models

Google introduces DataGemma to combat AI hallucinations. The suite of AI models improves accuracy by drawing on Data Commons.


In an effort to reduce the AI inaccuracies known as hallucinations, Google has unveiled DataGemma, a suite of AI models designed to improve the accuracy of statistical data in the output of large language models (LLMs). Released via Hugging Face, the models are aimed particularly at academic research. By drawing on data from Google's Data Commons platform, they aim to provide more reliable statistics in response to user queries.

Addressing AI's Challenges with Hallucinations

Erroneous outputs, especially those involving numerical data, remain an ongoing challenge for AI. According to Google's research, the probabilistic nature of LLMs, compounded by a lack of sufficient factual data during training, exacerbates these inaccuracies. LLMs also often struggle with tasks that require logical and mathematical computation. To address this, Google is using two approaches: Retrieval Interleaved Generation (RIG), a new technique, and the established Retrieval Augmented Generation (RAG) technique.

With RIG, the model cross-checks the statistics it generates against data from Data Commons: it produces natural-language queries alongside its answer, these are converted into structured data queries, and the retrieved figures are used to correct or supplement the response. RAG, by contrast, extracts the relevant elements of the user's question to form precise Data Commons queries; the retrieved data is then passed to a long-context LLM, which uses it to produce a more accurate answer.
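To make the difference between the two approaches more concrete, here is a minimal toy sketch in Python. It is not Google's implementation and calls no real Data Commons API. The marker syntax `[DC("query") --> "guess"]`, the function names `rig_postprocess` and `rag_prompt`, and the "Exampleville" figures are all hypothetical placeholders; the `lookup`, `generate_queries` and `fetch_table` callables stand in for the fine-tuned model and for Data Commons retrieval.

```python
import re
from typing import Callable, List, Optional

# Hypothetical RIG marker, assumed for illustration only:
#   [DC("natural-language query") --> "value the model guessed"]
RIG_MARKER = re.compile(r'\[DC\("(?P<query>[^"]+)"\)\s*-->\s*"(?P<guess>[^"]*)"\]')


def rig_postprocess(model_output: str, lookup: Callable[[str], Optional[str]]) -> str:
    """RIG-style step: swap each model-guessed statistic for a retrieved value.

    `lookup` stands in for a Data Commons query; if it returns None,
    the model's own guess is kept unchanged.
    """
    def substitute(match: re.Match) -> str:
        retrieved = lookup(match.group("query"))
        return retrieved if retrieved is not None else match.group("guess")

    return RIG_MARKER.sub(substitute, model_output)


def rag_prompt(
    question: str,
    generate_queries: Callable[[str], List[str]],
    fetch_table: Callable[[str], Optional[str]],
) -> str:
    """RAG-style step: build a prompt for a long-context LLM from retrieved tables.

    `generate_queries` stands in for the model that turns the question into
    data queries; `fetch_table` stands in for retrieval and may return None.
    """
    tables = [fetch_table(q) for q in generate_queries(question)]
    context = "\n\n".join(t for t in tables if t is not None)
    return f"Use the following data to answer.\n\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Made-up placeholder data, not real statistics.
    store = {"population of Exampleville": "48,213"}
    draft = 'Exampleville had [DC("population of Exampleville") --> "50,000"] residents.'
    print(rig_postprocess(draft, store.get))  # Exampleville had 48,213 residents.
```

The key design difference shows up in where retrieval happens: RIG patches individual numbers inside text the model has already written, while RAG fetches data first and lets the second model write its answer with that data in its context.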

Assessing the Effectiveness of DataGemma

In tests with 101 queries, DataGemma models fine-tuned for RIG raised factual accuracy from a baseline of 5-17% to approximately 58%. RAG's gains were less dramatic, but it still outperformed the standard models. Data Commons could supply data for only about 24-29% of the queries; where it did, the numbers cited were almost always accurate, although the model still sometimes drew incorrect inferences from them.

Data Commons is a public resource built around a knowledge graph of over 240 billion data points on a wide range of topics. The Google-created repository includes data from reputable organizations such as the United Nations and the World Health Organization, covering areas including health, economics, and the environment.

Paving the Way for Continued Advancement

Google hopes DataGemma will encourage further innovation toward more reliable AI models. The company plans to refine these approaches and gradually integrate the improvements into its Gemma and Gemini models. Researchers can try both methods through quickstart notebooks for RIG and RAG, and more information is available on the Data Commons website and in the associated research materials.

Source: Google
Luke Jones
Luke has been writing about Microsoft and the wider tech industry for over 10 years. With a degree in creative and professional writing, Luke looks for the interesting spin when covering AI, Windows, Xbox, and more.
