Google AI has unveiled a novel machine learning framework named AGREE, aimed at reducing inaccuracies in large language models (LLMs). These inaccuracies, often referred to as “hallucinations,” occur when LLMs produce responses that are incorrect or nonsensical, especially in contexts requiring extensive world knowledge.
Addressing Hallucination Challenges
The phenomenon of hallucinations is particularly problematic in domains like news reporting and education, where factual accuracy is paramount. Traditional methods to mitigate these errors include post-hoc citing and prompting-based grounding. Post-hoc citing involves adding citations after generating responses, but this approach is limited by the LLM's existing knowledge base. Prompting-based grounding, which relies on the model's instruction-following capabilities, often fails to meet the high standards of factual accuracy required in real-world applications.
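To make the second of these baselines concrete, below is a minimal sketch of what a prompting-based grounding setup can look like in practice: retrieved passages are placed directly in the prompt and the model is instructed to cite them inline. The prompt wording and passage-numbering scheme are illustrative assumptions, not details taken from Google's work.

```python
# Illustrative prompting-based grounding: the model is asked (not trained)
# to answer from the given passages and cite them with [1], [2], ... markers.

def build_grounding_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that asks the model to answer using only the
    supplied passages and to cite the supporting passage after each claim."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the passages below. "
        "After each claim, cite the supporting passage, e.g. [1].\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_grounding_prompt(
    "When was the first transatlantic telegraph cable completed?",
    ["The first transatlantic telegraph cable was completed in 1858."],
))
```

Because nothing enforces the citation format or the factual support, this kind of prompting depends entirely on the model's instruction-following ability, which is the weakness AGREE targets.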
The AGREE Framework
AGREE, which stands for Adaptation for Grounding Enhancement, introduces a learning-based framework that allows LLMs to self-ground their responses and provide accurate citations. During its training phase, AGREE fine-tunes LLMs on synthetic data built from unlabeled queries, teaching the models to self-ground their claims by adding citations to their responses. At test time, AGREE employs an iterative inference strategy that lets the LLM seek additional information based on its self-generated citations and refine its answers over successive rounds.
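The test-time strategy can be pictured as a retrieve-answer-check loop. The sketch below assumes three callables: a retriever, an LLM that answers with inline citations, and an NLI-style checker that flags unsupported sentences; none of these names come from the AGREE codebase.

```python
# Minimal sketch of iterative, citation-driven inference (test-time adaptation).
from typing import Callable

def iterative_grounded_answer(
    question: str,
    retrieve: Callable[[str, int], list[str]],              # query, k -> passages
    answer_with_citations: Callable[[str, list[str]], str], # question, passages -> cited answer
    unsupported_sentences: Callable[[str, list[str]], list[str]],
    max_rounds: int = 3,
) -> str:
    passages = retrieve(question, 5)
    answer = answer_with_citations(question, passages)
    for _ in range(max_rounds):
        gaps = unsupported_sentences(answer, passages)
        if not gaps:
            break  # every sentence is backed by a cited passage
        # Use the unsupported claims themselves as new retrieval queries,
        # then regenerate the answer against the enlarged evidence set.
        for claim in gaps:
            passages.extend(retrieve(claim, 2))
        answer = answer_with_citations(question, passages)
    return answer
```

The key design point is that the model's own citations (or the lack of them) drive what gets retrieved next, rather than a fixed, one-shot retrieval step.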
The training process for AGREE involves several steps, sketched in code below. First, synthetic data is collected from unlabeled queries, and relevant passages from reliable sources such as Wikipedia are retrieved with a retriever model. These passages are presented to the base LLM, which generates initial responses without citations. A natural language inference (NLI) model then checks which retrieved passages support each claim, and citations to those supporting passages are added to the response; sentences with no supporting passage receive no citation.
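The following is a minimal sketch of that tuning-data construction under stated assumptions: the helper callables (retrieve, draft_answer, nli_entails, split_sentences) are placeholders for whatever retriever, base LLM, NLI model, and sentence splitter are actually used, and the citation-marker format is illustrative.

```python
# Sketch of building one self-grounded training example from an unlabeled query.
from typing import Callable

def build_training_example(
    query: str,
    retrieve: Callable[[str, int], list[str]],
    draft_answer: Callable[[str, list[str]], str],
    nli_entails: Callable[[str, str], bool],       # (premise, hypothesis) -> bool
    split_sentences: Callable[[str], list[str]],
) -> dict:
    passages = retrieve(query, 5)
    draft = draft_answer(query, passages)          # initial response, no citations
    cited_sentences = []
    for sentence in split_sentences(draft):
        # Cite every retrieved passage that entails the sentence; sentences
        # with no supporting passage are kept but left uncited.
        supports = [i for i, p in enumerate(passages) if nli_entails(p, sentence)]
        marks = "".join(f"[{i + 1}]" for i in supports)
        cited_sentences.append(sentence + marks)
    return {"query": query, "passages": passages, "answer": " ".join(cited_sentences)}
```

Examples produced this way supply the (query, passages, cited answer) triples on which the base LLM is then fine-tuned.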
Effectiveness and Robustness
Experiments conducted across five datasets show that AGREE significantly improves grounding and citation precision compared to baseline methods, with relative improvements of over 30% in grounding quality. AGREE is also robust on out-of-domain data, generalizing across question types, including questions that require knowledge beyond the model's training data. Adding test-time adaptation (TTA) further improves both grounding and answer correctness by letting the LLM actively seek more relevant passages from which to construct better answers.
In recent years, LLMs have made significant strides in capabilities such as multi-hop reasoning, generating plans, and utilizing tools and APIs. However, hallucinations have remained a persistent challenge. AGREE's approach of combining learning-based adaptation with test-time adaptation offers a promising solution. By enabling LLMs to self-ground their responses and provide precise citations, AGREE increases user trust and expands the potential applications of LLMs in fields requiring high factual accuracy.
Experimental Validation
AGREE's effectiveness was validated through experiments on both in-domain and out-of-domain datasets. The tuning data was created from queries in datasets such as Natural Questions, StrategyQA, and FEVER, which span diverse knowledge sources and call for different reasoning processes. AGREE adapts the base LLM on in-domain training sets and then tests it on out-of-domain datasets to evaluate generalization. The results indicate that AGREE's improvements generalize to different question types and external knowledge sources.