
OpenAI Introduces CriticGPT for Better AI Training

CriticGPT aids human trainers in reviewing output and spotting errors that humans might miss.


OpenAI has launched CriticGPT, a new AI training tool designed to support human trainers in refining AI systems. By assisting trainers in evaluating intricate outputs such as software code, CriticGPT aims to make chatbots more reliable and better aligned with human values.

Leveraging Human Feedback in AI Training

With CriticGPT, OpenAI builds on Reinforcement Learning from Human Feedback (RLHF), a method that incorporates human input to adjust AI models so that they generate coherent and accurate outputs.

With RLHF, human trainers evaluate the AI’s responses, which then helps an algorithm refine the model’s performance. Human feedback inconsistencies and the difficulty of evaluating complex outputs can hinder this process. Furthermore, AI models might prioritize producing convincing answers over accurate ones.
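The comparison step at the heart of RLHF can be illustrated with a minimal sketch. This is not OpenAI's implementation: the scores are made up, and the Bradley-Terry (logistic) preference formula is simply the standard way pairwise human preferences are modeled when training a reward model.

```python
import math

def preference_probability(score_a: float, score_b: float) -> float:
    """P(trainer prefers answer A over answer B) under a Bradley-Terry model.

    A reward model assigns each candidate answer a scalar score; the logistic
    of the score difference gives the modeled preference probability.
    """
    return 1.0 / (1.0 + math.exp(score_b - score_a))

# Illustrative scores: if the reward model rates answer A at 1.2 and
# answer B at 0.4, A is preferred roughly 69% of the time.
p = preference_probability(1.2, 0.4)
print(round(p, 2))  # 0.69
```

During training, the reward model's scores are nudged so that these modeled probabilities match the preferences human trainers actually express, and the resulting reward signal then guides the policy model.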

To overcome these obstacles, OpenAI developed CriticGPT, a specialized version of its GPT-4 model. The tool aids human trainers in reviewing code and spotting bugs that humans might miss. In tests, human judges preferred CriticGPT’s critiques 63 percent of the time. OpenAI aims to expand the technique to areas beyond code review.

OpenAI writes that beyond code reviews, CriticGPT identified errors in 24 percent of ChatGPT training data that human annotators had previously deemed flawless. 

CriticGPT was trained on a dataset containing code with deliberately inserted bugs. This training enabled the model to detect various coding errors. Teams comprising both humans and CriticGPT delivered more comprehensive critiques and reduced inaccuracies compared to AI-only critiques. In their research paper for CriticGPT, OpenAI writes:

“These LLM critics now succeed in catching bugs in real-world data, and even accessible LLM baselines like ChatGPT have significant potential to assist human annotators. From this point on the intelligence of LLMs and LLM critics will only continue to improve. Human intelligence will not. It is therefore essential to find scalable methods that ensure that we reward the right behaviors in our AI systems even as they become much smarter than us. We find LLM critics to be a promising start.”

New Development Technique: FSBS

The creation of CriticGPT included a new method known as Force Sampling Beam Search (FSBS). This technique adjusts the critique thoroughness while managing false positives. The OpenAI researchers write:

“The critic model accepts a (question, answer) pair as input and generates a structured critique containing quotes from the answer and comments on potential issues. In the critique, quoted sections of the answer are presented as ‘highlights’ using markdown code blocks starting with ‘```’, followed by comments that identify errors within those highlights. In FSBS, we search for critiques by compelling the model to produce highlighted sections through constrained sampling, then we select the highest-scoring critiques based on the formula rm_score + LENGTH_MODIFIER * num_highlights.”
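The selection step of that formula can be sketched in a few lines. This is only an illustration of the scoring rule quoted above: the candidate critiques, their reward-model scores, and the LENGTH_MODIFIER value are hypothetical stand-ins, not OpenAI's actual data or parameters.

```python
# Hypothetical trade-off constant between raw score and critique thoroughness.
LENGTH_MODIFIER = 0.5

def fsbs_select(candidates):
    """Pick the critique maximizing rm_score + LENGTH_MODIFIER * num_highlights.

    Each candidate is a dict holding the reward model's score for the critique
    and the number of highlighted (quoted) sections the critique contains.
    """
    return max(
        candidates,
        key=lambda c: c["rm_score"] + LENGTH_MODIFIER * c["num_highlights"],
    )

# Example: a more thorough critique can win despite a lower raw score.
candidates = [
    {"id": "short", "rm_score": 2.0, "num_highlights": 1},     # 2.0 + 0.5 = 2.5
    {"id": "thorough", "rm_score": 1.8, "num_highlights": 3},  # 1.8 + 1.5 = 3.3
]
print(fsbs_select(candidates)["id"])  # thorough
```

Raising LENGTH_MODIFIER pushes the search toward longer, more exhaustive critiques; lowering it favors precision and fewer false positives, which is the trade-off FSBS is designed to manage.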

OpenAI admits that CriticGPT has limitations as it was trained on relatively short ChatGPT answers, which might not prepare it for more extensive, complex tasks. While it can reduce inaccuracies, it doesn’t eliminate them. CriticGPT excels at identifying errors in a specific part of the code, but real-world mistakes often span multiple sections of an answer.

OpenAI plans to integrate CriticGPT-like tools into its RLHF labeling workflow, offering AI-assisted support to trainers. However, extremely complex tasks may still pose challenges. The new strategy is part of a broader initiative to perfect large language models and ensure their behavior remains acceptable as their capabilities grow.

AI Model Training Advancements

Competitors like Anthropic have also announced advancements in their AI models, including an upgraded version of their Claude chatbot thanks to improved training techniques and data inputs. Both companies are investigating new methods to monitor AI models to prevent undesired behaviors like deception.

Microsoft is also working on improved AI training approaches and recently rolled out a technique designed to better align Large Language Models (LLMs) with human intentions by leveraging active preference elicitation. The strategy aims to fine-tune the efficiency and precision of LLMs by maximizing their reward functions.

Nvidia is betting on leveraging synthetic data to improve AI model training. It has launched Nemotron-4 340B, a series of open models crafted to produce synthetic data for training large language models (LLMs). The initiative targets increasing demands for quality training data in fields such as healthcare, finance, manufacturing, and retail.

Last Updated on November 7, 2024 3:45 pm CET

Source: OpenAI
Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master's degree in International Economics and is the founder and managing editor of Winbuzzer.com.
