OpenAI has described a way to train large language models (LLMs) to work through problems one step at a time, in what is known as a chain of thought. In a paper, the group presented a method called process supervision for improving mathematical reasoning, and applied it to train models that generate solutions to mathematical problems.
One of the challenges of training LLMs is that they can sometimes hallucinate, or make up information that is not supported by the data they were trained on. This can be a problem for applications that rely on the LLM to provide accurate information, such as medical diagnosis or financial advice.
To address this problem, OpenAI compared two methods of giving training feedback to generative AI such as ChatGPT or the GPT-4 LLM. The first method, called outcome supervision, gives feedback based only on the final result. For example, when the model solves a math problem, the feedback depends solely on whether the final answer is correct.
The second method, called process supervision, gives feedback on each step in the model's chain of thought. For example, when the model explains how to solve a math problem step by step, every individual step is checked, not just the final answer.
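The difference between the two kinds of feedback can be sketched in a few lines of Python. This is only an illustration, not OpenAI's implementation: the `Step` class, the function names, and the hand-written labels are all hypothetical, and real systems train a reward model on such labels instead of comparing strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One line of a model's reasoning (hypothetical structure)."""
    text: str
    is_correct: bool  # assumed human label for this single step


def outcome_feedback(final_answer: str, reference_answer: str) -> List[float]:
    """Outcome supervision: one signal, based only on the final answer."""
    matches = final_answer.strip() == reference_answer.strip()
    return [1.0 if matches else 0.0]


def process_feedback(steps: List[Step]) -> List[float]:
    """Process supervision: one signal for every reasoning step."""
    return [1.0 if step.is_correct else 0.0 for step in steps]


def first_error_index(steps: List[Step]) -> Optional[int]:
    """Return the position of the first incorrect step, or None if all are correct."""
    for index, step in enumerate(steps):
        if not step.is_correct:
            return index
    return None


# Toy problem: solve 2x + 6 = 10 (the correct answer is x = 2).
solution = [
    Step("Subtract 6 from both sides: 2x = 4", is_correct=True),
    Step("Divide both sides by 2: x = 3", is_correct=False),  # arithmetic slip
]

print(outcome_feedback("3", "2"))   # [0.0]
print(process_feedback(solution))   # [1.0, 0.0]
print(first_error_index(solution))  # 1
```

In this toy case, outcome feedback only reports that the answer is wrong, while process feedback also shows that the first step was sound and the mistake was made in the second.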
Training LLMs to be More Accurate
To train LLMs with process supervision, OpenAI created a dataset called PRM800K, which contains 800,000 step-level human feedback labels on model-generated solutions to problems from the MATH dataset. The problems cover topics such as algebra, geometry, number theory, precalculus, and counting and probability. The solutions are written in natural language and LaTeX, formats that LLMs can both read and generate.
OpenAI evaluated its method on a representative subset of the MATH test set, a benchmark of competition-level mathematics problems. The results show that the model trained with process supervision significantly outperforms the one trained with outcome supervision, solving 78% of the test problems according to OpenAI. Moreover, because every step is checked, process supervision favors solutions whose reasoning a human can follow and verify.
According to OpenAI, process supervision is a general technique that could be applied to other domains and tasks that require reasoning and problem-solving skills. The organization also hopes that process supervision will inspire more research on how to train and evaluate LLMs safely and effectively.