Microsoft has developed a new method to enhance the performance of OpenAI's GPT-4 on language understanding benchmarks. The advance comes on the heels of strong benchmark results from Google's Gemini Ultra model. The Microsoft Research team, led by Chief Scientific Officer Eric Horvitz, Research Engineering Director Harsha Nori, and Principal Researcher Yin Tat Lee, reports that the new prompting technique, known as Medprompt+, enables GPT-4 to set a record score on the Massive Multitask Language Understanding (MMLU) benchmark.
Innovative Prompting Technique
While Google's multimodal artificial intelligence model Gemini Ultra initially outperformed GPT-4 on several benchmarks, Microsoft's Medprompt+ has tipped the scales back in favor of GPT-4. The technique steers GPT-4 with carefully composed prompts that elicit higher-quality responses. Specifically, Medprompt+ extends the original Medprompt approach by running simpler prompts alongside more complex chain-of-thought prompts and by defining a policy for merging their outputs. That policy combines both the base and inferred confidences of the candidate answers to select the most accurate response.
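The general shape of such an ensemble can be sketched in a few lines of Python. The code below is an illustration, not Microsoft's implementation: `query_model` is a hypothetical stand-in for a real GPT-4 call (simulated here so the script runs end to end), and the scoring policy, which weights each candidate's vote share by the model's self-reported confidence, is one plausible reading of how base and inferred confidences might be combined.

```python
import random
from collections import defaultdict

def query_model(prompt: str, temperature: float) -> tuple[str, float]:
    """Hypothetical stand-in for a GPT-4 call; replace with a real client.
    Simulated here: usually answers 'B', with a varying self-reported confidence."""
    answer = random.choices(["A", "B", "C", "D"], weights=[1, 6, 1, 1])[0]
    return answer, random.uniform(0.5, 1.0)

def build_prompts(question: str, choices: dict[str, str]) -> list[str]:
    """Two prompt styles: a simple direct prompt and a chain-of-thought prompt."""
    listing = "\n".join(f"{key}. {text}" for key, text in choices.items())
    simple = f"{question}\n{listing}\nAnswer with a single letter."
    cot = (f"{question}\n{listing}\n"
           "Think step by step, then give the final answer as a single letter.")
    return [simple, cot]

def ensemble_answer(question: str, choices: dict[str, str], runs: int = 5) -> str:
    """Collect candidate answers across prompt styles and repeated sampled runs,
    scoring each option by its vote share (an inferred confidence) weighted by
    the model's own reported confidence (a base confidence)."""
    scores: dict[str, float] = defaultdict(float)
    for prompt in build_prompts(question, choices):
        for _ in range(runs):
            answer, base_confidence = query_model(prompt, temperature=0.7)
            if answer in choices:  # discard malformed outputs
                scores[answer] += base_confidence
    return max(scores, key=scores.get)

if __name__ == "__main__":
    question = "Which gas makes up most of Earth's atmosphere?"
    choices = {"A": "Oxygen", "B": "Nitrogen", "C": "Argon", "D": "Carbon dioxide"}
    print(ensemble_answer(question, choices))  # prints 'B' in most runs
```

In the published Medprompt work, the ensemble also shuffles the order of answer choices between runs to reduce position bias; that step is omitted here for brevity.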
The Future of AI Benchmarks
Despite Microsoft Research's significant leap in optimizing prompt-based methods, these advanced techniques are currently limited to internal applications and are not available to the general public. The company underscores the importance of improving the out-of-the-box experience for users at every level of technical proficiency.
Microsoft pledges to continue research on zero- and few-shot prompting strategies, which are essential for the intuitive use of language models. Further details about the new prompting techniques and their applications are available in Microsoft Research's GitHub repository, giving the wider AI research community a view into ongoing developments in the field.
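For readers unfamiliar with the terms, the difference between the two strategies is easy to show. The snippet below is a generic illustration with an invented task, not drawn from Microsoft's repository: a zero-shot prompt asks the question directly, while a few-shot prompt prepends a handful of worked exemplars.

```python
# Generic illustration of zero-shot vs. few-shot prompting; the task and
# exemplars are invented for demonstration purposes.

review = "The update broke my workflow."

# Zero-shot: the model gets only the task description and the input.
zero_shot = (
    "Classify the sentiment of the review as 'positive' or 'negative'.\n"
    f"Review: {review}\nSentiment:"
)

# Few-shot: labeled exemplars precede the input, showing the model the
# expected format and decision boundary before it sees the real case.
few_shot = (
    "Classify the sentiment of each review as 'positive' or 'negative'.\n"
    "Review: Fast, reliable, and easy to set up.\nSentiment: positive\n"
    "Review: Crashes every time I open a file.\nSentiment: negative\n"
    f"Review: {review}\nSentiment:"
)

print(zero_shot)
print("---")
print(few_shot)
```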