Researchers from the University of Connecticut and AIGCode have introduced AutoCoder, a new large language model (LLM) for code generation, which has achieved a 90.9% pass rate on the HumanEval benchmark, surpassing OpenAI's GPT-4 Turbo's 90.2%.
In comparing AutoCoder with GPT-4 Turbo and GPT-4o, AutoCoder leads with a superior pass@1 metric on the HumanEval benchmark, indicating higher coding precision and efficiency. AutoCoder’s ability to handle external packages gives it a notable edge over its predecessors, which are restricted to built-in packages.
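The pass@1 figure cited above is typically computed with the unbiased pass@k estimator introduced alongside HumanEval. A minimal sketch of that estimator (not code from the AutoCoder paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn from n generations of which c pass the unit tests,
    is correct."""
    if n - c < k:
        return 1.0  # fewer failures than samples drawn: success is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# pass@1 reduces to the fraction of correct generations, c / n
print(pass_at_k(10, 9, 1))  # 0.9
```

A model's HumanEval score is then the mean of this value over all 164 benchmark problems.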
AIEV-INSTRUCT: A New Training Approach
AutoCoder is trained with AIEV-INSTRUCT, a novel strategy that improves code quality while reducing dependence on large proprietary models, pointing toward more sustainable and open advances in code LLMs.
AIEV-INSTRUCT pairs two agents—a questioner and a coder—in simulated coding dialogues. Initially, proprietary models create and validate instructions, with GPT-4 Turbo acting as the supervisor. Through iterative interactions, the generated code is continuously refined against execution feedback. Once the student model exceeds the teacher model in performance, it enters a self-learning phase, independently generating and verifying code. This method minimizes reliance on expensive models while boosting both the quality and robustness of the datasets produced.
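The execution-feedback loop at the heart of this process can be sketched as follows. This is a toy illustration, not the paper's implementation: the "coder agent" here is a plain function standing in for an LLM call (the teacher model early in training, the student model in the self-learning phase), and the hard-coded revision step mimics what a real model would do with the error message.

```python
def run_tests(code: str, tests: str) -> tuple[bool, str]:
    """Execute candidate code together with its unit tests;
    return (passed, error feedback)."""
    namespace: dict = {}
    try:
        exec(code + "\n" + tests, namespace)
        return True, ""
    except Exception as e:
        return False, f"{type(e).__name__}: {e}"

def coder_agent(instruction: str, feedback: str) -> str:
    """Placeholder for an LLM call. A real system would prompt the
    supervising model (or, later, the student itself) with the
    instruction plus the execution feedback."""
    if "ZeroDivisionError" in feedback:  # toy "revision" using the feedback
        return "def mean(xs):\n    return sum(xs) / len(xs) if xs else 0.0"
    return "def mean(xs):\n    return sum(xs) / len(xs)"

def aiev_round(instruction: str, tests: str, max_iters: int = 3):
    """Generate, execute, and refine until the tests pass; only
    validated (instruction, code) pairs enter the training dataset."""
    feedback = ""
    for _ in range(max_iters):
        code = coder_agent(instruction, feedback)
        passed, feedback = run_tests(code, tests)
        if passed:
            return instruction, code
    return None  # never passed within the budget: discard the sample

pair = aiev_round(
    "Write mean(xs) that handles empty lists.",
    "assert mean([1, 2, 3]) == 2.0\nassert mean([]) == 0.0",
)
```

The first candidate fails on the empty list, the error is fed back, and the revised candidate passes—at which point the pair is kept as training data.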
Performance and Versatility
Trained via AIEV-INSTRUCT, AutoCoder has shown exceptional performance, not only surpassing GPT-4 Turbo on the HumanEval benchmark but also demonstrating significant prowess in code interpretation, including the installation of external packages. This capability greatly broadens AutoCoder’s utility in practical coding environments. AutoCoder has been evaluated across multiple datasets, including HumanEval+, MBPP, MBPP+, MultiPL-E, and DS-1000, where it has secured top positions in various benchmarks. Even the smaller AutoCoder-S version, with 6.7 billion parameters, has performed remarkably well, proving effective and accurate despite a reduced parameter count.
AutoCoder has the potential to significantly enhance software development. Its superior performance indicates a more accessible and accurate tool for developers worldwide. The underlying study presents a cost-effective, precise method for generating code instruction datasets, improving the overall efficiency of code generation tasks. For more details, the research paper is available on arXiv, and the code is accessible online via GitHub.
Last Updated on November 7, 2024 7:57 pm CET