Meta has released pre-trained language models built around multi-token prediction, a new AI training technique. The models are available on Hugging Face under a non-commercial research license and are intended to push forward the capabilities of large language models (LLMs). The company announced the release on the official X (formerly Twitter) account of its AI division, AI at Meta.
Advancement in AI Methodology
The multi-token prediction method, first described in a research paper published in April, marks a shift from traditional AI training techniques. Whereas conventional models are trained to predict a single next word in a sequence, Meta’s new approach trains them to predict several future words at once. This could improve model quality and training efficiency while also speeding up generation, potentially altering the future of AI development.
In April we published a paper on a new training approach for better & faster LLMs using multi-token prediction. To enable further exploration by researchers, we’ve released pre-trained models for code completion using this approach on @HuggingFace ⬇️https://t.co/OnUsGcDpYx
— AI at Meta (@AIatMeta) July 3, 2024
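To make the training idea concrete, the sketch below shows one way a multi-token prediction loss could be set up in PyTorch: rather than a single cross-entropy loss on the next token, the shared trunk's output is scored by several heads, each against a different future position. The function name, tensor shapes, and the choice of four future tokens are illustrative assumptions, not Meta's actual implementation.

```python
import torch
import torch.nn.functional as F

def multi_token_loss(hidden, heads, targets, n_future=4):
    """Illustrative multi-token prediction loss (not Meta's code).

    hidden:  [batch, seq_len, d_model] activations from a shared trunk
    heads:   list of linear layers, one per future offset
    targets: [batch, seq_len] ground-truth token ids
    """
    total = 0.0
    for k, head in enumerate(heads[:n_future], start=1):
        logits = head(hidden[:, :-k, :])   # predict the token k steps ahead
        labels = targets[:, k:]            # ground truth shifted by k
        total = total + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            labels.reshape(-1),
        )
    return total / n_future
```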
These changes have broad implications. As AI models grow more complex, their increased computational demands have led to concerns about cost and environmental impact. Meta’s multi-token prediction could mitigate these issues, making advanced AI more practical and sustainable.
Better Language Comprehension
This method might also result in a deeper understanding of language, enhancing tasks like code generation and creative writing. By narrowing the gap between AI and human language comprehension, these models could have a significant influence on various applications.
However, the accessibility of these AI tools introduces concerns. While this could democratize AI research and benefit smaller organizations, it also opens up possibilities for misuse. The AI community must now create ethical frameworks and security measures to keep up with these technological advances.
Emphasis on Code Generation
Meta’s release of these models is in line with its commitment to open science. The initial focus is on code completion tasks, reflecting the increasing demand for AI-assisted programming tools. As software development increasingly incorporates AI, Meta’s contributions could speed up the trend towards collaborative human-AI coding.
Meta has open-sourced four language models, each with 7 billion parameters, aimed at code generation tasks. Two of the models were trained on 200 billion tokens of code, while the other two were trained on 1 trillion tokens. There’s also a fifth, yet-unreleased model featuring 13 billion parameters.
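For researchers who want to experiment with the checkpoints, downloading them from Hugging Face might look like the snippet below. The repository id is an assumption based on the announcement, so check the linked Hugging Face page for the exact name; an access token may be needed because the weights are distributed under the research license.

```python
# Sketch of fetching the released checkpoints with the huggingface_hub client.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="facebook/multi-token-prediction",  # assumed repository id
    token="hf_...",                             # access token, if the repo is gated
)
print("Checkpoints downloaded to:", local_dir)
```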
These models consist of two main components: a shared trunk and several output heads. The shared trunk computes a representation of the code written so far, and each output head then predicts a different upcoming token from that shared representation, so several future tokens are predicted in parallel rather than one at a time.
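A minimal PyTorch sketch of that layout, assuming a generic transformer stack as the trunk, might look like the following. All class names, dimensions, and the four-head configuration are illustrative rather than a reproduction of Meta's architecture, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiTokenPredictor(nn.Module):
    """Illustrative shared-trunk / multi-head layout (not Meta's code)."""

    def __init__(self, vocab_size=32000, d_model=512, n_layers=6, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        # Shared trunk: does the bulk of the computation once per forward pass.
        # (A real decoder would apply a causal attention mask; omitted here.)
        self.trunk = nn.TransformerEncoder(layer, num_layers=n_layers)
        # One output head per future offset: head k predicts the token k+1 steps ahead.
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future)
        )

    def forward(self, tokens):
        hidden = self.trunk(self.embed(tokens))        # shared computation
        return [head(hidden) for head in self.heads]   # n_future logit tensors

model = MultiTokenPredictor()
logits_per_offset = model(torch.randint(0, 32000, (1, 16)))
print(len(logits_per_offset), logits_per_offset[0].shape)  # 4, [1, 16, 32000]
```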
Benchmark Testing and Results
Benchmark tests using MBPP (a set of roughly 1,000 crowd-sourced Python programming problems) and HumanEval (a set of 164 hand-written Python coding problems) were conducted to measure the accuracy of Meta’s models. The models solved 17% and 12% more problems on MBPP and HumanEval, respectively, than comparable LLMs trained with standard next-token prediction. Additionally, Meta’s models generated output up to three times faster at inference.
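According to the paper, the speed gain comes from reusing the extra heads at inference time: one forward pass drafts several future tokens, and the model then verifies how many of them it would have produced anyway, a scheme described as self-speculative decoding. A heavily simplified greedy version of one such step, reusing the hypothetical MultiTokenPredictor sketched earlier, could look like this.

```python
import torch

@torch.no_grad()
def draft_and_verify(model, tokens, n_future=4):
    """Very simplified greedy self-speculative step (illustrative only).

    One forward pass drafts n_future candidate tokens from the extra heads;
    a second pass on the extended sequence checks how many of them the
    next-token head would have produced itself, and only those are kept.
    """
    # Draft: take the argmax of each head at the last context position.
    logits = model(tokens)                               # list of [B, T, V]
    draft = torch.stack(
        [l[:, -1, :].argmax(-1) for l in logits], dim=1  # [B, n_future]
    )

    # Verify: run the model on context + draft and compare the next-token
    # head's greedy choices against the drafted tokens.
    extended = torch.cat([tokens, draft], dim=1)
    verify = model(extended)[0].argmax(-1)               # [B, T + n_future]
    T = tokens.size(1)
    accepted = 0
    for k in range(n_future):
        # The next-token prediction at position T-1+k should reproduce draft token k.
        if (verify[:, T - 1 + k] == draft[:, k]).all():
            accepted += 1
        else:
            break
    # Keep the accepted prefix of the draft (at least one token).
    return torch.cat([tokens, draft[:, :max(accepted, 1)]], dim=1)
```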
This release is part of Meta’s broader efforts in AI research, which also include developments in image-to-text generation and AI-generated speech detection. This extensive approach positions Meta as an important player across multiple AI fields, not just language models.
Critics argue that more efficient AI models might increase risks related to AI-generated misinformation and cyber threats. Meta has responded to these concerns by emphasizing that the models are licensed solely for research. However, questions about the effectiveness of these restrictions remain.