Cohere for AI, the research division of the Canadian company Cohere, has unveiled Aya 23, a new series of multilingual language models. These new models, available in 8 billion and 35 billion parameter versions, aim to enhance the understanding and generation of human language across a diverse range of languages. The models’ open weights are now accessible, allowing researchers to tailor them to their specific needs.
Multilingual Capabilities and Dataset
Aya 23 models extend support to 23 languages, including Arabic, Chinese, French, German, and Japanese, among others. This broad linguistic range marks a departure from earlier models that primarily focused on English. The models were developed using the Aya Collection, a dataset comprising 513 million instances of prompts and completions, which was crucial in fine-tuning the models for high-quality responses across various languages.
The creation of Aya 23 involved contributions from over 3,000 independent researchers across 119 countries, underscoring the collaborative nature of the project. This extensive participation helped ensure that the models are robust and versatile, capable of handling a wide array of linguistic nuances and contexts.
Performance and Technical Specifications
Technical evaluations reveal that the 35 billion parameter variant of Aya 23, known as Aya-23-35B, excels in both discriminative and generative tasks. It has shown improvements of up to 14% on discriminative tasks and 20% on generative tasks compared to its predecessor, Aya 101. Additionally, it achieved a 41.6% increase in multilingual MMLU performance.
Aya-23-35B employs a decoder-only Transformer architecture, generating text by attending to the context of the words in a user's prompt. The model also incorporates grouped-query attention, which reduces memory use during inference (notably the key-value cache) and improves serving speed. In addition, rotary positional embeddings are used to better encode the position of each token within a sequence, thereby enhancing output quality.
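To make the two mechanisms concrete, here is a minimal NumPy sketch of rotary positional embeddings and grouped-query attention. This is an illustrative toy, not Cohere's implementation: the shapes, head counts, and helper names are assumptions chosen for clarity.

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary positional embeddings (illustrative): rotate each pair of
    feature dimensions by an angle proportional to the token's position."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)            # per-pair frequencies
    angles = np.arange(seq_len)[:, None] * freqs[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def grouped_query_attention(q, k, v, n_q_heads, n_kv_heads):
    """Grouped-query attention (illustrative): several query heads share one
    K/V head, shrinking the KV cache and speeding up inference.
    q: (seq, n_q_heads*hd); k, v: (seq, n_kv_heads*hd)."""
    seq, d_model = q.shape
    hd = d_model // n_q_heads
    group = n_q_heads // n_kv_heads      # query heads per shared KV head
    out = np.zeros_like(q)
    causal = np.triu(np.full((seq, seq), -np.inf), k=1)  # mask future tokens
    for h in range(n_q_heads):
        kv = h // group                  # which shared KV head this head uses
        qh = rope(q[:, h * hd:(h + 1) * hd])
        kh = rope(k[:, kv * hd:(kv + 1) * hd])
        vh = v[:, kv * hd:(kv + 1) * hd]
        scores = qh @ kh.T / np.sqrt(hd) + causal
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)               # row-wise softmax
        out[:, h * hd:(h + 1) * hd] = w @ vh
    return out
```

Because each group of query heads reads the same K/V slice, the cached keys and values shrink by a factor of `n_q_heads / n_kv_heads` relative to standard multi-head attention.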
Accessibility and Licensing
The open weights of the Aya 23 models are available on Hugging Face under the Creative Commons Attribution-NonCommercial 4.0 International Public License. This licensing choice ensures that the broader research community can engage with and build upon Cohere for AI’s work. Additionally, the models can be explored through the Cohere Playground, which offers free access to these advanced multilingual models.
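As a sketch of how a researcher might load the released weights with the Hugging Face transformers library: the repository id below (`CohereForAI/aya-23-8B`) and the generation settings are assumptions to be verified against the model card, not confirmed details from this announcement.

```python
# Hypothetical sketch: loading an Aya 23 checkpoint from Hugging Face via the
# standard transformers chat-template workflow. The repo id is an assumption;
# confirm it on the model card before use.
MODEL_ID = "CohereForAI/aya-23-8B"

def build_chat(prompt: str) -> list[dict]:
    """Wrap a user prompt in the message format consumed by
    tokenizer.apply_chat_template()."""
    return [{"role": "user", "content": prompt}]

if __name__ == "__main__":
    # Heavy dependencies are imported here so the helper above stays light.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    input_ids = tokenizer.apply_chat_template(
        build_chat("Traduis en français : Hello, world!"),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that the non-commercial license terms above apply to any use of the downloaded weights.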
Cohere Inc., headquartered in Toronto, has raised over $400 million from investors such as Nvidia Corp. and Oracle Corp. The company specializes in large language models designed for enterprise applications. Apart from the Aya series, Cohere also offers a neural network called Embed, which converts text into numerical vector representations (embeddings) that language models can process more readily.
Prior to Aya 23, Cohere released Aya 101, a model capable of understanding 101 languages. In internal evaluations, the new Aya-23-35B has demonstrated superior performance on multilingual text processing tasks compared to both Aya 101 and other open-source large language models (LLMs).
Last Updated on November 7, 2024 8:09 pm CET