HomeWinBuzzer NewsCohere for AI Introduces Aya 23 Multilingual Language Models

Cohere for AI Introduces Aya 23 Multilingual Language Models

Aya 2023 is available in 8 billion and 35 billion parameter versions and was developed through contributions from over 3,000 independent researchers.


Cohere for AI, the research division of the Canadian company Cohere, has unveiled Aya 23, a new series of multilingual language models. These new models, available in 8 billion and 35 billion parameter versions, aim to enhance the understanding and generation of human language across a diverse range of languages. The models' open weights are now accessible, allowing researchers to tailor them to their specific needs.

Multilingual Capabilities and Dataset

Aya 23 models extend support to 23 languages, including Arabic, Chinese, French, German, and Japanese, among others. This broad linguistic range marks a departure from earlier models that primarily focused on English. The models were developed using the Aya Collection, a dataset comprising 513 million instances of prompts and completions, which was crucial in fine-tuning the models for high-quality responses across various languages.

The creation of Aya 23 involved contributions from over 3,000 independent researchers across 119 countries, underscoring the collaborative nature of the project. This extensive participation helped ensure that the models are robust and versatile, capable of handling a wide array of linguistic nuances and contexts.

Performance and Technical Specifications

Technical evaluations reveal that the 35 billion parameter variant of Aya 23, known as Aya-23-35B, excels in both discriminative and generative tasks. It has shown improvements of up to 14% on discriminative tasks and 20% on generative tasks compared to its predecessor, Aya 101. Additionally, it achieved a 41.6% increase in multilingual MMLU performance.

Aya-23-35B employs a decoder-only Transformer architecture, which enhances the model's ability to generate accurate outputs by analyzing the context of words in user prompts. This model also incorporates grouped query attention to optimize RAM usage and improve inference speed. Furthermore, rotational positional embeddings are used to better process the positional information of words within a sentence, thereby enhancing output quality.

Accessibility and Licensing

The open weights of the Aya 23 models are available on Hugging Face under the Creative Commons Attribution-NonCommercial 4.0 International Public License. This choice ensures that the broader research community can engage with and build upon Cohere for AI's work. Additionally, the models can be explored through the Cohere Playground, which offers free access to these advanced multilingual models.

Cohere Inc., headquartered in Toronto, has raised over $400 million from investors such as Corp. and Oracle Corp. The company specializes in large language models designed for enterprise applications. Apart from the Aya series, Cohere also offers a neural network called Embed, which transforms data into mathematical structures that are more comprehensible for language models.

Prior to Aya 23, Cohere released Aya-101, a model capable of understanding 101 languages. However, the new Aya-23-35B has demonstrated superior performance in internal evaluations and multilingual text processing tasks compared to other open-source large language models (LLMs).

Luke Jones
Luke Jones
Luke has been writing about all things tech for more than five years. He is following Microsoft closely to bring you the latest news about Windows, Office, Azure, Skype, HoloLens and all the rest of their products.