Microsoft has launched three new models under the Phi-3.5 series to enhance its AI offerings. These models include Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct, and Phi-3.5-vision-instruct, covering tasks that range from basic reasoning to advanced image and video analysis.
Phi-3.5 Mini Instruct: Compact Performance
Designed for settings with limited computational capabilities, the 3.8 billion parameter Phi-3.5 Mini Instruct model shines in tasks such as code generation, mathematical problem-solving, and logical reasoning. It supports a 128k token context length and outperforms other similarly-sized models like Llama-3.1-8B-instruct and Mistral-7B-instruct on benchmarks like RepoQA. Trained on 3.4 trillion tokens using 512 H100 GPUs, it sets new standards in long-context code understanding.
Phi-3.5 MoE: Mixture of Experts
Phi-3.5 MoE is distinguished by its mixture-of-experts architecture, which merges multiple specialized models into a single system. While the model has 42 billion parameters in total, only 6.6 billion are active during generation, offering scalability and efficiency in demanding tasks. The model supports a 128k token context length and excels in reasoning tasks including code, math, and multilingual language understanding, consistently outperforming larger models on benchmarks such as RepoQA and 5-shot MMLU. Training involved 4.9 trillion tokens on 512 H100 GPUs.
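The gap between total and active parameters comes from top-k expert routing: a gating function scores every expert for each token, but only the k best-scoring experts actually run. The sketch below is a generic, illustrative top-k MoE layer in plain Python; it is not Phi-3.5-MoE's actual implementation, and all names (the toy dimensions, the linear "experts", the gating weights) are made up for the example.

```python
import math
import random

random.seed(0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights, k=2):
    """Generic top-k mixture-of-experts layer (illustrative sketch only).

    x            : input vector
    experts      : list of callables, each mapping a vector to a vector
    gate_weights : one scoring vector per expert
    k            : number of experts activated per token
    """
    scores = [dot(w, x) for w in gate_weights]
    # Keep only the k best-scoring experts for this input.
    top = sorted(range(len(scores)), key=scores.__getitem__)[-k:]
    mix = softmax([scores[i] for i in top])
    # Only k experts run per token -- this is why the "active" parameter
    # count is far below the total parameter count.
    outputs = [experts[i](x) for i in top]
    return [sum(m * out[j] for m, out in zip(mix, outputs))
            for j in range(len(x))]

# Toy demo: 16 tiny linear "experts", only 2 of which run per input.
d, n_experts = 4, 16

def make_expert():
    w = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    return lambda v: [dot(row, v) for row in w]

experts = [make_expert() for _ in range(n_experts)]
gate_weights = [[random.gauss(0, 1) for _ in range(d)]
                for _ in range(n_experts)]
y = moe_forward([1.0, 0.5, -0.5, 2.0], experts, gate_weights, k=2)
print(len(y))  # 4: output has the same dimension as the input
```

In this toy setup only 2 of the 16 experts execute per input, so roughly an eighth of the layer's weights are touched, which mirrors how Phi-3.5-MoE runs 6.6 billion of its 42 billion parameters per token.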
Phi-3.5 Vision Instruct: Multimodal Expertise
Combining text and image processing, the Phi-3.5 Vision Instruct model is adept at tasks such as image understanding, optical character recognition, chart and table comprehension, and video summarization. It supports a 128k token context length, making it suitable for complex, multi-frame visual tasks. Trained on 500 billion tokens with 256 A100 GPUs, this model shows improved performance on metrics like MMMU (40.2 to 43.0), MMBench (80.5 to 81.9), and TextVQA (70.9 to 72.0).
Open-Source and Licensing
All three Phi-3.5 models are available on Hugging Face under an MIT license, broadening their accessibility for various applications. Microsoft plans to provide further information on the Phi-3.5 models later today.
To recap the training scale: Phi-3.5-mini was trained on 3.4 trillion tokens, Phi-3.5-MoE on 4.9 trillion, and Phi-3.5-vision on 500 billion, with the vision model showing marked improvements on multimodal reasoning benchmarks.
These models are an expansion of the Phi-3 family, which launched earlier this year. Microsoft's Phi-3 family includes mini, small, and medium models, each designed to provide AI capabilities on a smaller and more efficient scale, making them particularly suitable for low-power devices with limited processing power. Intel has showcased the performance of its products across all these models, demonstrating the versatility and efficiency of its hardware.