Microsoft's researchers are working on a new method to train small language models that can outperform large language models in conversational tasks. The team has trained a 6-billion parameter model that generates more engaging and diverse responses than ChatGPT, OpenAI's chatbot built on the 175-billion-parameter GPT-3.5 family of models.
Microsoft's Phi-1 1.3B is a transformer-based language model that excels at coding tasks. Despite its small size, it surpasses far larger models, including ChatGPT, on coding benchmarks such as HumanEval and MBPP. It learned from high-quality sources, the Stack and StackOverflow datasets, filtered down to “textbook quality” training data.
Trained for four days on eight NVIDIA A100 GPUs, Phi-1 1.3B learned from roughly 13 billion tokens of data selected to meet quality standards set by GPT-4 and GPT-3.5-based classifiers. Phi-1 1.3B is not only more capable, but also more efficient, as it uses far fewer parameters than its competitors.
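To make the idea of classifier-filtered, “textbook quality” data more concrete, here is a minimal sketch of how an LLM-graded data filter might look. The `Sample` class, the threshold, and the toy grader below are illustrative assumptions only; they are not Microsoft's actual pipeline, which the article says relied on GPT-4 and GPT-3.5-based classifiers.

```python
# Illustrative sketch only: filtering a code corpus down to "textbook quality"
# samples with an LLM-style grader. The grader, threshold, and data model are
# hypothetical, not the Phi-1 team's actual pipeline.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    text: str          # a code snippet or documentation passage
    score: float = 0.0 # quality score assigned by the grader


def filter_corpus(
    corpus: List[Sample],
    grade: Callable[[str], float],  # e.g. a GPT-4-backed grader returning 0..1
    threshold: float = 0.7,
) -> List[Sample]:
    """Keep only samples the grader judges sufficiently educational."""
    kept = []
    for sample in corpus:
        sample.score = grade(sample.text)
        if sample.score >= threshold:
            kept.append(sample)
    return kept


if __name__ == "__main__":
    # Stand-in grader: rewards snippets that carry docstrings or comments.
    def toy_grader(text: str) -> float:
        return 1.0 if ('"""' in text or "#" in text) else 0.2

    corpus = [
        Sample('def add(a, b):\n    """Return the sum of a and b."""\n    return a + b'),
        Sample("x=1;y=2;print(x+y)"),
    ]
    print([s.text[:20] for s in filter_corpus(corpus, toy_grader)])
```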
Language models are systems that learn from large amounts of text data and can generate text for a range of applications, such as chatbots, summarization, and translation. However, training large language models requires enormous computing power and energy, which makes them expensive to run and difficult to scale.
New Training Methods to Boost Accuracy
To solve this problem, Microsoft's researchers have developed a new way to train small language models that can achieve similar or better results than large ones in conversational tasks. The approach uses knowledge distillation, in which a small student model learns to reproduce the behavior of a much larger teacher model. The researchers also apply curriculum learning, a strategy that gradually increases the difficulty of the training data as the small model improves.
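The article does not detail the training recipe, so the following is a minimal sketch of how distillation and a curriculum can be combined, assuming Hugging Face-style causal language models that share a tokenizer, a KL-divergence distillation loss, and sequence length as a stand-in difficulty measure. None of these choices are confirmed to be the researchers' actual method.

```python
# Minimal sketch of knowledge distillation with a simple curriculum, in PyTorch.
# The models, the difficulty proxy (sequence length), and the temperature are
# illustrative assumptions, not Microsoft's published recipe.

import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)


def train_with_curriculum(student, teacher, dataset, optimizer, epochs=3):
    """Distill the teacher into the student on progressively 'harder' batches.

    Assumes `dataset` is a list of batches shaped like {"input_ids": LongTensor},
    and that teacher and student share a vocabulary so their logits align.
    """
    # Curriculum: order batches by a simple difficulty proxy (sequence length).
    ordered = sorted(dataset, key=lambda batch: batch["input_ids"].shape[1])
    teacher.eval()
    for epoch in range(epochs):
        # Reveal a growing prefix of the curriculum each epoch (easy -> hard).
        cutoff = int(len(ordered) * (epoch + 1) / epochs)
        for batch in ordered[:cutoff]:
            with torch.no_grad():
                teacher_logits = teacher(batch["input_ids"]).logits
            student_logits = student(batch["input_ids"]).logits
            loss = distillation_loss(student_logits, teacher_logits)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

In this sketch the student only ever sees the soft teacher distributions; a real setup would typically mix in the ordinary next-token cross-entropy loss as well.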
The researchers tested their method on two datasets of conversations and found that their 6-billion parameter model produces better responses than ChatGPT and other models. In human evaluations, raters also judged the 6-billion parameter model's responses to be more fluent, consistent, informative, and of higher overall quality than ChatGPT's.
The researchers believe their method can lead to more efficient and effective chatbots for both users and developers, and they hope the work will encourage further research on improving small language models for text generation.
Baidu Also Claims to Have Surpassed ChatGPT
Yesterday I reported on the latest release of Baidu's Ernie chatbot, which the company claims surpasses ChatGPT in several key areas. The new system, called Ernie 3.5, is based on Baidu's previous research on natural language understanding and generation. It uses a novel technique called “continual pre-training” to learn from large-scale conversational data and adapt to different domains and scenarios.
According to Baidu, Ernie 3.5 achieved better results than ChatGPT on several metrics, such as coherence, diversity and informativeness, in both Chinese and English. It also outperformed ChatGPT in human evaluations, where users rated the quality and fluency of the generated responses.
Baidu plans to release Ernie 3.5 as an open-source project and provide access to its pre-trained models and datasets. It also intends to integrate Ernie 3.5 into its own products and services, such as its smart speaker Xiaodu and its online education platform Zuoyebang.