Bing Chat, Microsoft's AI-powered chatbot, has recently received a major backend upgrade. According to Microsoft, the reworked backend cuts time to first token (the delay before the first word of an answer appears) by roughly 25% and makes latency more stable, so the experience feels more responsive. The changes were made to the technology that underpins Bing Chat, and the new backend is more efficient and uses fewer resources, which contributes to the lower latency.
The changes to Bing Chat were announced on Twitter by Mikhail Parakhin, the CEO of Bing. He described the update as “a completely reworked backend for inner monologue, reducing time to first token by ~25%, and, far more importantly, making latency more stable, reducing spikes.”
Michael Schechter, a product manager at Bing, also commented on the changes. He said that the latency improvements “represent a ton of work and a significant improvement to the overall experience.”
Fun fact: internally, we are most excited about something majority of people find boring. Yesterday we released a completely reworked backend for inner monologue, reducing time to first token by ~25%, and, far more importantly, making latency more stable, reducing spikes: pic.twitter.com/E0zBZ3lHyY
— Mikhail Parakhin (@MParakhin) June 29, 2023
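Parakhin's tweet makes two distinct claims: a faster typical response (time to first token) and fewer latency spikes. The Python sketch below shows how the two are usually measured separately, with the median for typical speed and a high percentile such as p99 for spikes. Everything in it is an assumption for illustration: `fake_stream` is a hypothetical stand-in for a streaming chat response, and the `before`/`after` samples are made-up numbers, not Bing measurements.

```python
import math
import time
from typing import Iterable, Iterator


def time_to_first_token(stream: Iterable[str]) -> float:
    """Seconds elapsed until `stream` yields its first token."""
    iterator: Iterator[str] = iter(stream)
    start = time.perf_counter()
    next(iterator)  # blocks until the first token is produced
    return time.perf_counter() - start


def fake_stream(delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming chat response."""
    time.sleep(delay_s)  # simulated backend work before the first token
    yield from ("Hello", ",", " world")


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile, for pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative TTFT samples in seconds (made up, not Bing measurements).
before = [1.0] * 95 + [4.0] * 5   # typical 1.0 s, with occasional large spikes
after = [0.75] * 98 + [1.0] * 2   # faster and more uniform

# Typical speed: relative drop in median TTFT.
median_cut = 1 - percentile(after, 50) / percentile(before, 50)
# Stability: how far the tail sits above the median (lower means fewer spikes).
tail_ratio_before = percentile(before, 99) / percentile(before, 50)
tail_ratio_after = percentile(after, 99) / percentile(after, 50)

if __name__ == "__main__":
    print(f"measured TTFT: {time_to_first_token(fake_stream(0.05)):.3f}s")
    print(f"median TTFT reduction: {median_cut:.0%}")
    print(f"p99/p50 before: {tail_ratio_before:.2f}, after: {tail_ratio_after:.2f}")
```

Tracking the p99/p50 ratio alongside the median is one way to see why Parakhin calls the stability gain "far more important": a backend can improve its median while still producing occasional multi-second stalls that users notice.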
How Bing Chat Delivers AI Search
Bing Chat combines Microsoft's own technology, such as Microsoft Graph, with OpenAI's GPT large language model. In March 2023, Microsoft confirmed that Bing Chat runs on GPT-4. Bing Chat was in development for several years, and adding GPT capabilities in 2022 accelerated the project. Microsoft created its Prometheus platform to underpin the experience.
In essence, Prometheus combines Bing search with the natural language processing of OpenAI's GPT model. Jordi Ribas, head of engineering for Bing, points out that this combination allows the chatbot to be more accurate:
“Thanks to the Bing grounding technique, Prometheus is also able to integrate citations into sentences in the Chat answer so that users can easily click to access those sources and verify the information. Sending traffic to these sources is important for a healthy web ecosystem and remains one of our top Bing goals.”
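Microsoft has not published how Prometheus attaches citations, so the Python sketch below illustrates only the general idea of grounding with a deliberately naive technique: each answer sentence is cited to the retrieved source whose snippet covers the largest share of the sentence's words, and sentences with no sufficiently supporting source are left uncited. The `Source` type, the `cite` function, the `min_overlap` threshold and the example.com URLs are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    snippet: str  # text retrieved from a search result


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cite(sentences: list[str], sources: list[Source], min_overlap: float = 0.5) -> str:
    """Append [n] markers to sentences supported by a source (toy lexical overlap)."""
    cited = []
    for sentence in sentences:
        words = _words(sentence)
        best_index, best_score = None, 0.0
        for index, source in enumerate(sources, start=1):
            # Fraction of the sentence's words that appear in the snippet.
            overlap = len(words & _words(source.snippet))
            score = overlap / len(words) if words else 0.0
            if score > best_score:
                best_index, best_score = index, score
        if best_index is not None and best_score >= min_overlap:
            cited.append(f"{sentence} [{best_index}]")
        else:
            cited.append(sentence)  # no supporting source: left uncited
    footnotes = [f"[{i}] {s.url}" for i, s in enumerate(sources, start=1)]
    return " ".join(cited) + "\n" + "\n".join(footnotes)


if __name__ == "__main__":
    sources = [
        Source("https://example.com/gpt4", "Microsoft confirmed Bing Chat runs on GPT-4."),
        Source("https://example.com/prometheus",
               "Prometheus grounds chat answers in Bing search results."),
    ]
    sentences = [
        "Bing Chat runs on GPT-4.",
        "Prometheus grounds answers in Bing results.",
        "It is sunny today.",
    ]
    print(cite(sentences, sources))
```

A production system would use far stronger attribution than word overlap, but the output shape is the point: every cited sentence links to a numbered source the user can click to verify, which is the traffic-sending behavior Ribas describes.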
Bing Chat: What's Under the Hood
Bing Chat uses a hybrid approach of rule-based and neural components to handle different types of user requests. For example, if a user is seeking information, Bing Chat performs web searches and returns factual statements with references and links.
If a user is looking for creative content, such as poems, stories, code, essays, songs or celebrity parodies, Bing Chat generates it in its own words, drawing on its own knowledge. If a user needs help rewriting, improving or optimizing their content, Bing Chat assists with that as well. If a user wants to have some fun or learn something new, Bing Chat offers jokes, trivia, games or educational content.
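The request types above amount to intent routing: classify the message, then dispatch it to a matching handler. The Python sketch below shows only the rule-based half of such a hybrid design, with hypothetical keyword rules checked in order; the fallback to `Intent.INFORMATION` marks where a learned (neural) classifier would sit in a real system. None of these names, keywords or handlers come from Microsoft.

```python
from enum import Enum
from typing import Callable


class Intent(Enum):
    INFORMATION = "information"
    CREATIVE = "creative"
    REWRITE = "rewrite"
    FUN = "fun"


# Hypothetical keyword rules, checked in order; the first match wins,
# so "rewrite my essay" routes to REWRITE rather than CREATIVE.
RULES: list[tuple[Intent, tuple[str, ...]]] = [
    (Intent.REWRITE, ("rewrite", "improve", "optimize", "proofread")),
    (Intent.CREATIVE, ("poem", "story", "essay", "song", "parody", "code")),
    (Intent.FUN, ("joke", "trivia", "game", "quiz")),
]

# Hypothetical handlers standing in for search, generation and so on.
HANDLERS: dict[Intent, Callable[[str], str]] = {
    Intent.INFORMATION: lambda m: f"[search] web results with citations for: {m}",
    Intent.CREATIVE: lambda m: f"[generate] original content for: {m}",
    Intent.REWRITE: lambda m: f"[rewrite] improved version of: {m}",
    Intent.FUN: lambda m: f"[fun] joke, trivia or game for: {m}",
}


def classify(message: str) -> Intent:
    """Rule-based pass; unmatched messages fall back to information search."""
    text = message.lower()
    for intent, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return intent
    # Placeholder for a neural intent classifier in a hybrid system.
    return Intent.INFORMATION


def handle(message: str) -> str:
    return HANDLERS[classify(message)](message)


if __name__ == "__main__":
    for msg in ("What is the capital of France?", "Write a poem about the sea",
                "Please rewrite my essay", "Tell me a joke"):
        print(handle(msg))
```

Cheap rules handle unambiguous requests quickly and predictably, while the fallback path is where a model earns its cost on messages the rules cannot place.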
To provide these capabilities, Bing Chat leverages Azure Cognitive Services, such as LUIS, QnA Maker, Text Analytics and Speech Services. These services enable Bing Chat to understand the user's intent and context, answer their questions, analyze their sentiment and recognize their speech. Bing Chat also uses Azure Machine Learning and Azure Databricks to train and deploy custom models for content generation, summarization, paraphrasing and rewriting tasks.
To improve its scalability, reliability and performance, Bing Chat has adopted a microservices architecture and a serverless computing paradigm. It uses Azure Functions, Azure Service Bus, Azure Event Grid and Azure Cosmos DB to orchestrate the flow of data and requests among different services and components.