
Bing Chat Gets More Responsive with New Backend

Bing Chat has been improved with a new backend that reduces latency by about 25% for some queries, making the chatbot more responsive and reliable.


Bing Chat, Microsoft's AI-powered chatbot, has recently undergone some major improvements. These changes have resulted in a 25% reduction in latency for some queries, making the experience more responsive and user-friendly. The improvements were made to the backend technology that underpins Bing Chat. The new backend is more efficient and uses fewer resources, which has led to the latency reductions.

The changes to Bing Chat were announced on Twitter by the CEO of Bing. He described the update as “a completely reworked backend for inner monologue, reducing time to first token by ~25%, and, far more importantly, making latency more stable, reducing spikes.”

Michael Schechter, a product manager at Bing, also commented on the changes. He said that the latency improvements “represent a ton of work and a significant improvement to the overall experience.”
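To make the two metrics in that announcement concrete, here is a minimal Python sketch of how time to first token and latency spikes could be measured for any streaming chatbot. It is not Microsoft's tooling: the function names, the injectable `clock` parameter, and the sample numbers are all assumptions made for illustration.

```python
import time
from statistics import median
from typing import Callable, Iterable, Sequence


def time_to_first_token(stream: Iterable[str],
                        clock: Callable[[], float] = time.perf_counter) -> float:
    """Seconds from the start of iteration until the first token arrives."""
    start = clock()
    for _token in stream:
        # Stop timing as soon as the first token is received.
        return clock() - start
    raise ValueError("stream ended without producing a token")


def summarize(samples: Sequence[float]) -> dict:
    """Median shows typical latency; the 95th percentile exposes spikes."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integers.
    rank = -(-95 * len(ordered) // 100)
    return {"median": median(ordered), "p95": ordered[rank - 1]}


def relative_reduction(before: float, after: float) -> float:
    """Fractional improvement, e.g. 0.25 for a 25% reduction."""
    return (before - after) / before


if __name__ == "__main__":
    # Made-up measurements in seconds, purely illustrative.
    old = [1.0] * 18 + [4.0, 5.0]
    new = [0.75] * 18 + [0.9, 1.0]
    print(summarize(old), summarize(new))
    print(relative_reduction(summarize(old)["median"], summarize(new)["median"]))
```

A lower median corresponds to the "~25%" figure, while a lower 95th percentile corresponds to the "more stable" latency with fewer spikes.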

How Bing Chat Delivers AI Search

Microsoft combines its own technology with the GPT large language model from OpenAI in Bing Chat. In March, this included an upgrade to GPT-4. Bing Chat was in development for several years, and adding GPT capabilities in 2022 accelerated the project. Microsoft created its Prometheus platform to underpin the experience.

Essentially, Prometheus is Bing search combined with natural language processing. Jordi Ribas, head of engineering for Bing, points out that the combination allows the chatbot to be more accurate:

“Thanks to the Bing grounding technique, Prometheus is also able to integrate citations into sentences in the Chat answer so that users can easily click to access those sources and verify the information. Sending traffic to these sources is important for a healthy web ecosystem and remains one of our top Bing goals.”
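As a rough illustration of what citations integrated into sentences can look like, the following Python sketch attaches a numbered marker to each sentence and builds a matching source list. The `GroundedSentence` type, the `render_with_citations` function, and the example URLs are hypothetical. How Prometheus actually does this has not been published in this form.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GroundedSentence:
    text: str        # a sentence of the chat answer
    source_url: str  # the web page that supports it


def render_with_citations(sentences: List[GroundedSentence]) -> Tuple[str, List[str]]:
    """Return the answer with [n] markers plus the ordered list of sources."""
    numbering = {}  # source URL -> citation number, in order of first use
    parts = []
    for sentence in sentences:
        # Reuse the same number when a source is cited more than once.
        number = numbering.setdefault(sentence.source_url, len(numbering) + 1)
        parts.append(f"{sentence.text} [{number}]")
    return " ".join(parts), list(numbering)


if __name__ == "__main__":
    answer, sources = render_with_citations([
        GroundedSentence("Fact one.", "https://example.com/a"),
        GroundedSentence("Fact two.", "https://example.com/b"),
        GroundedSentence("Fact three.", "https://example.com/a"),
    ])
    print(answer)
    for index, url in enumerate(sources, start=1):
        print(f"[{index}] {url}")
```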

Bing Chat: What's Under the Hood

Bing Chat uses a hybrid approach of rule-based and neural components to handle different types of user requests. For example, if a user is seeking information, Bing will perform web searches and provide factual statements with references and links.

If a user is looking for creative content, such as poems, stories, code, essays, songs or celebrity parodies, Bing will generate it using its own words and knowledge. If a user needs assistance with rewriting, improving or optimizing their content, Bing Chat will help them with that as well. If a user wants to have some fun or learn something new, Bing Chat will offer jokes, trivia, games or educational content.
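The hybrid idea described above can be sketched in a few lines of Python: cheap rules handle requests they recognize, and anything else falls through to a learned classifier. This is only an assumed shape, not Bing's implementation. The `RULES` table, the intent names, and the `fallback` callable (standing in for a neural model) are all invented for the example.

```python
import re
from typing import Callable, Optional

# Hypothetical rule table: (pattern, intent). Order matters; first match wins.
RULES = [
    (re.compile(r"\b(write|compose)\b.*\b(poem|story|song|essay|code)\b", re.I), "create"),
    (re.compile(r"\b(rewrite|improve|optimize|paraphrase)\b", re.I), "rewrite"),
    (re.compile(r"\b(joke|trivia|game)\b", re.I), "fun"),
]


def route(message: str,
          fallback: Optional[Callable[[str], str]] = None) -> str:
    """Pick an intent using rules first, then an optional learned classifier."""
    for pattern, intent in RULES:
        if pattern.search(message):
            return intent
    if fallback is not None:
        # Stand-in for the neural component: any callable returning an intent.
        return fallback(message)
    # Default: treat the request as information seeking and run a web search.
    return "search"


if __name__ == "__main__":
    for text in ["Write a poem about autumn",
                 "Please rewrite this paragraph",
                 "Tell me a joke",
                 "What is the population of Norway?"]:
        print(text, "->", route(text))
```

Putting rules first keeps common requests fast and predictable, while the fallback covers phrasing the rules do not anticipate.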

To provide these capabilities, Bing Chat leverages Azure Cognitive Services, such as LUIS, QnA Maker, Text Analytics and Speech Services. These services enable Bing Chat to understand the user's intent and context, answer their questions, analyze their sentiment and recognize their speech. Bing Chat also uses Azure Machine Learning and Azure Databricks to train and deploy custom models for content generation, summarization, paraphrasing and rewriting tasks.

To improve its scalability, reliability and performance, Bing Chat has adopted a microservices architecture and a serverless computing paradigm. It uses Azure Functions, Azure Service Bus, Azure Event Grid and Azure Cosmos DB to orchestrate the flow of data and requests among different services and components.

Source: Bing Blog
Luke Jones
