Elon Musk’s xAI Enhances Grok with Multimodal Image and Text Capabilities

The Grok model from xAI is adding multimodal capabilities to compete with OpenAI's recently launched GPT-4o.


Elon Musk's artificial intelligence venture, xAI, is making strides in enhancing its Grok chatbot with multimodal capabilities. According to the latest developer documents, users will soon be able to upload images to Grok and receive text-based responses.

The developer documents include a sample Python script that outlines the integration process. This script demonstrates how developers can use the xAI software development kit (SDK) to process both text and image inputs. It reads an image file, sets up a text prompt, and generates a response using the xAI SDK, indicating a move towards more sophisticated interaction methods.
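
The three steps described in the documents (read an image, set a prompt, generate a response) can be sketched roughly as follows. This is not the xAI sample script and it does not use the xAI SDK, whose actual interface is not reproduced here. The function names `build_multimodal_request` and `generate_response` and the request field names are hypothetical, chosen only for illustration, and the response step is a local stub rather than a real model call.

```python
import base64
from pathlib import Path


def build_multimodal_request(image_path: str, prompt: str) -> dict:
    """Pair a text prompt with a base64-encoded image in one request body.

    The field names are hypothetical, not the xAI SDK's real schema.
    """
    image_bytes = Path(image_path).read_bytes()  # step 1: read the image file
    return {
        "prompt": prompt,  # step 2: set up the text prompt
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }


def generate_response(request: dict) -> str:
    """Stub for step 3: a real client would send the request to the model."""
    size = len(base64.b64decode(request["image_base64"]))
    return f"[stub] prompt={request['prompt']!r}, image_bytes={size}"
```

Encoding the image as base64 text is a common way to carry binary data alongside a prompt in a single structured request, though whether xAI's SDK does this internally is an assumption.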

Evolution and Previous Version

Grok was first made available in November 2023 and is accessible to subscribers of the X Premium Plus service. The latest iteration, Grok 1.5, was released in March, featuring improved reasoning capabilities. The model is trained on a diverse range of text data from the internet up to the third quarter of 2023, supplemented by datasets curated by human reviewers. Notably, while Grok-1 was not trained on data from X (formerly Twitter), it does have real-time access to public posts on the platform.

Competitive Position and Future Developments

Founded by Elon Musk in March 2023, xAI is a relatively new player in the AI sector, competing with established players such as OpenAI, the maker of ChatGPT. Despite its newcomer status, xAI asserts that Grok 1.5 is narrowing the performance gap with OpenAI's GPT-4 across various benchmarks, including academic competition problems. However, benchmarks for large models often face scrutiny because test data may have been included in the training sets, which can inflate reported performance.

Multimodal Models in Various Domains

A blog post from last month indicated that Grok-1.5V will offer “multimodal models in a number of domains.” The recent update to the developer documents suggests progress towards releasing a new model. This model is trained on a variety of text data from publicly available internet sources up to Q3 2023 and datasets reviewed by human experts. Grok also boasts real-time knowledge of the world, including posts on X.

The development of multimodal conversational AI is viewed as a significant advancement in AI technology. With recent announcements from Google I/O and the release of OpenAI's GPT-4o, Grok's previous lack of multimodal capabilities had placed it at a competitive disadvantage. The ongoing updates aim to bridge this gap and enhance Grok's functionality.

Luke Jones
Luke has been writing about all things tech for more than five years. He is following Microsoft closely to bring you the latest news about Windows, Office, Azure, Skype, HoloLens and all the rest of their products.
