
Elon Musk’s xAI Enhances Grok with Multimodal Image and Text Capabilities

xAI's Grok model is gaining multimodal capabilities as it competes with OpenAI's recently launched GPT-4o.


Elon Musk’s artificial intelligence venture, xAI, is adding multimodal capabilities to its Grok chatbot. According to recently updated developer documents, users will soon be able to upload images to Grok and receive text-based responses.

The developer documents include a sample Python script that shows how developers can use the xAI software development kit (SDK) to send both text and image inputs. The script reads an image file, sets up a text prompt, and generates a response through the SDK, which suggests that Grok will soon accept combined image-and-text queries rather than text alone.
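The steps described above (read an image, pair it with a text prompt, hand both to the model) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not xAI's sample script or SDK: the function name `build_multimodal_request` and the payload fields (`inputs`, `type`, `data`, `text`) are hypothetical, and the sketch stops before any network call because the SDK's actual methods are not shown in the source.

```python
import base64
import json
import os
import tempfile
from pathlib import Path


def build_multimodal_request(image_path: str, prompt: str) -> dict:
    """Build a HYPOTHETICAL request pairing an image with a text prompt.

    The field names are illustrative assumptions, not the xAI SDK's API.
    """
    # Step 1: read the raw image bytes from disk.
    image_bytes = Path(image_path).read_bytes()
    # Step 2: base64-encode the bytes so they can travel inside a JSON payload.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    # Step 3: combine the encoded image and the text prompt in one request.
    return {
        "inputs": [
            {"type": "image", "data": image_b64},
            {"type": "text", "text": prompt},
        ]
    }


if __name__ == "__main__":
    # Write a few placeholder bytes (a JPEG header) to stand in for a real image.
    with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
        f.write(b"\xff\xd8\xff")
        path = f.name
    request = build_multimodal_request(path, "What is in this image?")
    print(json.dumps(request))
    os.remove(path)
```

In a real integration, the resulting payload (or whatever structure the SDK actually expects) would then be passed to the SDK's generation call to obtain Grok's text response.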

Evolution and Previous Version

Grok was first made available in November 2023 to subscribers of the X Premium Plus service. The latest iteration, Grok 1.5, was released in March 2024 with improved reasoning capabilities. The model is trained on a diverse range of text data from the internet up to the third quarter of 2023, supplemented by datasets curated by human reviewers. Notably, while Grok-1 was not trained on data from X (formerly Twitter), it does have real-time access to public posts on the platform.

Competitive Position and Future Developments

Founded by Elon Musk in March 2023, xAI is a relatively new player in the AI sector, competing with established companies such as OpenAI and its ChatGPT chatbot. Despite its newcomer status, xAI asserts that Grok 1.5 is narrowing the performance gap with OpenAI’s GPT-4 across various benchmarks, including academic competition problems. However, benchmark results for large language models are often questioned because test questions may have leaked into a model's training data, which can inflate its scores.

Multimodal Models in Various Domains

A blog post from last month indicated that Grok-1.5V will offer “multimodal models in a number of domains.” The recent update to the developer documents suggests progress toward releasing this new model. Like its predecessors, it draws on publicly available internet text up to Q3 2023 and on datasets reviewed by human experts, and Grok also has real-time knowledge of the world, including posts on X.

The development of multimodal conversational chatbots is viewed as a significant advance in AI technology. With recent announcements at Google I/O and the release of OpenAI’s GPT-4o, Grok’s lack of multimodal capabilities has placed it at a competitive disadvantage. The ongoing updates aim to close this gap and expand what Grok can do.

Last Updated on November 7, 2024 8:13 pm CET

Source: X.ai
Luke Jones
Luke has been writing about Microsoft and the wider tech industry for over 10 years. With a degree in creative and professional writing, Luke looks for the interesting spin when covering AI, Windows, Xbox, and more.
