OpenAI GPT-4 Capabilities Extended to Visual Input: A DOOM Game Test Case

Adrian de Wynter, a principal applied scientist at Microsoft and a researcher at the University of York, has investigated whether GPT-4, a large language model developed by Microsoft-backed OpenAI, can interact with and play the classic game DOOM. GPT-4 was not designed to run games or their code, but de Wynter's research found that, with suitable engineering around the model, it can act as a stand-in player. The study, titled “Will GPT-4 Run DOOM?”, reports that GPT-4 cannot run DOOM's source code directly because of input-size limitations, whereas its multimodal variant, GPT-4V, which accepts both text and visual inputs, can interact with the game.

Technical Implementation

The research uses GPT-4V's capacity to process images alongside text to navigate DOOM's game environment. De Wynter built a system in which GPT-4V receives screenshots of the game, interprets them to determine the game state, and responds with action decisions. These decisions are then translated into keystroke commands compatible with the game engine. The setup has three parts: the vision component (GPT-4V), the agent model (GPT-4), and a manager layer that interfaces directly with the game engine via an open-source Python binding. GPT-4 has clear limitations in this role, such as a lack of object permanence that leads it to forget about enemies once they leave the screen. Even so, the approach lets the model carry out in-game actions such as opening doors, engaging in combat, and navigating levels.
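The screenshot-to-keystroke loop described above can be sketched roughly as follows. This is a minimal illustration rather than de Wynter's implementation: the action names, the key mapping, the `parse_action` helper, and the `Manager` class are all hypothetical, and the vision model, agent model, and game binding are passed in as plain callables so that no real API is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical action vocabulary and key mapping; the article does not
# specify the actions or keys used in the actual study.
ACTION_TO_KEYS = {
    "MOVE_FORWARD": ["w"],
    "TURN_LEFT": ["left"],
    "TURN_RIGHT": ["right"],
    "FIRE": ["ctrl"],
    "USE": ["space"],
}


def parse_action(reply: str, default: str = "MOVE_FORWARD") -> str:
    """Return the known action mentioned earliest in the reply, else the default."""
    upper = reply.upper()
    hits = [(upper.find(name), name) for name in ACTION_TO_KEYS if name in upper]
    return min(hits)[1] if hits else default


@dataclass
class Manager:
    """Glue layer: screenshot -> description -> action -> keystrokes."""
    describe: Callable[[bytes], str]        # stand-in for the vision component
    decide: Callable[[str], str]            # stand-in for the agent model
    send_keys: Callable[[List[str]], None]  # stand-in for the game binding
    log: List[str] = field(default_factory=list)

    def step(self, screenshot: bytes) -> str:
        description = self.describe(screenshot)
        action = parse_action(self.decide(description))
        self.send_keys(ACTION_TO_KEYS[action])
        self.log.append(action)
        return action


if __name__ == "__main__":
    pressed: List[List[str]] = []
    manager = Manager(
        describe=lambda img: "A door is directly ahead.",
        decide=lambda text: "I will use the door.",
        send_keys=pressed.append,
    )
    print(manager.step(b""), pressed)
```

Note that each step in this sketch sees only the current screenshot, which is one way the forgetting of off-screen enemies could arise; carrying a short history of earlier descriptions into `decide` would be one possible mitigation, though the article does not say whether the study does this.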

Ethical Considerations and Applications

The experiment raises ethical questions, particularly about how easily an AI model can be instructed to perform potentially violent actions within a game, even without specific training for such activities. De Wynter emphasizes the importance of considering the societal implications and potential misuse of AI capabilities that can simulate behavior in video games and possibly beyond. Although the research primarily aims to explore AI's planning and reasoning abilities in a controlled environment, it also underscores the need for caution in AI development and deployment.
