OpenAI GPT-4 Capabilities Extended to Visual Input: A DOOM Game Test Case

A researcher got GPT-4 to play DOOM by feeding it screenshots and translating its responses into game inputs. The AI can fight and explore, but it forgets enemies once they leave the screen.


Adrian de Wynter, a principal applied scientist at Microsoft and a researcher at the University of York, has investigated whether GPT-4, a large language model developed by Microsoft-backed OpenAI, can interact with and play the iconic game DOOM. While GPT-4 was not designed to execute games or their code, de Wynter's research found that, with some engineering, it can effectively serve as a game proxy. The study, titled “Will GPT-4 Run DOOM?”, reports that although GPT-4 cannot directly run DOOM's source code because of limits on input size, its multimodal variant, GPT-4V, which accepts both text and visual inputs, is able to interact with the game.

Technical Implementation

The research takes advantage of GPT-4V's capacity to process images as inputs, alongside traditional text, to navigate the game environment of DOOM. De Wynter constructed a system where GPT-4V receives screenshots of the game, interprets these visuals to understand the game state, and responds with action decisions. These decisions are then translated into keystroke commands compatible with the game engine. This setup involves a complex interplay between the vision component (GPT-4V), the agent model (GPT-4), and a manager layer that interfaces directly with the game engine via an open-source binding. Despite GPT-4's limitations, such as a lack of object permanence leading it to forget about enemies once they leave the screen, this innovative approach allows the AI to execute game-related actions like opening doors, engaging in combat, and navigating levels.
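The loop described above (screenshot in, description, action decision, keystroke out) can be sketched roughly as follows. This is a minimal illustration and not the study's code: the `Manager` class, the `ACTION_TO_KEY` table, the key bindings, and the two stub functions standing in for GPT-4V and GPT-4 are all hypothetical, and no real model or game engine is called.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical action vocabulary and key bindings (assumed for illustration;
# the article does not give the study's actual mapping).
ACTION_TO_KEY = {
    "MOVE_FORWARD": "w",
    "TURN_LEFT": "a",
    "TURN_RIGHT": "d",
    "FIRE": "ctrl",
    "USE": "space",  # e.g. open a door
}


@dataclass
class Manager:
    """Hypothetical manager layer connecting vision, agent, and game engine."""
    describe_frame: Callable[[bytes], str]          # stands in for the vision model
    choose_action: Callable[[str, List[str]], str]  # stands in for the agent model
    send_key: Callable[[str], None]                 # stands in for the engine binding
    history: List[str] = field(default_factory=list)

    def step(self, screenshot: bytes) -> str:
        # 1. The vision component turns the screenshot into a text description.
        description = self.describe_frame(screenshot)
        # 2. The agent picks an action from the description and past steps.
        action = self.choose_action(description, self.history)
        # 3. The manager translates the action into a keystroke.
        key = ACTION_TO_KEY.get(action)
        if key is None:
            # Unrecognized replies become a no-op rather than a guessed key.
            return "NOOP"
        self.send_key(key)
        self.history.append(f"{description} -> {action}")
        return action


# Stub models so the sketch runs without any API access (assumptions).
def fake_vision(screenshot: bytes) -> str:
    return "enemy ahead" if b"enemy" in screenshot else "empty corridor"


def fake_agent(description: str, history: List[str]) -> str:
    # Reacts only to the current frame and ignores history.
    return "FIRE" if "enemy" in description else "MOVE_FORWARD"


pressed: List[str] = []
manager = Manager(fake_vision, fake_agent, pressed.append)
frames = (b"corridor", b"enemy imp", b"corridor")
actions = [manager.step(frame) for frame in frames]
print(actions)  # ['MOVE_FORWARD', 'FIRE', 'MOVE_FORWARD']
print(pressed)  # ['w', 'ctrl', 'w']
```

Because the stub agent looks only at the current description, it goes back to walking forward as soon as the enemy is out of frame. That is a toy analogue of the object-permanence problem noted above, not a reproduction of GPT-4's behavior.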

Ethical Considerations and Applications

The experiment raises significant ethical questions, particularly around how easily an AI can be instructed to perform potentially violent actions within a game context, even without specific training for such activities. De Wynter emphasizes the importance of considering the societal implications and potential misuse of AI capabilities that can simulate behavior in video games and possibly beyond. While the research primarily aims to explore AI's planning and reasoning abilities in a controlled environment, it also underscores the need for a cautious approach to AI development and deployment.

Luke Jones
Luke has been writing about all things tech for more than five years. He is following Microsoft closely to bring you the latest news about Windows, Office, Azure, Skype, HoloLens and all the rest of their products.
