If you want to make sure your AI is good, you only need one video game: Pokémon

The Pokémon franchise, which has sold over 489 million video games since its launch in 1996, has grown into a global phenomenon that spans not only games but also a successful animated series and a popular trading card game. Almost 30 years after its debut, Pokémon continues to release titles that captivate generations of players around the world.

If AI can’t play Pokémon, it’s not good

Recently, classic Pokémon games have found a new role in artificial intelligence (AI) research. Because they pose a genuine challenge for machines, these games are used as assessment tools to test the capabilities of AI systems. Researchers have begun running models like Claude and Gemini on classic titles such as Pokémon Red and Blue, revealing significant limitations in planning and execution within these dynamic environments.

Research has shown that AI models like the original Claude often get stuck without a clear direction, exposing their weaknesses in long-term reasoning. In contrast, more advanced models, such as Gemini 2.5 Pro, have managed to complete Pokémon Blue, although far more slowly than an average human player. This gap highlights the obstacles that AI still faces in making complex decisions and managing multiple tasks.

Pokémon is popular as a testing ground for AI because it offers a controlled environment in which researchers can quantitatively measure the capabilities of machines in areas where they fall short. David Hershey, a researcher at Anthropic, has pointed out that Pokémon provides a rich assessment of AI performance, which explains its growing use in this field. It is intriguing that a game that has entertained millions has become a vital tool for advancing artificial intelligence research.
