Anthropic used Pokémon to benchmark its newest AI model

Anthropic used Pokémon to benchmark its newest AI model. Yes, really.

In a blog post published on Monday, Anthropic said it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.

A unique feature of Claude 3.7 Sonnet is its ability to engage in “extended thinking.” Like OpenAI’s o3-mini and DeepSeek’s R1, Claude 3.7 Sonnet can “reason” through challenging problems by applying more computing power and taking more time.

That came in handy in Pokémon Red.

Compared to a previous version of Claude, Claude 3.0 Sonnet, which failed to leave the house in Pallet Town where the story begins, Claude 3.7 Sonnet successfully battled three Pokémon gym leaders and won their badges.

**Image Credits:** Anthropic

Now, it is not clear how much computing power Claude 3.7 Sonnet needed to reach these milestones, or how long it took. Anthropic said only that the model performed 35,000 actions to reach the last gym leader, Surge.

It surely will not be long before some enterprising developer finds out.

Pokémon Red is more of a toy benchmark than anything. However, there is a long history of games being used for AI benchmarking. In the past few months alone, a number of new apps and platforms have cropped up to test models’ game-playing abilities on titles ranging from Street Fighter to Pictionary.
