A new artificial intelligence (AI) benchmark based on the classic arcade title Street Fighter III was devised at the Mistral AI hackathon in San Francisco last week. The open-source LLM Colosseum benchmark was developed by Stan Girard and Quivr Brain. The game runs in an emulator, allowing the LLMs to duke it out in unconventional yet spectacular fashion.
AI enthusiast Matthew Berman introduces the new beat-em-up-based large language model (LLM) tournament in the video embedded above. In addition to showcasing the street fighting action, Berman’s video walks you through installing this open-source project on a home PC or Mac, so you can test it for yourself.
This isn’t a typical LLM benchmark. Smaller models usually enjoy a latency and speed advantage, which translates into winning more bouts in this game. Human beat-em-up players benefit from fast reactions to counter their opponents’ moves, and the same holds true in this AI-vs-AI action.
The LLMs make real-time decisions about how they fight. As text-based models, they are prompted to react to the game action by first analyzing the game state for context and then considering their move options. Those options include: move closer, move away, fireball, megapunch, hurricane, and megafireball.
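To make that loop concrete, here is a minimal Python sketch of one decision turn, assuming a simple state dictionary and a stubbed-out query_llm call. The field names, prompt wording, and fallback logic are illustrative guesses, not the project's actual code.

```python
import random

# The six moves from the article's list; the real project's identifiers may differ.
MOVES = ["move_closer", "move_away", "fireball", "megapunch", "hurricane", "megafireball"]

def build_prompt(state: dict) -> str:
    """Serialize the current game state into a text prompt for the model."""
    return (
        "You are Ken in Street Fighter III. Current state:\n"
        f"- Your health: {state['own_health']}\n"
        f"- Opponent health: {state['opp_health']}\n"
        f"- Distance to opponent: {state['distance']}\n"
        f"Choose exactly one move from: {', '.join(MOVES)}.\n"
        "Reply with the move name only."
    )

def query_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call (OpenAI, Mistral, etc.).
    It picks a random move here so the sketch runs standalone."""
    return random.choice(MOVES)

def choose_move(state: dict) -> str:
    """One decision turn: prompt the model, then validate its reply."""
    reply = query_llm(build_prompt(state)).strip().lower()
    # Guard against hallucinated output: fall back to a safe default
    # if the model answers with anything outside the move list.
    return reply if reply in MOVES else "move_closer"

print(choose_move({"own_health": 120, "opp_health": 95, "distance": "far"}))
```

Validating the reply matters because, as noted later in the article, models sometimes hallucinate output that doesn't map to any legal move.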
In the video you can see that fights look fluid, and the players appear strategic in their countering, blocking, and use of special moves. However, at the time of writing, the project only supports the Ken character. With both fighters identical, matchups are perfectly balanced, but they might be less interesting to watch.
So, which is the best Street Fighter III AI? According to the tests undertaken by Girard, OpenAI’s GPT-3.5 Turbo is the appropriately named winner (Elo 1776) of the eight LLMs pitted against each other. In a separate series of tests by Amazon exec Banjo Obayomi, 14 LLMs sparred across 314 individual matches, with Anthropic’s claude_3_haiku ultimately triumphant (Elo 1613).
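For reference, those Elo numbers come from the standard chess-style rating update, where each result nudges both players' ratings toward their demonstrated strength. A minimal sketch follows; the K-factor of 32 and the 1500 starting rating are conventional defaults, not values confirmed by either test.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_elo(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one match.
    score_a is 1.0 for an A win, 0.0 for a loss, 0.5 for a draw (double KO)."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))

# Example: two models start at 1500 and the first wins a bout.
r1, r2 = update_elo(1500.0, 1500.0, score_a=1.0)
print(round(r1), round(r2))  # 1516 1484
```

Run over hundreds of bouts, as in Obayomi's 314-match series, upsets against higher-rated opponents move the numbers the most, which is how a clear leaderboard emerges.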
Interestingly, Obayomi also observed that LLM bugs/features such as AI hallucinations and AI safety rails sometimes got in the way of a particular model’s beat-em-up performance.
Last but not least, there’s the question of whether this is a useful benchmark for LLMs or just an entertaining distraction. More complex games could yield richer insights, but the results would likely be harder to interpret.