Anyone who has worked with AI models, especially for coding, has noticed that they behave inconsistently: sometimes they fail to provide an answer at all, sometimes they deliver erroneous code, and even when they produce the expected result, they may do so more slowly than usual. This is where the AI Benchmark Tool at AistupidLevel.info steps in, providing real-time information on the performance, accuracy, and cost of several AI models.
This open-source tool runs over 140 coding, debugging, and optimization tasks against major large language models. For now, it tracks OpenAI GPT, Claude, and Gemini, with Grok to be added soon. Its highlights include the following:
- Real-time cost information: some models that look cheap need 10 iterations to get a job done, while others that seem more expensive at first sight accomplish the same task in 2 iterations, for a lower effective cost (see the sketch after this list).
- The ability to run the same tests with your own API keys.
- Real-time AI performance monitoring, including live model rankings based on stupidity and smartness.
- Smart recommendations based on combined performance.
- Notifications of active degradations; for example, Gemini-2.5-Flash is currently performing 44% below its baseline (also illustrated in the sketch below).
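To make the effective-cost and degradation claims concrete, here is a minimal Python sketch. The dollar figures and quality scores are invented for illustration; only the iteration counts and the 44% figure come from the article itself:

```python
def effective_cost(cost_per_call: float, iterations_needed: int) -> float:
    """Total cost to finish a task: per-call price times the calls it takes."""
    return cost_per_call * iterations_needed

def degradation(baseline_score: float, current_score: float) -> float:
    """Percent drop of the current score relative to the baseline."""
    return (baseline_score - current_score) / baseline_score * 100

# A "cheap" model at a hypothetical $0.01/call that needs 10 tries ends up
# costing more than a "pricey" one at $0.04/call that finishes in 2.
print(f"{effective_cost(0.01, 10):.2f}")  # 0.10
print(f"{effective_cost(0.04, 2):.2f}")   # 0.08

# Hypothetical scores chosen to reproduce the article's 44% example:
# a model scoring 42 against a baseline of 75 is 44% below baseline.
print(round(degradation(75, 42)))  # 44
```

The point of the comparison is that the per-call price alone is misleading; what matters is the total spend required to actually complete a task.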
Currently, the smart recommendations are: Gemini-2.5-Flash-Lite for code, Claude-3.5-Sonnet-20241022 for reliability, and Gemini-2.5-Flash-Lite for speed. Everything is open-sourced on GitHub (Repo API, Repo Front End), and anyone can contribute. All the details, and the tool itself, can be found on the official website mentioned in the first paragraph.
Source(s)
Reddit (translated)