
Open source tool measures the stupidity level of AI models

AI models are not stable (Image source: Generated using OpenAI)
A new open-source tool offers real-time monitoring of multiple AI models, including OpenAI GPT-5, Claude Opus 4, and Gemini 2.5 Pro. Billed as the first of its kind, it aims to detect "when AI companies reduce model capability to save costs." Users can also run the benchmarks with their own OpenAI, xAI, Anthropic, or Google API keys.

Those who have worked with AI models for various tasks, especially coding, have noticed that these tools behave inconsistently. Sometimes they fail to provide any answer, sometimes they deliver erroneous code, and even when they produce the expected result, they can take longer than usual. This is where the AI Benchmark Tool, located at AistupidLevel.info, steps in, providing real-time information on the performance, accuracy, and cost of several AI models.

The open-source tool runs over 140 coding, debugging, and optimization tasks against each model it tracks. For now, it tracks OpenAI GPT, Claude, and Gemini, and Grok is due to be added soon. Its highlights include the following:

  • Real-time price information that reflects effective cost: a model that seems cheap may need 10 iterations to get a job done, while one that seems more expensive at first sight may accomplish the same task in 2 iterations, for a lower effective cost.
  • The ability to run the same tests with your own API keys.
  • Real-time AI performance monitoring, including live model rankings based on stupidity and smartness.
  • Smart recommendations, based on combined performance.
  • Notification of active degradations; for example, at the time of writing, Gemini-2.5-Flash was 44% below its baseline value.
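
The effective-cost and degradation figures in the list above come down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration, not the tool's actual scoring code: the function names, the per-iteration prices, and the 0 to 100 score scale are all hypothetical assumptions chosen to mirror the 10-versus-2 iterations and 44% examples.

```python
def effective_cost(price_per_iteration: float, iterations: int) -> float:
    """Total spend to finish one task: price per attempt times attempts needed."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return price_per_iteration * iterations


def percent_below_baseline(baseline: float, current: float) -> float:
    """How far the current score sits below the baseline, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) * 100.0 / baseline


# Hypothetical prices in dollars per iteration (not real vendor pricing).
cheap = effective_cost(price_per_iteration=0.02, iterations=10)
pricey = effective_cost(price_per_iteration=0.05, iterations=2)

# The model with the higher list price ends up cheaper per finished task.
print(f"cheap-looking model:  ${cheap:.2f} per task")
print(f"pricey-looking model: ${pricey:.2f} per task")

# Hypothetical scores on an assumed 0-100 scale.
drop = percent_below_baseline(baseline=100.0, current=56.0)
print(f"degradation: {drop:.0f}% below baseline")
```

With these made-up numbers, the model that costs more per iteration is the cheaper one per completed task, which is the comparison the price bullet describes.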

At the time of writing, the smart recommendations are Gemini-2.5-Flash-Lite for code, Claude-3.5-Sonnet-20241022 for reliability, and Gemini-2.5-Flash-Lite for speed. Everything is open-sourced on GitHub (Repo API, Repo Front End), and anyone can contribute. All the details and the tool itself can be found on the official website, AistupidLevel.info.

Source(s)

Reddit (translated)

Codrut Nistor, 2025-09-17 (Update: 2025-09-17)