A bit over two years after its release, xAI's Grok has become the leading AI language model, surpassing OpenAI's ChatGPT, Google's Gemini, and DeepSeek, as well as the models from Meta and Anthropic. Grok will arrive in Tesla cars next week, said Elon Musk.
According to independent third-party testing, the newly released Grok 4 now tops the performance chart for public AI models. The driving force behind the 10x improvement in reasoning between Grok 3 and Grok 4 was the AI compute cluster capacity that xAI built at breakneck speed, doubling it to 200,000 GPUs on the way to the planned million.
The xAI team contacted the folks behind the demanding ARC-AGI performance test and asked them to run their suites of AI tests, with surprising results:
"First, the facts: Grok 4 is now the top-performing publicly available model on ARC-AGI. This even outperforms purpose-built solutions submitted on Kaggle. Second, ARC-AGI-2 is hard for current AI models. To score well, models have to learn a mini-skill from a series of training examples, then demonstrate that skill at test time. The previous top score was ~8% (by Opus 4). Below 10% is noisy. Getting 15.9% breaks through that noise barrier, Grok 4 is showing non-zero levels of fluid intelligence."
Another independent AI tester, Artificial Analysis, said that it has run its "full suite of benchmarks and Grok 4 achieves an Artificial Analysis Intelligence Index of 73, ahead of OpenAI o3 at 70, Google Gemini 2.5 Pro at 70, Anthropic Claude 4 Opus at 64 and DeepSeek R1 0528 at 68."
According to Elon Musk in the Grok 4 release presentation, xAI's model is now smarter than all graduate students in all disciplines combined. With his typical pie-in-the-sky bluster, Tesla's CEO claimed that next year Grok 4 will be able to discover "new technologies" such as medicines or engineering breakthroughs on its own.
He admitted, however, that Grok will remain bad at image recognition for the next month or so, and addressed the recent controversy over its supremacist answers by saying that "when Grok goes far wrong, that is usually due to something foolish we did, like a bad system prompt, or placing too much weight on biased sources."
Musk needs to pump Grok 4 because xAI is introducing a paid premium tier for the first time. Called SuperGrok Heavy, it starts at $300/month and includes everything in the $30/month SuperGrok tier, which gives initial access to Grok 4, plus access to the Grok 4 Heavy platform, which offers higher rate limits and early access to new features.
Grok 3 will remain free to use for the general public, while every X Premium+ subscriber will get access to Grok 4 in the SuperGrok tier.