Elon Musk's xAI has launched the Grok 3 family of leading-edge large language models, which generally outperform competing models on standardized AI benchmarks.
The Grok 3 models were trained on the company's Colossus supercomputer cluster, which uses 100,000 Nvidia Hopper Tensor Core GPUs. Four models have been released: a standard and a mini non-reasoning model (Grok 3 beta and Grok 3 mini beta), plus a standard and a mini reasoning model (Grok 3 beta (Think) and Grok 3 mini beta (Think)).
The non-reasoning models generally outperform the prior chart-topping AIs, such as OpenAI's GPT-4o and DeepSeek-V3. One reason is their one-million-token context window, which lets the models take in very large amounts of text and improves their ability to synthesize the correct answer from a variety of sources. That said, the Grok 3 beta models still answer fact-seeking questions with less than 50% accuracy on the SimpleQA benchmark, so humans will still have jobs tomorrow.
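To put a one-million-token context window in perspective, here is a minimal sketch that estimates token counts from text length. It assumes the common rule of thumb of roughly four characters per token for English text; Grok's actual tokenizer may differ, so these numbers are estimates only.

```python
# Rough illustration of how much text a one-million-token window holds.
# Assumes ~4 characters per token (a common English-text heuristic),
# NOT Grok's actual tokenizer.

CHARS_PER_TOKEN = 4          # heuristic assumption
CONTEXT_WINDOW = 1_000_000   # tokens

def estimated_tokens(text: str) -> int:
    """Estimate the token count of a piece of text."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str) -> bool:
    """Check whether the text would fit in the context window."""
    return estimated_tokens(text) <= CONTEXT_WINDOW

# A 300-page novel is roughly 500,000 characters (~125,000 tokens),
# so several such books could fit in a single prompt.
novel = "x" * 500_000
print(estimated_tokens(novel))   # 125000
print(fits_in_context(novel))    # True
```

By this estimate, a single prompt could hold the text of several full-length books at once.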
The reasoning models think through complex prompts step by step, letting the user see the AI's thought process. This allows them to work through problems the way an expert would: solving smaller parts of the problem and combining the results into a proper answer. Selecting the DeepSearch agent (the search option) tells Grok 3 to search broadly and deeply across the internet and use code interpreters before generating a report that summarizes its findings. The Grok 3 (Think) models generally rank best among current AIs at solving math problems, answering graduate-level multiple-choice questions, and completing coding tasks.
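The decompose-and-combine pattern described above can be sketched with a toy example. This is not Grok's internal mechanism, just an illustration of breaking a problem into sub-problems, solving each, and combining the partial results into a final answer.

```python
# Toy illustration of decompose-and-combine reasoning.
# NOT Grok's actual internals; the problem and steps are invented
# purely to show the pattern.

def solve_word_problem():
    # Problem: "Pens cost $2 and notebooks cost $5.
    # How much do 3 pens and 4 notebooks cost?"
    steps = []

    pens_cost = 3 * 2                 # sub-problem 1: cost of the pens
    steps.append(f"3 pens x $2 = ${pens_cost}")

    notebooks_cost = 4 * 5            # sub-problem 2: cost of the notebooks
    steps.append(f"4 notebooks x $5 = ${notebooks_cost}")

    total = pens_cost + notebooks_cost  # combine the partial results
    steps.append(f"total = ${pens_cost} + ${notebooks_cost} = ${total}")

    return steps, total

steps, total = solve_word_problem()
for step in steps:
    print(step)     # the visible "thought process"
print(total)        # 26
```

The visible `steps` list plays the role of the exposed thought process; a reasoning model does something analogous at vastly larger scale.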
xAI expects to continue tuning Grok 3 for improved performance in the coming months on a 200,000-GPU supercomputer cluster. Grok 3 is now available to all users on X and Grok.com; free users may encounter usage limits, while paying users will have access to advanced features.