The 66th International Mathematical Olympiad (IMO), held in Australia, marked a milestone for artificial intelligence. In a first for the competition, Google DeepMind’s Gemini "Deep Think" model achieved a gold-medal-level performance, solving five of the six exceptionally difficult Olympiad problems and scoring 35 of 42 points, matching the gold threshold set by the IMO grading committee.
This is a significant improvement over last year, when DeepMind’s AlphaProof and AlphaGeometry 2 reached only the silver-medal standard, solving four problems and scoring 28 points. Those earlier systems required translating problems between natural language and formal proof languages, and their solutions took several days to compute.
This year, Gemini operated end-to-end in natural language, producing complete, rigorous solutions directly from the official problem statements within the 4.5-hour contest window. The official IMO graders, who also assess human contestants, described Gemini’s answers as "clear" and "precise." DeepMind credits the progress to advanced reinforcement learning techniques, a curated corpus of high-quality mathematical solutions, and new parallel reasoning methods that explore multiple solution routes simultaneously. While other AI systems, including those from OpenAI, reportedly achieved similar results unofficially, Gemini’s performance was formally graded and certified by Olympiad coordinators, a world first for an autonomous mathematics system.
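For readers curious what "parallel reasoning" can look like in practice, here is a minimal sketch of the general idea: sample several candidate solution routes at once and keep the one a verifier scores highest. This is an illustration only, not DeepMind’s actual pipeline; the `generate_solution` and `score_solution` functions below are hypothetical placeholders.

```python
# Minimal sketch of parallel reasoning via best-of-n sampling.
# Assumption: generate_solution() and score_solution() stand in for a
# reasoning model and a grading/verification step, which are not public.
from concurrent.futures import ThreadPoolExecutor
import random


def generate_solution(problem: str, seed: int) -> str:
    # Placeholder for a call to a reasoning model; each seed represents
    # a different attempted solution route.
    return f"candidate proof #{seed} for: {problem}"


def score_solution(solution: str) -> float:
    # Placeholder verifier: in a real system this could be a grading
    # model or a formal proof checker.
    return random.random()


def parallel_solve(problem: str, num_routes: int = 8) -> str:
    # Explore several solution routes concurrently, then keep the best one.
    with ThreadPoolExecutor(max_workers=num_routes) as pool:
        candidates = list(
            pool.map(lambda seed: generate_solution(problem, seed),
                     range(num_routes))
        )
    return max(candidates, key=score_solution)


if __name__ == "__main__":
    print(parallel_solve("IMO 2025, Problem 1"))
```

The design point is simply that exploring many routes in parallel and selecting among them tends to beat committing to a single chain of reasoning, which is the intuition behind the "parallel thinking" DeepMind describes.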
DeepMind plans to make the Deep Think model available to a select group of mathematicians before a broader rollout, while continuing to develop both natural-language and formal reasoning agents for mathematics research.