OpenAI LLM Achieves Breakthrough at 2025 International Mathematical Olympiad

News Synopsis
In a breakthrough for artificial intelligence, an experimental large language model (LLM) developed by OpenAI has demonstrated gold medal-level performance on the problems of the 2025 International Mathematical Olympiad (IMO), widely regarded as the world’s toughest high school math competition.
OpenAI researcher Alexander Wei revealed on X that the model solved five of the six problems, scoring 35 out of 42 points. That score meets the threshold required to earn a gold medal in the official competition.
“We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs,” Wei explained.
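The reported score can be checked with simple arithmetic. The sketch below assumes the standard IMO marking scheme of 7 points per problem and full marks on each solved problem. The gold cutoff of 35 is taken from the figure reported above, and the variable names are purely illustrative.

```python
# Assumption: each of the six IMO problems is graded on a 0-7 scale,
# and the model received full marks on every problem it solved.
POINTS_PER_PROBLEM = 7
NUM_PROBLEMS = 6
GOLD_CUTOFF_2025 = 35  # threshold as reported in this article

solved_count = 5  # five of six problems solved

score = POINTS_PER_PROBLEM * solved_count
max_score = POINTS_PER_PROBLEM * NUM_PROBLEMS
medal = "gold" if score >= GOLD_CUTOFF_2025 else "below gold"

print(f"{score}/{max_score} -> {medal}")
```

Under these assumptions, five fully solved problems land exactly on the reported cutoff, so any partial deduction on a single problem would have dropped the result below gold.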
IMO’s Global Prestige and Difficulty
The IMO is known for extremely challenging problems that test creative, abstract, and logical mathematical reasoning over extended periods. These aren’t rote arithmetic puzzles; they demand a deep understanding of concepts and often require pages of proof and hours of sustained effort, even from top students.
Wei pointed out the model’s evolution in context:
“We’ve now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins),” he said.
This progression reflects a rapid expansion in AI capabilities: each benchmark in the sequence takes top humans roughly ten times longer per problem than the one before, moving from grade-school math problems to elite-level Olympiad challenges.
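The time horizons in Wei’s quote can be compared step by step. This is a minimal sketch using only the approximate per-problem times he cited, with illustrative variable names. It computes the ratio between consecutive benchmarks and the overall span.

```python
import math

# Approximate time top humans need per problem, in minutes, as quoted by Wei.
horizons = {"GSM8K": 0.1, "MATH": 1, "AIME": 10, "IMO": 100}

names = list(horizons)
# Ratio of each benchmark's time horizon to the previous one.
step_ratios = [horizons[b] / horizons[a] for a, b in zip(names, names[1:])]
# Overall growth from the first benchmark to the last.
overall_ratio = horizons[names[-1]] / horizons[names[0]]
# Number of tenfold steps spanned in total.
orders_of_magnitude = math.log10(overall_ratio)

for (a, b), r in zip(zip(names, names[1:]), step_ratios):
    print(f"{a} -> {b}: about {r:.0f}x longer")
print(f"Overall: about {overall_ratio:.0f}x, or {orders_of_magnitude:.0f} orders of magnitude")
```

These figures are rough estimates from a social media post, so the ratios indicate scale rather than precise measurements.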
Testing, Validation, and Human-Level Reasoning
Solutions Verified by Former IMO Medalists
To validate the performance, OpenAI had the model’s submissions independently graded by three former IMO gold medallists, who unanimously approved the results.
“The model solved P1 through P5; it did not produce a solution for P6,” said Wei, adding that the model’s answers displayed a “distinct style,” influenced by its experimental training design.
Complex Proofs Beyond Reinforcement Learning
Wei also emphasized a key advancement in how the model reasons:
“By going beyond the reinforcement learning paradigm of clear-cut, verifiable rewards we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians.”
This suggests the model is not simply mimicking formulas—it is generating logically sound mathematical proofs in natural language, a rare and difficult feat even for humans.
No Public Release Yet; GPT-5 on the Way
IMO-Capable Model Won’t Be Released Soon
Despite the buzz, OpenAI has made it clear that this IMO-level model is not meant for immediate public release. Wei clarified:
“We don’t plan to release a model with IMO gold level of capability for many months.”
The math-focused model is part of a distinct research track and not tied directly to upcoming commercial releases.
GPT-5 Coming Soon, Says Sam Altman
OpenAI CEO Sam Altman confirmed in a follow-up post that GPT-5 is nearing launch, but cautioned users against expecting IMO-grade reasoning in the standard version.
“We are releasing GPT-5 soon but want to set accurate expectations: this is an experimental model that incorporates new research techniques ... we don’t plan to release a model with IMO gold level of capability for many months,” Altman added.
He called the achievement “a significant marker of how far AI has come over the past decade,” adding that the model is not a math specialist but a general-purpose reasoning system.
AI Progress Surpasses Forecasts
A Prediction Surpassed
Wei concluded with a personal reflection on how rapidly AI has advanced. In 2021, his PhD advisor Jacob Steinhardt had asked him to predict AI progress in math by July 2025.
“I predicted 30 per cent on the MATH benchmark. Instead, we have IMO gold.”
Wei credited team members including Sheryl Hsu and Noam Brown, and congratulated the 2025 IMO participants, noting that many OpenAI researchers are themselves former IMO participants.
Summary
OpenAI’s breakthrough model marks a defining moment in AI reasoning, raising the bar for what large language models can achieve in abstract, creative problem-solving. With GPT-5 on the horizon and more capable systems still in research, progress in AI mathematical reasoning is outpacing even recent forecasts.