OpenAI’s Experimental Model Achieves Gold Medal at Math Olympiad
OpenAI’s experimental AI model achieved gold medal performance at the 2025 International Math Olympiad, solving 5/6 problems and marking the first time AI reached this elite level.
OpenAI's latest experimental reasoning model has achieved a historic milestone by earning gold medal-level performance at the 2025 International Math Olympiad (IMO), marking the first time an AI system has reached this elite level in one of the world's most prestigious mathematical competitions.
The Achievement in Detail
Performance Metrics
The experimental model successfully solved 5 of the 6 problems from the 2025 IMO, earning 35 out of 42 points, precisely the cutoff for a gold medal. That score places the AI in roughly the top 10% of the approximately 630 student competitors worldwide: only 67 contestants, about 10% of the field, earned gold medals in 2025.
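As a quick sanity check on the figures above (a minimal sketch; all numbers are the ones reported in this article):

```python
# Sanity check on the quoted 2025 IMO figures (numbers taken from the article).
model_score = 35            # points the model earned
max_score = 6 * 7           # six problems, each worth seven points
gold_medalists = 67         # human gold medalists in 2025
participants = 630          # approximate size of the field

gold_fraction = gold_medalists / participants
print(f"Model: {model_score}/{max_score} points")
print(f"Gold medalists: {gold_fraction:.1%} of ~{participants} contestants")
```

The 67/630 ratio works out to about 10.6%, consistent with the "about 10%" figure quoted for the gold medal band.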
Authentic Competition Conditions
The model was evaluated under the same rigorous conditions as human contestants:
- Two 4.5-hour exam sessions over two days
- No access to the internet or external tools
- Working directly from the official problem statements
- Producing detailed, natural-language mathematical proofs
- Grading by three former IMO medalists, each scoring independently, with unanimous consensus required
Technical Significance
General-Purpose Reasoning Breakthrough
Unlike specialized mathematical AI systems such as Google DeepMind's AlphaGeometry, OpenAI's model is a general-purpose large language model designed for broad reasoning capabilities. Alexander Wei, a member of OpenAI's technical staff, emphasized that the achievement came "not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling".
Advanced Mathematical Reasoning
The model demonstrated the ability to craft intricate, watertight arguments at the level of human mathematicians. According to researchers, it showed:
- Sustained reasoning capabilities over extended periods
- Ability to construct full mathematical arguments step-by-step
- Creative problem-solving rather than pattern matching
- Endurance and logic that surpassed previous AI benchmarks
The International Math Olympiad Context
Competition Prestige
The IMO, first held in Romania in 1959, is widely regarded as the pinnacle of competitive mathematics for high school students worldwide. Although its problems require no mathematics beyond the pre-university level, the competition tests participants across challenging areas including:
- Algebra
- Combinatorics
- Geometry
- Number theory
All four areas demand abstract reasoning and creative problem-solving.
Historical Significance
Notable past IMO gold medalists include renowned mathematicians Grigori Perelman and Terence Tao, both celebrated for advancing the frontiers of mathematics. The competition's difficulty level requires not just computational ability but deep mathematical creativity and insight.
Competitive Landscape
Outperforming Previous AI Systems
OpenAI's achievement is particularly significant because prior AI systems and leading large language models from Google and xAI did not reach even the bronze threshold in the same evaluation. This represents the first time an AI has performed at medal-winning human level in this competition.
Progression of AI Mathematical Capabilities
Wei illustrated the scale of advancement by noting the progression through increasingly difficult mathematical benchmarks, ranked by roughly how long a top human needs to solve one problem:
- GSM8K (~0.1 minute for top humans)
- MATH benchmark (~1 minute)
- AIME (~10 minutes)
- IMO (~100 minutes)
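The quoted solve times step up by roughly a factor of ten at each level, a progression easy to verify directly (a minimal sketch using only the figures listed above):

```python
import math

# Approximate time (minutes) a top human needs per problem, as quoted above.
benchmarks = [("GSM8K", 0.1), ("MATH", 1.0), ("AIME", 10.0), ("IMO", 100.0)]

# Each consecutive pair of benchmarks differs by roughly an order of magnitude.
for (name_a, t_a), (name_b, t_b) in zip(benchmarks, benchmarks[1:]):
    ratio = t_b / t_a
    assert math.isclose(ratio, 10.0), (name_a, name_b)
    print(f"{name_a} -> {name_b}: ~{ratio:.0f}x longer")
```

Each step on this ladder is therefore about ten times harder in human solve time, which is what makes the jump from AIME-level to IMO-level performance notable.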
Connection to GPT-5 Development
Implications for Future Models
While this experimental model won't be released publicly anytime soon, OpenAI has confirmed that GPT-5 is launching shortly. The mathematical reasoning breakthrough suggests significant capabilities that may be incorporated into consumer-facing products.
Enhanced Reasoning Architecture
The success indicates that OpenAI's approach to test-time compute scaling and reinforcement learning is yielding substantial improvements in AI reasoning capabilities. This methodology could transform how AI systems approach complex, multi-step problem-solving tasks.
Critical Perspectives
Transparency Concerns
Some researchers have raised questions about the methodology. NYU professor Gary Marcus noted: "OpenAI has told us the result, but not how it was achieved. That leaves me with many questions", highlighting the need for more detailed technical disclosure.
Broader Implications
The achievement has sparked discussion about AI overshadowing human intellectual achievement and has raised ethical questions about AI's growing dominance in traditionally human domains.
Looking Forward
This milestone represents what OpenAI researchers call "a new level of sustained creative thinking" in artificial intelligence. The breakthrough suggests an approaching inflection point in AI capabilities, particularly in abstract reasoning and mathematical problem-solving, with potentially profound implications for scientific research, education, and technological development.
The experimental model's success at the IMO demonstrates that AI systems are increasingly capable of handling tasks that require not just computational power, but genuine mathematical insight, creativity, and the ability to construct rigorous logical arguments—capabilities once considered uniquely human.