OpenAI’s Experimental Model Achieves Gold Medal at Math Olympiad

OpenAI’s experimental AI model achieved gold medal performance at the 2025 International Math Olympiad, solving 5 of 6 problems and marking the first time an AI system has reached this elite level.

OpenAI's latest experimental reasoning model has achieved a historic milestone by earning gold medal-level performance at the 2025 International Math Olympiad (IMO), marking the first time an AI system has reached this elite level in one of the world's most prestigious mathematical competitions.

The Achievement in Detail

Performance Metrics

The experimental model solved 5 of the 6 problems on the 2025 IMO, earning 35 of 42 points, exactly the cutoff for a gold medal. Among the approximately 630 student competitors worldwide, only 67 earned gold medals in 2025, so the model's score places it in roughly the top 10% of the field.

Authentic Competition Conditions

The model was evaluated under the same rigorous conditions as human contestants:

  • Two 4.5-hour exam sessions held over two days
  • No access to the internet or external tools
  • Working directly from the official problem statements
  • Producing detailed, natural-language mathematical proofs
  • Grading by three former IMO medalists, each scoring every solution independently, with unanimous consensus required

Technical Significance

General-Purpose Reasoning Breakthrough

Unlike specialized mathematical AI systems such as Google DeepMind's AlphaGeometry, OpenAI's model is a general-purpose large language model designed for broad reasoning capabilities. Alexander Wei, a member of OpenAI's technical staff, emphasized that the achievement came "not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling".

Advanced Mathematical Reasoning

The model demonstrated the ability to craft intricate, watertight arguments at the level of human mathematicians. According to researchers, it showed:

  • Sustained reasoning capabilities over extended periods
  • Ability to construct full mathematical arguments step-by-step
  • Creative problem-solving rather than pattern matching
  • Logical endurance beyond what previous AI systems had demonstrated

The International Math Olympiad Context

Competition Prestige

The IMO, first held in 1959 in Romania, is widely regarded as the pinnacle of competitive mathematics for high school students worldwide. The competition tests participants across challenging areas including:

  • Advanced algebra (problems require no calculus)
  • Combinatorics
  • Geometry
  • Number theory
  • Abstract reasoning and creative problem-solving

Historical Significance

Notable past IMO winners include renowned mathematicians Grigori Perelman and Terence Tao, both celebrated for advancing the frontiers of mathematics. The competition's difficulty level requires not just computational ability but deep mathematical creativity and insight.

Competitive Landscape

Outperforming Previous AI Systems

OpenAI's achievement is particularly significant because prior AI systems and leading large language models from Google and xAI did not reach even the bronze threshold in the same evaluation. This is the first time an AI system has performed at the gold medal level in this competition.

Progression of AI Mathematical Capabilities

Wei illustrated the scale of the advance by citing a progression of increasingly difficult mathematical benchmarks, each characterized by roughly how long a top human needs per problem:

  • GSM8K (~0.1 minute)
  • MATH benchmark (~1 minute)
  • AIME (~10 minutes)
  • IMO (~100 minutes)

Connection to GPT-5 Development

Implications for Future Models

While the experimental model itself will not be released publicly in the near term, OpenAI has confirmed that GPT-5 is launching soon. The mathematical reasoning breakthrough suggests capabilities that may eventually be incorporated into consumer-facing products.

Enhanced Reasoning Architecture

The success indicates that OpenAI's approach to test-time compute scaling and reinforcement learning is yielding substantial improvements in AI reasoning capabilities. This methodology could transform how AI systems approach complex, multi-step problem-solving tasks.
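OpenAI has not disclosed how its model spends extra compute at inference time, so any concrete illustration is necessarily an assumption. One widely published form of test-time compute scaling is self-consistency voting: sample many independent reasoning chains for the same problem and keep the answer they most often agree on. The minimal Python sketch below illustrates only that general idea; sample_solution and best_of_n are hypothetical stand-ins, not OpenAI's method or API.

```python
import random
from collections import Counter

def sample_solution(problem: str) -> str:
    """Hypothetical stand-in for one stochastic reasoning pass of a model.

    A real system would draw a full chain-of-thought sample from an LLM;
    here a weighted random choice keeps the sketch self-contained.
    """
    return random.choice(["answer_A", "answer_A", "answer_B"])

def best_of_n(problem: str, n: int = 32) -> str:
    """Scale test-time compute by drawing n independent samples and
    returning the most common final answer (self-consistency voting)."""
    votes = Counter(sample_solution(problem) for _ in range(n))
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # More samples cost more compute but make the majority answer more stable.
    print(best_of_n("Find all functions f: R -> R such that ...", n=32))
```

The design point is that nearly all of the cost sits in the n sampled reasoning chains, so accuracy can be traded directly against inference compute by tuning n, which is the essence of test-time scaling however OpenAI actually implements it.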

Critical Perspectives

Transparency Concerns

Some researchers have raised questions about the methodology. NYU professor Gary Marcus noted: "OpenAI has told us the result, but not how it was achieved. That leaves me with many questions", highlighting the need for more detailed technical disclosure.

Broader Implications

The achievement has sparked discussion about AI overshadowing human intellectual achievement and has raised ethical questions about AI's growing presence in traditionally human domains.

Looking Forward

This milestone represents what OpenAI researchers call "a new level of sustained creative thinking" in artificial intelligence. The breakthrough suggests we are approaching a significant inflection point in AI capabilities, particularly in abstract reasoning and mathematical problem-solving, with potentially profound implications across scientific research, education, and technological development.

The experimental model's success at the IMO demonstrates that AI systems are increasingly capable of handling tasks that require not just computational power, but genuine mathematical insight, creativity, and the ability to construct rigorous logical arguments—capabilities once considered uniquely human.