Institute of Mathematics, Academia Sinica

Lakeside Lectures

Achieving IMO Gold: Robust Mathematical Reasoning with LLMs

Date : 2025/11/03 (Mon.) 14:00~15:00
Location : Room 202, 2F, Astro-Math. Building
Speaker : Dawsen Hwang (Google DeepMind)
Organizer : Olivier Hénot (NTU), Chun-Ju Lai (AS), Chao Ming Lin (NTU), Colin McSwiggen (AS), Masao Oi (NTU)
Abstract : In 2025, an advanced version of Gemini with Deep Think achieved a historic gold-medal standard at the International Mathematical Olympiad (IMO), the world’s most prestigious competition for young mathematicians. This breakthrough represents a significant leap in AI reasoning: our model solved five of the six problems end-to-end in natural language, producing rigorous, human-readable proofs directly from the problem statements, all within the official competition time limit.
This talk will present the high-level ideas and design principles behind the LLM and explain how it achieves robust mathematical reasoning. A critical challenge in advancing mathematical AI is that existing evaluation benchmarks are often too simplistic or focus only on correct short answers, failing to assess the deep, multi-step reasoning required for Olympiad-level mathematics. We will present how we overcame this issue by establishing a new north star for our research: a suite of rigorous evaluation benchmarks designed to reward the entire problem-solving process. The talk will detail the design of these benchmarks and explore how they, in turn, guided the research and design principles that were fundamental to achieving gold-level performance.