The FrontierMath Benchmark tests AI systems on challenging mathematical problems designed to probe the limits of language model reasoning. With current YES odds at just 16%, market participants express significant skepticism that any AI model will reach 90% accuracy before the end of 2026, a threshold that would represent a major breakthrough in mathematical AI capability. The benchmark includes problems spanning discrete mathematics, geometry, and abstract algebra, making it one of the more rigorous tests of AI reasoning. At 16% implied probability, traders are pricing in both the technical difficulty of closing the gap to 90% and the compressed timeline. Current leading models typically score in the 30–50% range on FrontierMath, leaving substantial room for improvement. The low probability baked into the odds reflects skepticism that any model will make a jump of 40+ percentage points within the next nine months, though rapid AI progress could shift that calculation.
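To make the pricing language concrete, here is a minimal sketch of how a 16-cent YES price translates into implied probability and payoffs. It assumes a standard binary prediction market in which winning shares redeem for $1, and it ignores fees, spread, and the time value of locked-up capital; the specific numbers are just the 16% figure from this market.

```python
# Minimal sketch: implied probability and payoff for a binary market priced at $0.16 YES.
# Assumes winning shares redeem for $1; fees, spread, and capital lock-up are ignored.

yes_price = 0.16            # cost of one YES share, in dollars
no_price = 1.0 - yes_price  # cost of one NO share under the same assumptions

implied_prob_yes = yes_price / 1.0  # 16% implied probability of resolving YES
implied_prob_no = no_price / 1.0    # 84% implied probability of resolving NO

# Profit per share if the position wins: $1 payout minus purchase price.
yes_profit_if_right = 1.0 - yes_price  # $0.84 profit on a $0.16 stake
no_profit_if_right = 1.0 - no_price    # $0.16 profit on a $0.84 stake

print(f"Implied P(YES) = {implied_prob_yes:.0%}, payout multiple = {1.0 / yes_price:.2f}x")
print(f"Implied P(NO)  = {implied_prob_no:.0%}, payout multiple = {1.0 / no_price:.2f}x")
```

The asymmetry is the point: YES buyers at 16 cents are making a long-shot bet with roughly a 6x payout if a breakthrough lands, while NO buyers collect a modest return for carrying the tail risk.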
Deep dive — what moves this market
The FrontierMath Benchmark was designed by a consortium of mathematicians and AI researchers to evaluate language model capability on genuinely difficult, competition-style mathematics problems. Unlike simpler arithmetic or geometry tasks, FrontierMath draws on mathematical olympiads, advanced undergraduate coursework, and research-adjacent domains. The benchmark emerged in response to concerns that existing benchmarks had become saturated: models like GPT-4 and Claude already exceed 90% on many standard math tests, creating ambiguity about whether they are doing true mathematical reasoning or reproducing memorized patterns.

Current state-of-the-art performance on FrontierMath hovers between 30% and 50%, depending on the model and whether chain-of-thought prompting is applied; GPT-4 and recent Sonnet variants represent the frontier. That leaves a gap of 40+ percentage points between current best-in-class and the 90% threshold, a distance traders clearly view as daunting within a nine-month window.

The 16% odds reflect several layers of skepticism. First, mathematical reasoning has been one of the slowest-improving frontier skills for AI: gains have been incremental rather than step-function. Second, FrontierMath problems are adversarially designed to resist simple pattern-matching. Third, the timeline is tight, and major capability jumps typically require new model architectures or training techniques, not just fine-tuning. Reaching 90% would likely require either a fundamentally new approach to mathematical reasoning or a breakthrough in test-time scaling, which lets models spend more compute per problem.

Paths to YES do exist, however. If multimodal reasoning, reinforcement learning on mathematical problem-solving, or vastly increased compute per token yields a breakthrough, performance could accelerate quickly. Some speculate that reasoning models trained specifically on mathematical domains could score far higher than general-purpose models.

Conversely, NO is favored because: (a) 90% is a very high bar, allowing the model to miss only about 1 in 10 problems and leaving almost no room for conceptual gaps; (b) FrontierMath is explicitly designed to avoid saturation, meaning it is updated as models improve; (c) the short timeline limits how many model release cycles can land before resolution; and (d) historical precedent suggests jumps of this size take years, not months. At 16%, traders treat YES as a clear underdog but not a negligible one, pricing in tail-risk upside in case a lab announces a major breakthrough in mathematical reasoning.
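The sketch below illustrates how unforgiving a 90% bar is. The 300-problem set size and the 45% current score are illustrative assumptions, not published figures, and the binomial model treats problems as independent coin flips, which real benchmark problems are not; it is an intuition pump for how sharply the pass probability falls off below the threshold, not a forecast.

```python
import math

# Illustrative assumptions only: a 300-problem evaluation set and a best current
# score of 45%. Neither number is an official FrontierMath figure.
num_problems = 300
pass_threshold = 0.90
needed_correct = math.ceil(pass_threshold * num_problems)  # 270 of 300
max_misses = num_problems - needed_correct                  # only 30 misses allowed

def prob_clears_bar(per_problem_accuracy: float) -> float:
    """P(score >= 90%) if each problem were an independent Bernoulli trial.

    Real problems share techniques and failure modes, so they are correlated;
    this is a rough illustration of how sharp the threshold is, not a forecast.
    """
    p = per_problem_accuracy
    return sum(
        math.comb(num_problems, k) * p**k * (1 - p) ** (num_problems - k)
        for k in range(needed_correct, num_problems + 1)
    )

print(f"90% bar on {num_problems} problems: {needed_correct} correct, "
      f"at most {max_misses} misses")
for p in (0.45, 0.80, 0.88, 0.90, 0.92):
    print(f"true per-problem accuracy {p:.0%} -> P(clear 90% bar) = {prob_clears_bar(p):.4f}")
```

Under these toy assumptions, a model whose true per-problem accuracy sits even a few points below 90% almost never clears the bar on a single run, which is why the market reads the threshold as demanding near-uniform competence rather than incremental improvement.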