The market gives GPT a 29% chance of scoring 50% or higher on Humanity's Last Exam by June 30, 2026, on $136 of 24-hour volume. Trade live on Polymarket via Polymarket Trade.
'Humanity's Last Exam' is a benchmark of expert-level questions spanning many academic subjects, built to test the reasoning and knowledge of frontier AI models in general rather than OpenAI's GPT models specifically. This market asks whether a GPT model will be credited with a score of at least 50% by June 30, 2026. The 29% market probability indicates traders are skeptical that any GPT iteration will reach that score in time. This pricing suggests traders believe current GPT versions still face meaningful limitations in abstract reasoning, complex multi-step inference, nuance recognition, or other dimensions the benchmark targets. The relatively low liquidity ($1,730 total, $136 24h volume) is typical for technical AI capability markets, which attract primarily researchers and AI specialists rather than mainstream market participants. Historically, progress on comprehensive reasoning benchmarks has tended to come gradually rather than through sudden breakthroughs, and market pricing appears to reflect this measured outlook. The June 30 deadline provides a defined resolution window for what many see as a genuine test of GPT's reasoning and comprehension capabilities.
Humanity's Last Exam appears to be a rigorous evaluation framework targeting limitations that persist even in frontier large language models. While OpenAI has made rapid advances in model capability, each new generation has tended to hit performance ceilings on benchmarks designed to test genuine reasoning rather than pattern matching. The 29% odds likely reflect trader conviction that GPT will fall short of 50%, at least in part because reasoning benchmarks have historically been where language models plateau hardest. Multi-step inference, handling novel problem types, and maintaining accuracy across long chains of reasoning are areas where even state-of-the-art models show measurable gaps.
Several factors could push this market toward YES. A new model release between now and June 30 with substantially improved reasoning could shift expectations sharply. OpenAI's ongoing scaling efforts and training improvements might narrow the gap. If the exam includes domains where GPT already shows strength (certain math problems, coding tasks, or domain-specific reasoning), a 50% threshold might be achievable.
Conversely, several factors support the market's skepticism and point toward NO. Comprehensive reasoning benchmarks are notoriously difficult to crack; many require handling ambiguity, abstract thinking, and novel problem formulations that test understanding rather than memorization. If the exam weights these dimensions heavily, 50% becomes a high bar. The benchmark creators may have designed the test to target the weaknesses of frontier models, not their strengths. Scoring methodology also matters: if partial credit is limited, exact answers are required, or ambiguous questions are marked harshly, the score could easily stay below 50%.
Historically, AI benchmarks have often followed a pattern: models tend to plateau at 60-80% on established benchmarks rather than reaching 95%+, and new benchmarks designed to be genuinely challenging often see initial performance in the 30-50% range. The current market price of 29% for YES may be pricing in both the real difficulty of the task and uncertainty about what the evaluation criteria actually are. With only $136 in 24-hour volume, this is a thin market, meaning the price may not fully reflect all available information. Institutional traders or researchers with direct knowledge of the exam's difficulty would have strong incentives to trade here, and the thin volume suggests either that they lack conviction or that they are waiting for clearer signals. The June 30 deadline is approximately six months away, giving OpenAI time for one or potentially two model releases, but major improvements within that window are not guaranteed. The market's 29% YES probability implies roughly a 71% chance that GPT will still fall short of 50% on this comprehensive reasoning benchmark by mid-2026, which is consistent with historical patterns and known limitations in current-generation models.
The market resolves YES if OpenAI's GPT achieves at least 50% on Humanity's Last Exam by the June 30, 2026 evaluation date. Resolution depends on an official results announcement by the benchmark creators.
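The resolution rule above can be sketched as a small check. This is only an illustration under stated assumptions: the function name `resolves_yes`, its parameters, and the treatment of a result announced after the deadline as NO are hypothetical and are not taken from the market's official rules.

```python
from datetime import date

# Values taken from the resolution text above.
THRESHOLD_PERCENT = 50.0
DEADLINE = date(2026, 6, 30)


def resolves_yes(score_percent: float, announced_on: date) -> bool:
    """Return True if a reported score would satisfy the YES condition.

    Assumption (not stated in the source): a qualifying score announced
    after the deadline does not count toward YES.
    """
    meets_threshold = score_percent >= THRESHOLD_PERCENT
    in_time = announced_on <= DEADLINE
    return meets_threshold and in_time


# Hypothetical inputs, for illustration only.
print(resolves_yes(50.0, date(2026, 6, 30)))
print(resolves_yes(49.9, date(2026, 3, 1)))
```

"At least 50%" is treated as inclusive, so a score of exactly 50% counts toward YES in this sketch.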
Polymarket Trade is an independent third-party interface to the Polymarket CLOB prediction market exchange on Polygon and is not affiliated with Polymarket, Inc. Prediction markets aggregate trader expectations into real-time probability estimates. Every market question resolves YES or NO based on a specific event outcome; traders buy shares of the side they expect to win. Prices range from 0¢ (certain NO) to 100¢ (certain YES) and reflect the crowd-implied probability of YES. Polymarket Trade is non-custodial: your funds never leave your wallet. Open the full interactive page linked above to place orders, see order book depth, and execute a trade.
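To make the price-to-probability mapping concrete, here is a minimal sketch using this market's 29¢ YES price. It assumes a winning share pays out 100¢ and a losing share pays nothing, and it ignores fees and slippage; the function names and the 40% personal estimate are hypothetical, chosen only for illustration.

```python
def implied_probability(price_cents: float) -> float:
    """Convert a YES share price in cents (0-100) to an implied probability."""
    if not 0 <= price_cents <= 100:
        raise ValueError("price must be between 0 and 100 cents")
    return price_cents / 100.0


def expected_profit_per_share(price_cents: float, believed_prob: float) -> float:
    """Expected profit in dollars from buying one YES share.

    Assumes a $1.00 payout on YES, $0.00 on NO, and no fees.
    """
    cost = price_cents / 100.0
    return believed_prob * 1.0 - cost


yes_price = 29.0                       # current YES price from this market
p_yes = implied_probability(yes_price)  # crowd-implied probability of YES
p_no = 1.0 - p_yes                      # implied probability of NO

# A trader who privately believes YES is 40% likely (hypothetical figure)
# would see a positive expected profit from buying YES at 29 cents.
edge = expected_profit_per_share(yes_price, believed_prob=0.40)
print(f"implied YES: {p_yes:.0%}, implied NO: {p_no:.0%}, edge: ${edge:.2f}")
```

Buying YES has positive expected value only when the trader's own probability estimate exceeds the price, which is why the price tends to settle near the crowd's consensus estimate.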