A Claude model has a 22% implied probability of scoring 45% or higher on Humanity's Last Exam in this market, which ends June 30. 24-hour volume: $105. Trade live on Polymarket via Polymarket Trade.
Humanity's Last Exam is a benchmark designed to assess advanced AI reasoning across many problem domains and disciplines. Anthropic's Claude models have performed strongly on academic benchmarks in the past, but this evaluation is noted for its rigor and breadth. The 22% market probability reflects substantial trader skepticism that Claude will reach the 45% threshold by the June 30 deadline: the market treats the task as difficult for frontier models, but not impossible. The low trading volume of $105 over the past 24 hours is typical for AI capability markets, where attention concentrates on major model releases and widely publicized benchmarks rather than specialized evaluations. A Claude model scoring 45% would represent meaningful progress on challenging reasoning tasks and would suggest that frontier AI systems are continuing to advance. The current 22% price implies roughly 1-in-5 market confidence in this outcome, consistent with how prediction markets price narrow, technically demanding milestones.
Unlike specialized benchmarks that focus on a single area (mathematics, language understanding, code generation), Humanity's Last Exam blends diverse reasoning challenges into one comprehensive assessment. The 45% threshold is a meaningful milestone: substantially above random or baseline performance, but below the performance of human experts on the full evaluation suite.
Factors that could push this market toward YES include rapid iteration in Anthropic's research program, demonstrated improvements in recent model versions, and Claude's track record on complex reasoning tasks. If Anthropic releases a new model variant or a major capability improvement before the June 30 deadline, reaching 45% becomes more plausible. Targeted fine-tuning or advanced prompting techniques designed for the benchmark's structure could also yield gains beyond what baseline model capabilities suggest.
Several factors support the current low odds. First, Humanity's Last Exam is intentionally difficult; benchmark creators typically include problems that challenge even frontier models. Second, the 45% threshold is demanding, requiring the model to solve nearly half of a curated set of hard problems. Third, Anthropic's research roadmap is not guaranteed to prioritize this benchmark or to deliver a major capability breakthrough before June 30; the company may focus on other product lines or research directions. Fourth, frontier AI models often show diminishing returns on reasoning benchmarks, with each percentage point of improvement becoming harder as performance approaches human-expert levels.
The pricing also fits a broader conservatism in prediction markets about AI capability milestones. As large language models approach human-level performance in specific domains, traders have grown more skeptical when pricing further incremental gains. The minimal trading volume ($105 in 24 hours) suggests this market sits at the niche end of AI prediction markets, attracting mainly AI research enthusiasts rather than the broader trading public. The 22% probability represents a market consensus that reaching the threshold is possible but unlikely. Traders are signaling that they expect Claude to fall short of 45%, while remaining open to an upside surprise if Anthropic delivers a major release or capability breakthrough before June 30. This is consistent with how prediction markets typically evaluate narrow, technically demanding milestones that hinge on a specific company's research timelines and release decisions.
The market resolves YES if any Anthropic Claude model achieves a score of at least 45% on Humanity's Last Exam before June 30, 2026. Resolution requires a credible public announcement of the results by Anthropic or the benchmark creators.
Polymarket Trade is an independent third-party interface to the Polymarket CLOB prediction market exchange on Polygon and is not affiliated with Polymarket, Inc. Prediction markets aggregate trader expectations into real-time probability estimates. Every market question resolves YES or NO based on a specific event outcome; traders buy shares of the side they expect to win. Prices range from 0¢ (certain NO) to 100¢ (certain YES), so the price of a YES share reflects the crowd-implied probability of YES. Polymarket Trade is non-custodial: your funds never leave your wallet. Open the full interactive page to place orders and see order book depth.
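The price-to-probability relationship described above can be illustrated with a short calculation. This is a minimal sketch, not Polymarket's or Polymarket Trade's actual code: the function names are hypothetical, and it assumes that a winning share pays out 100¢, a losing share pays 0¢, and that fees and slippage are zero.

```python
def implied_probability(price_cents: float) -> float:
    """Convert a YES share price in cents (0-100) to an implied probability."""
    if not 0 <= price_cents <= 100:
        raise ValueError("price must be between 0 and 100 cents")
    return price_cents / 100.0


def yes_share_profit(price_cents: float, resolves_yes: bool) -> float:
    """Profit in cents on one YES share bought at price_cents.

    Assumption: a winning share pays 100 cents, a losing share pays 0,
    and there are no fees or slippage.
    """
    payout = 100.0 if resolves_yes else 0.0
    return payout - price_cents


def expected_profit(price_cents: float, believed_probability: float) -> float:
    """Expected profit in cents per YES share, given the trader's own
    estimate of the probability that the market resolves YES."""
    win = yes_share_profit(price_cents, True)
    loss = yes_share_profit(price_cents, False)
    return believed_probability * win + (1.0 - believed_probability) * loss


if __name__ == "__main__":
    price = 22.0  # the YES price implied by this market's 22% probability
    print(f"Implied probability: {implied_probability(price):.0%}")
    print(f"Profit if YES: {yes_share_profit(price, True):.0f} cents")
    print(f"Profit if NO: {yes_share_profit(price, False):.0f} cents")
    # A trader who believes the true chance is 30% sees positive expected value.
    print(f"Expected profit at a 30% belief: {expected_profit(price, 0.30):.1f} cents")
```

Under these assumptions, buying YES has positive expected value only when the trader's own probability estimate is above the market price, which is why the price tends to settle near the crowd's aggregate estimate.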