Log loss is an alternative to Brier score that penalizes confident-but-wrong predictions far more harshly. It rewards accurate probability calibration while heavily punishing overconfident incorrect forecasts.
Log loss, also called cross-entropy loss, is a mathematical measure of how well a probability prediction aligns with reality. When you make a forecast like "there's a 75% chance this event will happen," log loss quantifies how wrong you were based on whether the event actually occurred. The key insight is that log loss doesn't just care whether you got the direction right; it cares deeply about how confident you were. If you say 90% and you're wrong, you get punished far more than if you said 51% and were wrong. Mathematically, log loss takes the negative logarithm of the probability you assigned to the outcome that actually occurred, so the penalty grows without bound as that probability approaches zero.
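The arithmetic above can be sketched in a few lines. This is a minimal illustration; the helper name `log_loss` is ours, not any platform's API:

```python
import math

def log_loss(p, outcome):
    """Log loss for a single binary forecast.

    p       -- predicted probability that the event happens (0 < p < 1)
    outcome -- 1 if the event happened, 0 if it did not
    """
    # Penalty is the negative log of the probability assigned
    # to whatever actually occurred.
    return -math.log(p) if outcome == 1 else -math.log(1 - p)

# A 75% forecast, as in the example above:
log_loss(0.75, 1)  # event happened: penalty ~0.29
log_loss(0.75, 0)  # event did not:  penalty ~1.39

# Confidence matters: being wrong at 90% hurts far more than at 51%.
log_loss(0.90, 0)  # ~2.30
log_loss(0.51, 0)  # ~0.71
```

Note the asymmetry: the reward for being confidently right is modest, but the penalty for being confidently wrong explodes as your stated probability nears certainty.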
Log loss emerged from information theory and became standard in machine learning for evaluating probabilistic classifiers. In prediction markets, where probability estimates are the entire product, log loss became the natural way to score forecasters because it directly measures the quality of calibration: whether someone who says 70% actually turns out correct about 70% of the time. This distinction matters enormously: a forecaster can be technically "right" (predicting YES and YES happening, or predicting NO and NO happening) while being poorly calibrated (always saying 99% when it's really 60-40). Log loss catches this miscalibration in a way that simple accuracy cannot. For prediction market platforms and serious forecasters, log loss is the preferred metric because it drives the right incentive: be as accurate as possible in your probability estimates, not just in the direction of the outcome.
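One way to make the calibration idea concrete is a simple bucketing check over a forecasting track record. This is a hypothetical sketch, not any platform's scoring code:

```python
def calibration_by_bucket(forecasts):
    """Group (stated probability, outcome) pairs into 10%-wide buckets and
    compare the average stated probability with the observed hit rate."""
    buckets = {}
    for p, outcome in forecasts:
        key = min(int(p * 10), 9)  # e.g. 0.71-0.79 falls in bucket 7
        buckets.setdefault(key, []).append((p, outcome))
    report = {}
    for key, items in sorted(buckets.items()):
        avg_p = sum(p for p, _ in items) / len(items)
        hit_rate = sum(o for _, o in items) / len(items)
        report[key] = (round(avg_p, 2), round(hit_rate, 2), len(items))
    return report

# Five made-up forecasts, all stated around 70%, of which 3 resolved YES:
history = [(0.72, 1), (0.75, 1), (0.71, 0), (0.73, 1), (0.74, 0)]
calibration_by_bucket(history)  # bucket 7: stated ~0.73, hit rate 0.60
```

A well-calibrated forecaster's stated probability and hit rate should roughly match in every bucket; a gap like 0.73 stated versus 0.60 observed is exactly the overconfidence log loss punishes.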
On Polymarket, traders may not see the term "log loss" explicitly in the UI, but the concept shapes how rankings and leaderboards work when they're scoring forecaster performance. When analyzing your own track record or comparing yourself to other traders, understanding log loss helps you see whether your profits came from luck (picking winners randomly) or genuine forecasting skill (assigning good probabilities). For example, if you bought YES at 70 cents and sold at 90 cents, the profit alone doesn't prove skill; log loss would evaluate whether that 70-cent probability estimate was justified by the event's final outcome. A skilled trader should achieve good log loss scores because they're making predictions with well-reasoned probability estimates, not just getting lucky guesses right. Some advanced platforms surface log loss or related metrics to help traders benchmark their calibration against the market consensus.
A frequent misconception is that log loss and accuracy measure the same thing. They don't. You can have mediocre log loss while still getting most of your predictions right (by always hedging and predicting 51-49 on everything), and you can have good log loss while being wrong more often (if you're well-calibrated on the low-probability events that do happen). Another pitfall is treating log loss as an absolute measure; it's always relative to some baseline. A log loss of 0.5 means little in isolation: what matters is whether it beats a baseline such as the constant 50-50 forecast (which scores ln 2 ≈ 0.693 per question) or a competitor's score. Traders sometimes also mistakenly apply log loss to short-term price movements instead of final outcomes. Log loss is only meaningful when evaluated against the true outcome of the underlying event, not intermediate price fluctuations. Finally, some assume log loss is the only metric that matters; while it's excellent for probability calibration, it doesn't capture other valuable qualities like speed (being right early) or market impact.
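The 99%-versus-60-40 miscalibration described earlier can be put in numbers. A small sketch with made-up figures:

```python
import math

def avg_log_loss(p, outcomes):
    """Average log loss for a forecaster who states probability p every time."""
    return sum(-math.log(p) if o else -math.log(1 - p)
               for o in outcomes) / len(outcomes)

# 100 events, 60 of which actually happen (a "60-40" world).
outcomes = [1] * 60 + [0] * 40

overconfident = avg_log_loss(0.99, outcomes)  # always shouts 99%, ~1.85
calibrated    = avg_log_loss(0.60, outcomes)  # states the base rate, ~0.67

# Both forecasters lean YES and are directionally "right" 60% of the time,
# yet the overconfident one's average score is almost three times worse.
```

Accuracy cannot tell these two apart; log loss immediately does.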
Log loss is part of a family of scoring rules that evaluate probabilistic forecasts. Brier score, mentioned earlier, is similar but more forgiving of confident mistakes. Mean absolute error and root mean squared error measure forecast error in other ways, though mean absolute error, notably, is not a proper scoring rule: it rewards pushing forecasts to the extremes rather than reporting honestly. The concept of proper scoring rules, metrics under which the best strategy is always to forecast your true belief, underpins log loss and Brier score alike. Understanding calibration, the tendency of a 70%-confidence forecast to be right about 70% of the time, is essential to interpreting log loss. In the context of prediction markets specifically, log loss connects to market efficiency: the market's prices are themselves probability forecasts that can be scored with log loss, and your trades succeed to the degree that you can assign better probabilities than those implicit prices.
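The defining property of a proper scoring rule can be checked numerically: reporting your true belief minimizes your expected log loss. A small sketch (the function names are illustrative):

```python
import math

def expected_log_loss(reported, true_p):
    """Expected log loss of reporting `reported` when the event truly
    occurs with probability `true_p`."""
    return -(true_p * math.log(reported)
             + (1 - true_p) * math.log(1 - reported))

# If your true belief is 70%, every other report raises expected loss:
candidates = [0.5, 0.6, 0.7, 0.8, 0.9]
best = min(candidates, key=lambda r: expected_log_loss(r, 0.70))
# best == 0.7: honesty is the optimal strategy under a proper scoring rule
```

This is why log loss "drives the right incentive": unlike metrics that can be gamed by shading toward extremes or toward 50-50, the only way to minimize it in expectation is to state what you actually believe.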
Suppose a Polymarket question asks "Will Ethereum exceed $3,500 by December 31, 2026?" and you buy YES at 65 cents (implying you believe the true probability is at least 65%). If Ethereum ends below $3,500, log loss severely penalizes that confidence. Conversely, if you had bought at 52 cents instead, the penalty would be less severe even though you were still wrong, because the prediction was less confident.
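Plugging the two entry prices into the log loss formula shows the gap (a back-of-the-envelope sketch treating each price as the stated probability):

```python
import math

# Ethereum ends below $3,500, so the YES position loses (outcome = 0):
# the penalty is -ln of the probability you left on the NO side.
loss_at_65 = -math.log(1 - 0.65)  # bought YES at 65 cents -> ~1.05
loss_at_52 = -math.log(1 - 0.52)  # bought YES at 52 cents -> ~0.73
```

Both forecasts were wrong, but the 65-cent one is penalized roughly 40% more, which is exactly the confidence-sensitivity this article describes.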