Methodology
How we measure our own model.
Most prediction APIs ask you to trust an accuracy figure. We'd rather show our working. This page explains exactly how MatchPrior grades itself, with the same tools statisticians use to grade weather forecasts and election models, and why every number is checked out-of-sample across 67,667 graded predictions.
Figure: reliability diagram of predicted vs. observed frequency across six confidence bands (out-of-sample backtest).
Calibration: a number that means what it says.
Calibration is the property that makes a probability honest. Take every prediction where we said roughly 70%, and check how often that outcome actually happened. If the model is calibrated, it happens about 70% of the time — not 80%, not 60%. It is the same standard used to grade a weather forecaster: of the days they call a 70% chance of rain, it should rain on roughly seven in ten.
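The check described above can be sketched in a few lines of Python. This is a minimal illustration, not MatchPrior's implementation: it assumes binary outcomes (1 if the forecast event happened, 0 otherwise) and equal-width probability bands, and the names `Band` and `reliability_table` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Band:
    lo: float              # lower edge of the predicted-probability band
    hi: float              # upper edge of the band
    count: int             # how many predictions fell in this band
    mean_predicted: float  # average probability stated for those predictions
    observed_rate: float   # fraction of those outcomes that actually happened


def reliability_table(probs, outcomes, n_bins=10):
    """Bucket predictions into equal-width bands (an assumption) and compare
    the average stated probability with how often the outcome occurred."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    sum_p = [0.0] * n_bins
    sum_y = [0.0] * n_bins
    count = [0] * n_bins
    for p, y in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the top band
        sum_p[i] += p
        sum_y[i] += y
        count[i] += 1
    return [
        Band(i / n_bins, (i + 1) / n_bins, count[i],
             sum_p[i] / count[i], sum_y[i] / count[i])
        for i in range(n_bins)
        if count[i] > 0  # empty bands are skipped rather than divided by zero
    ]


# Ten forecasts of "70%", seven of which came true: a calibrated band.
for band in reliability_table([0.7] * 10, [1] * 7 + [0] * 3):
    print(f"{band.lo:.1f}-{band.hi:.1f}: said {band.mean_predicted:.2f}, "
          f"observed {band.observed_rate:.2f} over {band.count} calls")
```

Plotting `mean_predicted` against `observed_rate` for each band gives a reliability diagram; points on the diagonal mean the stated probabilities held up. The page reports six confidence bands but does not say how their edges are chosen, so the bin count here is only a parameter.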
Brier score and calibration error.
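Both metrics have standard definitions, and the sketch below uses their common binary forms. It assumes each prediction is a probability for a yes/no event, and it takes "calibration error" to mean the count-weighted expected calibration error (ECE) over equal-width bins. The page does not say which variant MatchPrior reports, so the function names and binning here are illustrative assumptions.

```python
def brier_score(probs, outcomes):
    """Mean squared gap between the stated probability and the 0/1 outcome.
    0.0 is perfect; always answering 50% scores 0.25."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need non-empty, equal-length inputs")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Count-weighted mean of |mean predicted - observed rate| across
    equal-width probability bins. 0.0 means every bin is calibrated."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need non-empty, equal-length inputs")
    sum_p = [0.0] * n_bins
    sum_y = [0.0] * n_bins
    count = [0] * n_bins
    for p, y in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the top bin
        sum_p[i] += p
        sum_y[i] += y
        count[i] += 1
    n = len(probs)
    return sum(
        (count[i] / n) * abs(sum_p[i] / count[i] - sum_y[i] / count[i])
        for i in range(n_bins)
        if count[i] > 0
    )


# An overconfident forecaster: said 90% four times, right only twice.
probs = [0.9, 0.9, 0.9, 0.9]
hits = [1, 1, 0, 0]
print(brier_score(probs, hits))
print(expected_calibration_error(probs, hits))
```

This example prints a Brier score of about 0.41 and a calibration error of about 0.40: the forecasts were 40 points more confident than the results justified. For three-way match markets (home, draw, away), the Brier score is often computed by summing squared error over every outcome instead; for a two-outcome event that version comes out at exactly twice the binary score above.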
Out-of-sample, walk-forward — never marking our own homework.
Predict using only the past
For every historical match, we generate the prediction using only data available before kick-off: ratings, form and a calibration fitted on earlier games. The model never sees the result it is about to be graded on.
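The rule in this step can be expressed as a walk-forward loop. The sketch below is hypothetical: `Match`, `walk_forward` and the toy base-rate model stand in for MatchPrior's real ratings, form and calibration pipeline, which the page does not describe in code. What it does show is the invariant: each prediction comes from a model fitted only on matches that kicked off earlier.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Match:
    kickoff: float  # any sortable timestamp; only the ordering matters here
    outcome: int    # 1 if the forecast event happened, else 0


def walk_forward(matches: list[Match],
                 fit: Callable[[list[Match]], Any],
                 predict: Callable[[Any, Match], float]) -> list[tuple[float, int]]:
    """Return (probability, outcome) pairs in kick-off order, where every
    probability comes from a model fitted only on strictly earlier matches."""
    ordered = sorted(matches, key=lambda m: m.kickoff)
    graded = []
    for match in ordered:
        # Matches with the same kick-off are excluded, so simultaneous
        # results cannot leak into each other's predictions.
        history = [m for m in ordered if m.kickoff < match.kickoff]
        model = fit(history)  # never sees `match` or anything later
        graded.append((predict(model, match), match.outcome))
    return graded


# Toy model (hypothetical): the historical base rate with add-one smoothing.
def fit_base_rate(history: list[Match]) -> float:
    return (sum(m.outcome for m in history) + 1) / (len(history) + 2)


def predict_base_rate(rate: float, match: Match) -> float:
    return rate


season = [Match(3, 0), Match(1, 1), Match(2, 1)]
print(walk_forward(season, fit_base_rate, predict_base_rate))
```

Here the three matches receive probabilities of 0.5, about 0.67 and 0.75 in kick-off order, each computed before its own result is known. Refitting from scratch for every match is quadratic and only suits a demo; a production system would more likely update ratings incrementally as results arrive, while keeping the same rule that nothing from kick-off onward is visible.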
Grade against what actually happened
We then compare each prediction with the real outcome and build up the Brier score, calibration error, hit rate and reliability diagram across 67,667 graded calls spanning multiple sports and seasons.
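Accumulating these scores can be done in a single pass over graded calls. This is a sketch under stated assumptions: outcomes are labelled (for example home, draw, away), a "hit" means the outcome given the highest probability is the one that happened, and the Brier score is the multi-outcome form that sums squared error over every label. `Scorecard` is a hypothetical name, and MatchPrior's exact definitions are not given on this page.

```python
class Scorecard:
    """Running totals for hit rate and multi-outcome Brier score."""

    def __init__(self) -> None:
        self.n = 0
        self.hits = 0
        self.brier_sum = 0.0

    def add(self, probs: dict[str, float], actual: str) -> None:
        if actual not in probs:
            raise ValueError(f"outcome {actual!r} was not forecast")
        # The favourite is the label with the highest stated probability;
        # on a tie, max() keeps the first label in dict order.
        favourite = max(probs, key=lambda k: probs[k])
        if favourite == actual:
            self.hits += 1
        # Squared error against a one-hot vector of what actually happened.
        self.brier_sum += sum(
            (p - (1.0 if label == actual else 0.0)) ** 2
            for label, p in probs.items()
        )
        self.n += 1

    def hit_rate(self) -> float:
        return self.hits / self.n

    def brier(self) -> float:
        return self.brier_sum / self.n


card = Scorecard()
card.add({"home": 0.50, "draw": 0.30, "away": 0.20}, "home")
card.add({"home": 0.60, "draw": 0.25, "away": 0.15}, "away")
print(card.hit_rate(), round(card.brier(), 4))
```

Hit rate alone only asks whether the favourite won; the Brier score also penalises how confident each call was, which is one reason to report both side by side.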
Serve it from the API, unedited
The same figures are returned live by /v1/accuracy and /v1/backtest, including the tiers where the model is barely better than a coin flip. Nothing is hand-picked.
Don't take our word for it.
Every number on this page is recomputed from graded results and served at a public endpoint. Pull our predictions, hold them against real outcomes, and check the calibration — or test your own forecasts with our free reliability tool.