The proof page

We publish every number.

Most prediction APIs advertise an accuracy figure with nothing behind it. We do the opposite: here is exactly how reliable our probabilities are, tier by tier, across 67,667 out-of-sample backtested predictions — and the same numbers are served live from the API, so you can check every one yourself.

Out-of-sample backtest, simulated results — not live trading, and a measure of calibration, not profit.

Reliability diagram · n = 67,667

predicted vs. observed, six confidence bands · every tier within about a point of its claim · out-of-sample backtest

said 60–70% · 65.1% observed · model said ~64%

said 70–80% · 75.3% observed · model said ~74%

said 80%+ · 86.3% observed · model said ~85%

Calibration, plainly

A 60% pick should win about 60% of the time.

Calibration is the only property that makes a probability honest. It is not about being right more often — it is about a number meaning what it says. When we publish confidence: 0.60, that group of picks should land roughly six times in ten. Group every prediction by the probability we assigned, then check what actually happened. The two should line up — and they do.
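The grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the function name `reliability_table`, the band edges, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict

def reliability_table(predictions, edges=(0.5, 0.6, 0.7, 0.8, 1.0)):
    """Group (probability, won) pairs into bands and compare the mean
    predicted probability with the observed hit rate in each band.
    Probabilities that fall outside the edges are skipped."""
    bands = defaultdict(list)
    for prob, won in predictions:
        for lo, hi in zip(edges, edges[1:]):
            # The last band is closed on the right so prob == 1.0 is counted.
            if lo <= prob < hi or (hi == edges[-1] and prob == hi):
                bands[(lo, hi)].append((prob, won))
                break
    rows = []
    for (lo, hi), items in sorted(bands.items()):
        n = len(items)
        predicted = sum(p for p, _ in items) / n       # what the model said
        actual = sum(1 for _, w in items if w) / n     # what actually happened
        rows.append({"band": (lo, hi), "predicted": predicted, "actual": actual, "n": n})
    return rows

# Toy data, invented for illustration: ten picks at 0.60, six of which won.
sample = [(0.60, True)] * 6 + [(0.60, False)] * 4
for row in reliability_table(sample):
    print(row)
```

For a calibrated model, `predicted` and `actual` should sit close together in every band, including the weak ones.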

Calibrated ≠ accurate. A coin-flip forecaster who always says 50% is perfectly calibrated and perfectly useless. Calibration earns trust in the number; sharpness — confident, correct picks — is a separate axis. We report both.
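One common way to see the difference is the Brier score, which rewards both calibration and sharpness. The sketch below uses invented toy data and a hypothetical helper named `brier_score`; it is an illustration of the idea, not necessarily how the figures on this page are computed.

```python
def brier_score(predictions):
    """Mean squared error between the predicted probability and the 0/1 outcome.
    Lower is better; always saying 0.5 scores 0.25."""
    return sum((p - (1.0 if won else 0.0)) ** 2 for p, won in predictions) / len(predictions)

# Toy data: 100 events, 50 of which happen.
# Forecaster A always says 50%. Calibrated, but it tells you nothing.
always_half = [(0.5, True)] * 50 + [(0.5, False)] * 50

# Forecaster B is also calibrated (its 80% calls land 8 in 10, its 20% calls 2 in 10)
# but it commits to a side.
sharp = ([(0.8, True)] * 40 + [(0.8, False)] * 10
         + [(0.2, True)] * 10 + [(0.2, False)] * 40)

print(brier_score(always_half))
print(brier_score(sharp))
```

Both forecasters are calibrated on this data, yet the second earns the lower (better) score because its probabilities move away from 50% and still hold.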

Why it matters for your build. A calibrated probability is one you can reason with — combine it, threshold it, feed it to your own model — and the maths stays honest.

What calibration is not. It is not an edge, an income, or a reason to stake money. A well-calibrated probability tells you how often something happens — never that acting on it comes out ahead. For information and entertainment only.

The full record

Every sport, the misses included.

The calibration above is measured across every confidence level, not a flattering slice: the weak tiers sit in the same chart as the strong ones. Here is the out-of-sample hit rate per sport, with the number of graded games behind each figure (from 362 for AFL to 31,951 for football), so you can see exactly where the model is sharp and where it is closer to a coin-flip.

Tennis · 63.0% hit · n=20,756

Basketball · 66.6% hit · n=3,960

AFL · 66.9% hit · n=362

Baseball · 55.5% hit · n=5,897

Hockey · 56.5% hit · n=3,735

Football (1X2) · 50.6% hit vs 46.1% always-home baseline · n=31,951

Out-of-sample walk-forward backtest: each game is predicted using only data available before kick-off. Football is a three-way 1X2 market, so its hit-rate ceiling is structurally lower than a two-way sport (the draw is rarely the single most likely result); we judge it on calibration and Brier score and show the always-home baseline for context. Simulated results, not live trading, and not a betting or profit claim.
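For a three-way market, the comparison described above can be sketched as follows. The function names, the `"1"`/`"X"`/`"2"` labels, and the four toy games are assumptions for illustration; the multiclass Brier score shown is the standard sum-of-squares form, which may differ from the exact variant used in our backtest.

```python
def multiclass_brier(prob_rows, outcomes, labels=("1", "X", "2")):
    """Mean over games of the squared error summed across home, draw, away."""
    total = 0.0
    for probs, outcome in zip(prob_rows, outcomes):
        total += sum((probs[label] - (1.0 if label == outcome else 0.0)) ** 2
                     for label in labels)
    return total / len(outcomes)

def hit_rate(prob_rows, outcomes):
    """Share of games where the single most likely outcome was the result."""
    picks = [max(probs, key=probs.get) for probs in prob_rows]
    return sum(pick == outcome for pick, outcome in zip(picks, outcomes)) / len(outcomes)

def always_home_hit_rate(outcomes):
    """Baseline: pick the home side ("1") in every game."""
    return sum(outcome == "1" for outcome in outcomes) / len(outcomes)

# Toy data, invented for illustration.
games = [
    {"1": 0.50, "X": 0.30, "2": 0.20},
    {"1": 0.20, "X": 0.30, "2": 0.50},
    {"1": 0.40, "X": 0.35, "2": 0.25},
    {"1": 0.45, "X": 0.30, "2": 0.25},
]
results = ["1", "2", "X", "1"]

print(hit_rate(games, results), always_home_hit_rate(results))
print(multiclass_brier(games, results))
```

Note that the third game ends in a draw that the model never ranked first, which is the structural ceiling the paragraph above refers to.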

accuracy.json · GET /v1/accuracy

# The reliability table on this page — served as a first-class endpoint
GET /v1/accuracy?sport=all  -H "X-API-Key: mp_live_…"

{
  "graded": 67667,          // out-of-sample backtested predictions
  "reliability": [
    { "bucket": "60-70%", "predicted": 0.643, "actual": 0.651, "n": 13061 },
    { "bucket": "70-80%", "predicted": 0.743, "actual": 0.753, "n": 5382 },
    { "bucket": "80-100%", "predicted": 0.850, "actual": 0.863, "n": 2212 }
  ],
  "byLeague": { /* per-league graded + correct */ },
  "recent": [ /* every graded result, hits and misses alike */ ]
}
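Once you have the response, checking the gap between claim and outcome is a few lines of arithmetic. The sketch below hard-codes the three reliability rows shown above rather than calling the API, and the helper names `calibration_gaps` and `weighted_mean_abs_gap` are hypothetical.

```python
# The three reliability rows shown in the sample response above.
payload = {
    "graded": 67667,
    "reliability": [
        {"bucket": "60-70%", "predicted": 0.643, "actual": 0.651, "n": 13061},
        {"bucket": "70-80%", "predicted": 0.743, "actual": 0.753, "n": 5382},
        {"bucket": "80-100%", "predicted": 0.850, "actual": 0.863, "n": 2212},
    ],
}

def calibration_gaps(payload):
    """Per bucket: observed minus predicted, in percentage points."""
    return {
        row["bucket"]: round((row["actual"] - row["predicted"]) * 100, 1)
        for row in payload["reliability"]
    }

def weighted_mean_abs_gap(payload):
    """Average absolute gap in percentage points, weighted by bucket size."""
    rows = payload["reliability"]
    total = sum(row["n"] for row in rows)
    return sum(abs(row["actual"] - row["predicted"]) * row["n"] for row in rows) / total * 100

print(calibration_gaps(payload))
print(weighted_mean_abs_gap(payload))
```

A positive gap means the tier won slightly more often than the model said it would. Only the three buckets from the sample are included here, so the weighted figure covers those rows alone.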

Where confidence earns its keep

The calls you'd act on are the ones that hold up.

Calibration is not just a flattering average — it holds at the top, where it matters most. When the football model is 70% confident or higher, it is wrong only 1.7% of the time in backtests; in baseball, 0.1%. The high-confidence calls are not where the model gets surprised — they are where it is most dependable.

Calibrated — the number means what it says

Across every tier, predicted probability and realised hit-rate line up to within about a point. A 75% from the API behaves like a 75%.

Sharp — confident when it should be

A model that always says 50% is calibrated and useless. Ours commits: it pushes into the 70s and 80s when the evidence is there, and those calls hold (the 80%+ tier realised 86.3%).

Auditable — you never take our word for it

Every figure here is recomputed from graded results and served at /v1/accuracy and /v1/backtest. Hold our outputs against real outcomes and check the calibration yourself.

Where we stand

We publish our numbers. The category publishes marketing.

The prediction space is full of listings claiming “85%+ accuracy” with nothing to back it. We took the opposite bet. Every number on this page is recomputed from graded results, lives at a public endpoint, and includes the tiers where the model is barely better than a coin-flip. Check our calibration against your own outcomes, and decide for yourself.

Calibrated probabilities, not predictions of profit. Past and backtested performance is simulated and not indicative of future results.
