The .930 Mirage

1,869 goalie-seasons, 18 years, and the oldest statistical insight in science applied to the position that decides playoff series.

In 1886, Sir Francis Galton measured the heights of parents and their adult children. He discovered something that surprised him: tall parents tended to have children who were tall, but not as tall as they were. Short parents tended to have children who were short, but not as short. Everything drifted toward the average.

He called it "regression toward mediocrity." We now call it regression to the mean. It is not a force. It is not a cause. It is a statistical property of any measurement that contains noise. When an observation is extreme, part of that extremity is real and part is luck. The luck part does not persist. The observation moves back toward the center.

This has direct implications for how we evaluate NHL goaltenders, and for the decisions teams make heading into the playoffs.

The Measurement Problem

We pulled every goalie season from Hockey-Reference from 2007-08 through 2024-25: 1,869 goalie-seasons across 343 goalies and 18 complete seasons. Then we asked the simplest version of Galton's question: if a goalie posts a certain save percentage this year, what should you expect next year?

The answer is sobering. The year-to-year correlation for save percentage among goalies with 30 or more games played is r = 0.33.

In psychometric terms, this is a test-retest reliability coefficient, and 0.33 is low. Squaring it gives r² ≈ 0.11: one season's save percentage explains only about 11% of the variance in the next season's. The remaining 89% is some combination of shot quality faced, defensive system effects, schedule, injuries, and pure statistical noise.
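As a rough illustration of how a year-to-year correlation of this kind can be computed, the Python sketch below pairs each goalie's season with his next one and takes the Pearson correlation of the two save percentages. The row layout, the `consecutive_pairs` helper, and the toy numbers are assumptions made for the example; they are not the Hockey-Reference data or the exact pipeline behind the figures quoted here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def consecutive_pairs(rows, min_gp=30):
    """rows: (goalie, season_start_year, games_played, sv_pct).
    Returns (year_one_sv, year_two_sv) for goalies who clear the
    games-played threshold in two consecutive seasons."""
    kept = {(g, yr): sv for g, yr, gp, sv in rows if gp >= min_gp}
    return [(sv, kept[(g, yr + 1)])
            for (g, yr), sv in kept.items() if (g, yr + 1) in kept]

# Hypothetical toy rows, for illustration only.
rows = [
    ("A", 2020, 55, 0.920), ("A", 2021, 60, 0.912), ("A", 2022, 58, 0.915),
    ("B", 2020, 40, 0.900), ("B", 2021, 45, 0.908), ("B", 2022, 12, 0.890),
    ("C", 2020, 50, 0.910), ("C", 2021, 52, 0.905),
]

pairs = consecutive_pairs(rows)
r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
r_squared = r * r  # share of year-two variance explained by year one
print(len(pairs), round(r, 2), round(r_squared, 2))
```

Goalie B's 12-game season falls below the threshold, so his 2021 season has no year-two partner. That is the survivorship effect in miniature: seasons without a qualifying follow-up drop out of the correlation.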

For comparison, skater points per game has a year-to-year reliability of roughly 0.60-0.70. Save percentage is about half as reliable as the most common skater statistic.

This correlation is robust. It holds at r = 0.30-0.33 whether you set the minimum games played threshold at 10, 30, or 50. The signal does not get stronger with more playing time, which suggests the noise is not primarily small-sample binomial variance. It is real game-to-game variation in the conditions goalies face.

One important caveat: this estimate comes from goalies who played in consecutive seasons. Goalies who were too bad to keep their jobs are missing from year two. This survivorship bias means the true population reliability is likely even lower than 0.33.

───────────

The Galton Chart

Galton's original insight is visible when you group goalie-seasons into deciles by save percentage and then look at what each group posted the following year.

The numbers tell the story clearly. Goalies in the bottom decile averaged .893 in year one and rose to .902 the next year. Goalies in the top decile averaged .929 and dropped to .917. The middle deciles barely moved at all: a goalie who posted .908 one year posted .908 the next. The extremes converge toward the center. This is not a collapse or a comeback. It is regression to the mean, operating exactly as Galton described it 140 years ago.
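The decile grouping can be sketched in a few lines of Python. The example below runs on simulated goalies rather than the real data: each has a fixed "true" save percentage plus independent noise in each of two seasons. The talent and noise spreads are hypothetical values chosen only to make the pattern visible, and `group_means` is an illustrative helper, not the code used for the chart.

```python
import random

def group_means(pairs, n_groups=10):
    """Sort (year1, year2) pairs by year1, split them into n_groups
    equal-sized bins, and return (mean_year1, mean_year2) per bin."""
    ordered = sorted(pairs, key=lambda p: p[0])
    size = len(ordered) // n_groups
    out = []
    for i in range(n_groups):
        # The last bin absorbs any remainder.
        end = (i + 1) * size if i < n_groups - 1 else len(ordered)
        chunk = ordered[i * size:end]
        out.append((sum(p[0] for p in chunk) / len(chunk),
                    sum(p[1] for p in chunk) / len(chunk)))
    return out

# Simulated, hypothetical goalies: stable talent plus season-level noise.
rng = random.Random(0)
sim_pairs = []
for _ in range(2000):
    talent = rng.gauss(0.908, 0.006)        # assumed talent spread
    year1 = talent + rng.gauss(0, 0.010)    # assumed noise spread
    year2 = talent + rng.gauss(0, 0.010)
    sim_pairs.append((year1, year2))

deciles = group_means(sim_pairs)
for y1, y2 in deciles:
    print(f"{y1:.3f} -> {y2:.3f}")
```

Nothing in the simulation makes a goalie better or worse between seasons, yet the bottom bin still rises and the top bin still falls. The selection on a noisy year-one number is enough to produce the pattern.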

The magnitude of the regression is consistent with what the reliability coefficient predicts. If r = 0.33, a goalie who is 1.5 standard deviations above the mean should be expected to be about 0.50 standard deviations above the mean the following year. The observed data matches this prediction closely. There is no evidence of excess regression beyond what statistical theory expects.
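The arithmetic behind that prediction is short enough to write out. In the sketch below, the expected next-year value is the mean plus r times the current deviation from the mean. Using .908 as the league mean is an assumption (it is the middle-decile value quoted above, not a figure the analysis states as the mean), and the function names are illustrative.

```python
def expected_next_z(z_now, r):
    """Expected next-year z-score given this year's z-score and reliability r."""
    return r * z_now

def expected_next_value(observed, mean, r):
    """Same idea in raw units: keep a fraction r of the deviation from the mean."""
    return mean + r * (observed - mean)

R = 0.33
ASSUMED_MEAN = 0.908  # assumption: middle-decile value used as the league mean

z_next = expected_next_z(1.5, R)                       # about 0.495 SD
top = expected_next_value(0.929, ASSUMED_MEAN, R)      # top-decile average
bottom = expected_next_value(0.893, ASSUMED_MEAN, R)   # bottom-decile average
print(round(z_next, 3), round(top, 3), round(bottom, 3))
```

Under these assumptions the sketch gives roughly .915 for the top decile and .903 for the bottom, against the observed .917 and .902.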

This has concrete consequences. When Linus Ullmark posted a .938 save percentage in 2022-23 and won the Vezina Trophy, regression predicted his next season would be closer to .915. He posted .915 exactly. When Filip Gustavsson posted .931 the same year, regression predicted something near .912. He posted .899, which is below the regression prediction, but the direction was the one regression predicts.

What This Means for Playoffs

The regular-season-to-playoff correlation is where things get more nuanced.

At face value, the correlation between regular season and playoff save percentage appears weak: r = 0.20 for goalies with 30 or more regular season games and 4 or more playoff games. That would suggest regular season performance explains only 4% of the variance in playoff save percentage (r² = 0.04).

But this number overstates the unpredictability. Playoff samples are small. A goalie who plays 8 playoff games faces roughly 240 shots. Binomial sampling noise alone, given a true save percentage of .910 and 240 shots, produces a standard deviation of .018 in the observed save percentage. That is nearly as large as the true talent spread among NHL goalies.
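The .018 figure follows from the standard deviation of a binomial proportion, sqrt(p(1 - p) / n). A minimal Python check, assuming about 30 shots per game as the 8-game, 240-shot example implies:

```python
from math import sqrt

def binomial_sd(true_sv, shots):
    """SD of observed save percentage from sampling noise alone."""
    return sqrt(true_sv * (1 - true_sv) / shots)

SHOTS_PER_GAME = 30  # assumption implied by 8 games ~ 240 shots

sd_240 = binomial_sd(0.910, 240)
print(round(sd_240, 3))  # 0.018

# How the noise shrinks as the playoff sample grows.
for games in (4, 6, 8, 16):
    shots = games * SHOTS_PER_GAME
    print(games, shots, round(binomial_sd(0.910, shots), 4))
```

Because the noise falls with the square root of the shot count, doubling the number of playoff games cuts it by only about 30%.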

When you increase the minimum playoff games to 6, the correlation rises to r = 0.33. At 8 or more playoff games, it reaches r = 0.41. The playoff signal is there, but it is buried under the noise of small samples.
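A simple attenuation model shows why the correlation should rise with playoff sample size. If both the regular season and playoff numbers are the same underlying talent plus independent binomial noise, the expected correlation is the talent variance divided by the geometric mean of the two observed variances. The talent spread and regular season shot count below are hypothetical, and the model ignores everything except sampling noise, so its output is not expected to reproduce the correlations quoted above.

```python
from math import sqrt

def expected_correlation(talent_sd, p, shots_a, shots_b):
    """Expected correlation between two noisy measurements of one talent,
    where each measurement adds independent binomial sampling noise."""
    talent_var = talent_sd ** 2
    noise_a = p * (1 - p) / shots_a
    noise_b = p * (1 - p) / shots_b
    return talent_var / sqrt((talent_var + noise_a) * (talent_var + noise_b))

TALENT_SD = 0.006   # hypothetical spread in true save percentage
REG_SHOTS = 1500    # hypothetical regular season workload
SHOTS_PER_GAME = 30 # assumption implied by 8 games ~ 240 shots

corr_by_games = {
    games: expected_correlation(TALENT_SD, 0.910, REG_SHOTS, games * SHOTS_PER_GAME)
    for games in (4, 6, 8)
}
for games, corr in corr_by_games.items():
    print(games, round(corr, 2))
```

The direction is the point here: holding talent fixed, more playoff shots mean less noise in the playoff number and a higher expected correlation with the regular season.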

The honest conclusion: regular season save percentage is a meaningful predictor of playoff goaltending, but the signal is modest even under ideal conditions, and typical playoff series are too short for the signal to dominate the noise.

True Talent Estimation

If one season's save percentage explains only 11% of the variance in the next, the best prediction for next year is not a goalie's observed number. It is a shrunken estimate that pulls the observed number back toward the league average.

Using an empirical Bayes shrinkage model calibrated to the observed variance structure, the optimal adjustment regresses each goalie's save percentage approximately 42% of the way toward the league mean. This shrunken estimate reduces prediction error by 25% compared to using raw save percentage.
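One common way to build an estimator of this kind is a method-of-moments empirical Bayes: estimate the talent variance as the observed variance minus the average sampling variance, then pull each goalie toward the league mean by the share of his variance that is noise. The Python sketch below shows that general technique on hypothetical toy data. It is not the calibrated model described here, and it produces a per-goalie weight rather than a single 42% figure.

```python
from statistics import mean, pvariance

def eb_shrink(goalies):
    """goalies: list of (name, saves, shots).
    Returns {name: (raw_sv, shrunk_sv, weight_toward_mean)}."""
    league = sum(s for _, s, _ in goalies) / sum(n for _, _, n in goalies)
    raws = [s / n for _, s, n in goalies]
    # Binomial sampling variance for each goalie's shot count.
    noise_vars = [league * (1 - league) / n for _, _, n in goalies]
    # Method of moments: talent variance is what remains after removing noise.
    talent_var = max(pvariance(raws) - mean(noise_vars), 0.0)
    out = {}
    for (name, _, _), raw, nv in zip(goalies, raws, noise_vars):
        weight = nv / (nv + talent_var)  # fraction of the gap to the mean removed
        out[name] = (raw, raw + weight * (league - raw), weight)
    return out

# Hypothetical toy goalies (name, saves, shots), for illustration only.
toy_goalies = [
    ("A", 1395, 1500),  # .930 on a full workload
    ("B", 1365, 1500),  # .910
    ("C", 1335, 1500),  # .890
    ("D", 465, 500),    # .930 on a third of the shots
]

results = eb_shrink(toy_goalies)
for name, (raw, shrunk, weight) in results.items():
    print(name, f"{raw:.3f}", f"{shrunk:.3f}", f"{weight:.2f}")
```

Goalies A and D post the same raw number, but D's comes from a third of the shots, so the sketch regresses him further toward the mean. A single blanket percentage applied to every goalie hides that distinction.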

For the 2024-25 season entering the playoffs: Anthony Stolarz posted a .926 save percentage. His true talent estimate is closer to .915. Connor Hellebuyck posted .925, with a true talent estimate of .914. These are still excellent, but the gap between what they posted and what we should expect going forward is real and measurable.

Darcy Kuemper illustrates the reverse. After posting .890 in 2023-24, a number that made him look finished, regression predicted a bounce-back. He posted .921 this season. That is not a resurrection. It is the same statistical phenomenon working in the other direction.

What We Can and Cannot Say

The data support three claims with confidence:

First, goaltender save percentage has a test-retest reliability of approximately 0.33, meaning one season's number explains roughly 11% of the variance in the next season's. This is low relative to other hockey statistics, and it explains why goalie performance appears volatile.

Second, extreme seasons regress toward the league mean at a rate consistent with the reliability coefficient. There is no evidence of "clutch" performance or "pressure collapse" beyond what regression to the mean already predicts.

Third, a Bayesian shrinkage estimator that regresses observed save percentage 42% toward the league mean produces substantially better year-ahead predictions than raw save percentage.

The data do not support claims about whether specific goalies are "clutch" or "chokers" in the playoffs. Playoff samples are too small and too contaminated by sampling noise to make individual-level inferences. A goalie who posts a save percentage of .935 in one playoff run and .905 in the next has not necessarily changed. The difference is well within the range that random variation can produce.

The data also cannot disentangle goaltender skill from defensive system effects. A goalie behind a strong defensive team faces fewer dangerous shots, inflating save percentage. The true talent estimates presented here are save-percentage-based and inherit this limitation. Expected-goals-based metrics would partially address this but are not available across the full 18-year sample.

The Playoff Implications

Every spring, teams enter the playoffs with goaltending expectations based on the regular season. A .925 goalie is treated as a massive advantage. A .905 goalie is treated as a liability. The regression framework suggests both assessments overweight the regular season number and underweight the league mean.

The goalie who posted .925 is more likely a .914 true talent who had a good year than a genuine .925 talent. The goalie who posted .905 is more likely a .908 true talent who had a bad year than a genuine .905 talent. The gap between them is probably 6 points, not 20.

This does not mean goaltending is random. It means goaltending differences are smaller than the regular season numbers suggest, and playoff performance will be determined more by game-to-game variance than by the talent gap between starters.

Sir Francis Galton figured this out with height measurements in Victorian England. It applies to every repeated measurement with noise. It certainly applies to the most volatile position in professional hockey.

Data: 1,869 regular season goalie-seasons and 479 playoff goalie-seasons from Hockey-Reference.com, 2007-08 through 2024-25. Minimum 30 GP for regular season reliability calculations. Empirical Bayes shrinkage calibrated to observed variance decomposition. Year-to-year pairs limited to consecutive or near-consecutive seasons. Survivorship bias acknowledged: goalies who lost their jobs are absent from follow-up data. Interactive figures built with Plotly.js.