What happens when you evaluate the NHL Draft the way you'd evaluate an occupational screening standard.
- Draft AUC for identifying established NHLers (200+ GP): 0.76 - "fair to good," comparable to a general cognitive ability test
- AUC for identifying stars: 0.86 - good at the top, leaky everywhere else
- In a sensitivity analysis, including undrafted NHLers as false negatives drops AUC from 0.76 to 0.67 (about 20% of NHL rosters were never drafted)
- The biggest cliff in draft equity is from picks 1-5 to picks 6-15, not from Round 1 to Round 2
- Analytics era improvement in draft accuracy: not statistically significant (p = 0.095)
- The draft is a triage tool, not an oracle
- 01 The Framework
- 02 What the Draft Is Trying to Do
- 03 The AUC: How Accurate Is the Screening Decision?
- 04 How Does the Draft Compare?
- 05 Discrimination Is Not Calibration
- 06 The Draft Cliff
- 07 The Undrafted Blind Spot
- 08 The Costliest Misses
- 09 The Cost of a Miss Depends on Where You Miss
- 10 Has the Draft Gotten Better?
- 11 What the Criterion Measures (and What It Doesn't)
- 12 A Triage Tool, Not an Oracle
The Framework
In occupational health science, there is a well-established toolkit for evaluating whether a screening instrument can discriminate between future successes and non-successes. When an organization needs to know whether a selection decision identifies who will succeed, the tools are the same regardless of domain: ROC curves, sensitivity and specificity, predictive values, and cut-point analysis that turns continuous scores into actionable decisions.
The core question is always the same: you have a selection decision, and you have a performance criterion. How well does that decision discriminate between those who will succeed and those who will not?
The NHL Entry Draft is not literally a screening test in the occupational sense. But it functions like a screening instrument: a high-stakes selection decision intended to predict future performance.
Every June, 32 teams spend millions of dollars on scouting, analytics, Combine testing, interviews, and video analysis to rank roughly 200 young players. The output is a composite decision: your draft position. Pick 1 means the system concluded you were the best prospect available. Pick 210 means it barely concluded you were worth an investment.
The individual “tests” inside that system are not fully public. But the draft position itself is the final screening decision produced by the whole assessment battery, not a single input in isolation.
The criterion is simple: did you make it?
To our knowledge, the draft is rarely evaluated as a screening instrument—using ROC/AUC, sensitivity/specificity, and cut-point analysis—and compared directly to published selection tests. Hockey analytics has produced pick value curves, success rates by round, and ML models that predict outcomes. This analysis asks a different question: how good is the draft, as a discriminative selection decision, using the same toolkit applied to high-stakes screening in other domains?
We pulled every NHL draft pick from 2000 to 2016 (4,006 picks across 17 drafts, each with at least 10 years to develop), linked each to NHL career outcomes, and evaluated the draft position as the screening score.
The answer is more interesting than a single number.
What the Draft Is Trying to Do
Before evaluating a screening-like selection instrument, you need to define the criterion. In occupational screening, this is the job performance measure: did the candidate succeed? In hockey, there is no single “right” outcome, so we tested several.
- Criterion A: Played at least 1 NHL game (any NHL career)
- Criterion B: Played 100+ career games (got a real shot)
- Criterion C: Played 200+ career games (established NHLer)
- Criterion D: Played 400+ career games (veteran)
- Criterion E: Career 0.60+ points per game with 200+ GP (top-6 caliber forward/elite defenseman)
- Criterion F: Career 0.80+ points per game with 200+ GP (star)
The base rates matter. Of the 4,006 drafted players in the sample, 47% eventually played at least one NHL game. 23% reached 200 games. Only 1.2% became stars by the 0.80 ppg threshold. The draft is screening for increasingly rare outcomes.
The AUC: How Accurate Is the Screening Decision?
The ROC curve plots the trade-off between sensitivity (correctly identifying future NHLers by drafting them earlier) and specificity (correctly identifying future non-NHLers by drafting them later) across every possible cut-point. The area under the curve (AUC) is the standard single-number summary of discriminative ability.
Important: AUC summarizes rank-order discrimination. It is not the success probability at a given draft slot.
An AUC of 0.50 means the system is no better than random. An AUC of 1.00 means it perfectly separates future successes from non-successes.
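One way to make those endpoints concrete: AUC equals the probability that a randomly chosen success is ranked ahead of a randomly chosen non-success. The sketch below computes that by brute force on a made-up ten-player draft. The picks, the outcomes, and the `pairwise_auc` helper are illustrative assumptions, not the article's data or code.

```python
from itertools import product


def pairwise_auc(scores, outcomes):
    """Share of (success, non-success) pairs where the success has the higher score.

    Ties count as half a win, which matches the usual AUC convention.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical toy draft: pick number and whether the player reached 200+ GP.
picks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
reached_200_gp = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

# A lower pick number means a stronger endorsement, so negate it to get a score.
scores = [-p for p in picks]

toy_auc = pairwise_auc(scores, reached_200_gp)          # 21 of 24 pairs ordered correctly
perfect_auc = pairwise_auc([3, 2, 1, 0], [1, 1, 0, 0])  # every success above every miss
print(f"toy AUC = {toy_auc:.3f}, perfect ordering = {perfect_auc:.1f}")
```

Negating the pick number is only a convenience so that "higher score" means "drafted earlier"; any transformation that preserves the ordering gives the same AUC.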
Here is what the NHL Draft scores in this sample:
| Criterion | AUC | Base Rate |
|---|---|---|
| Any NHL game | 0.744 | 47.5% |
| 100+ GP | 0.765 | 28.5% |
| 200+ GP (established) | 0.760 | 23.2% |
| 400+ GP (veteran) | 0.777 | 16.9% |
| Top-6 caliber | 0.858 | 4.0% |
| Star (0.80+ ppg) | 0.864 | 1.2% |
| 200+ career points | 0.811 | 11.3% |
| 200+ career goals | 0.874 | 3.8% |
Figure 1: ROC curves for the NHL Draft across multiple career outcome criteria. Higher AUC indicates better discrimination. The draft is strongest for rarer, higher-end outcomes (e.g., 200+ career goals at 0.874; star criterion at 0.864) and weaker at separating marginal NHLers from non-NHLers (any NHL game at 0.744).
Two findings jump out immediately.
First, the draft’s AUC for identifying established NHLers (200+ GP) is 0.76. In the screening literature, that is “fair to good.” It means the draft ranks a random future established NHLer above a random future non-established player about 76% of the time.
Second, and more interesting: the AUC increases as the criterion gets harder. The draft is better at identifying the very top of the distribution than it is at separating marginal NHLers from non-NHLers. That makes intuitive sense. The difference between Connor McDavid and a non-NHLer is easier to see than the difference between a future fourth-line grinder and a career AHLer.
One necessary caution: the star criterion has a base rate near 1%. When prevalence is this low, AUC can look excellent even when positive predictive value remains modest. Discrimination (rank ordering) is not the same as prediction (probability estimation). We return to that distinction next.
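To see how much prevalence matters, the sketch below applies Bayes' rule to the same hypothetical screen at two of the base rates from the AUC table. The 80% sensitivity and 80% specificity are assumed values chosen for illustration; they are not operating points estimated in this analysis.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: share of flagged players who truly meet the criterion."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed operating point (not from the article): 80% sensitivity, 80% specificity.
SENS, SPEC = 0.80, 0.80

# Base rates taken from the AUC table: established NHLer and star.
ppv_established = positive_predictive_value(SENS, SPEC, 0.232)
ppv_star = positive_predictive_value(SENS, SPEC, 0.012)

print(f"PPV at a 23.2% base rate: {ppv_established:.1%}")
print(f"PPV at a 1.2% base rate:  {ppv_star:.1%}")
```

With identical discrimination, the share of flagged players who actually meet the criterion falls from roughly 55% to under 5% as the outcome gets rarer.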
How Does the Draft Compare?
Here is where the cross-disciplinary framing earns its keep. The draft’s AUC can be placed alongside published values for selection instruments in other high-stakes contexts.
This comparison is heuristic: contexts, criteria, and base rates differ. The point is to calibrate expectations about discrimination, not to claim equivalence.
| Screening Instrument | AUC | Source |
|---|---|---|
| NHL Draft (200+ career goals) | 0.874 | This analysis |
| NHL Draft (star caliber) | 0.864 | This analysis |
| NHL Draft (200+ career points) | 0.811 | This analysis |
| NHL Draft (400+ GP) | 0.777 | This analysis |
| NHL Draft (100+ GP) | 0.765 | This analysis |
| NHL Draft (200+ GP) | 0.760 | This analysis |
| NHL Draft (any NHL game) | 0.744 | This analysis |
| General cognitive ability test | 0.740 | Schmidt & Hunter, 1998 |
| Structured interview (job performance) | 0.710 | Schmidt & Hunter, 1998 |
| Typical occupational screening standard | 0.700 | Cascio & Aguinis, 2019 |
| College GPA predicting job performance | 0.650 | Roth et al., 2005 |
| Reference checks | 0.570 | Schmidt & Hunter, 1998 |
| Unstructured interview | 0.560 | Schmidt & Hunter, 1998 |
Figure 3: The NHL Draft’s AUC compared to published selection instruments. The draft’s ability to discriminate established NHLers (AUC ≈ 0.76) sits in the same broad range as several widely used screening tools.
Put differently: on a discrimination metric, the draft is “fair to good” in a domain that is inherently noisy (projecting teenagers into a professional sport). That is genuinely impressive. It also contextualizes the confidence we should place in any individual pick.
Discrimination Is Not Calibration
AUC answers one question: does the draft tend to rank future NHLers ahead of future non-NHLers? That is discrimination. It tells you whether the ordering is directionally right. But it says nothing about whether teams are calibrated about the actual probability of success at each draft slot.
A system can have a decent AUC and still be overconfident at the top of the board.
Two selection systems can have identical AUCs but very different calibration. One might assign realistic confidence levels to each score band. The other might behave as though high scores are near-certainties when the historical success rates are materially lower.
The draft appears to have a calibration problem. Organizations treat top-10 picks as franchise anchors: multiple assets traded to move up, immediate expectations, and planning built around assumed stardom. But even at the very top, uncertainty is substantial. The correct response to that uncertainty is not necessarily to draft differently—it is to invest differently around picks, with more hedging, more contingency planning, and less binary conviction that pick 8 is fundamentally different from pick 14.
The draft may be reasonably good at ranking prospects. It is likely worse at pricing uncertainty.
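The gap between discrimination and calibration can be shown with a small simulation. In the sketch below, two hypothetical forecasters rank simulated prospects identically, so their AUCs match, but one inflates every probability. The simulated prospects, the square-root inflation, and the use of the Brier score as the calibration-sensitive metric are all assumptions made for illustration.

```python
import random


def auc(scores, outcomes):
    """Pairwise AUC: share of (success, failure) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(forecasts, outcomes):
    """Mean squared gap between stated probability and what happened (lower is better)."""
    return sum((f - y) ** 2 for f, y in zip(forecasts, outcomes)) / len(outcomes)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical prospects: each has a true success probability, then a realized outcome.
true_prob = [rng.random() for _ in range(1000)]
outcomes = [int(rng.random() < p) for p in true_prob]

calibrated = true_prob                         # states the real odds
overconfident = [p ** 0.5 for p in true_prob]  # same ordering, inflated odds

auc_cal, auc_over = auc(calibrated, outcomes), auc(overconfident, outcomes)
brier_cal, brier_over = brier(calibrated, outcomes), brier(overconfident, outcomes)

print(f"AUC:   calibrated {auc_cal:.3f} vs overconfident {auc_over:.3f}")
print(f"Brier: calibrated {brier_cal:.3f} vs overconfident {brier_over:.3f}")
```

Because the square root preserves the ordering of the probabilities, AUC cannot tell the two forecasters apart; only a metric that compares stated probabilities with outcomes exposes the overconfidence.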
The Draft Cliff
Most occupational screening standards use a single cut-score: you either pass or you fail. Above the threshold, you're in. Below it, you're out. It's clean, it's defensible, and it's how most selection systems work.
But a single cut-score throws away information. A candidate who barely passes and a candidate who crushes the test both get the same outcome: "pass." And a candidate one point below the line and a candidate 50 points below get the same outcome: "fail." The binary decision ignores the gradient.
An alternative is to replace the single threshold with probability zones: ranges of the continuous score that correspond to meaningfully different success rates. Instead of pass/fail, you get a probability profile. This approach preserves the information in the continuous score and gives decision-makers a richer picture of what to expect.
Applied to the draft:
| Zone | Picks | NHL (1+ GP) | Established (200+ GP) | Top-6 Caliber | Star | Avg Career GP | Avg Career Pts |
|---|---|---|---|---|---|---|---|
| Green | 1-5 | 100% | 94% | 51% | 25% | 850 | 571 |
| Green-Yellow | 6-15 | 96% | 75% | 19% | 5% | 554 | 274 |
| Yellow | 16-30 | 91% | 59% | 13% | 2% | 417 | 186 |
| Yellow-Red | 31-60 | 68% | 32% | 3% | 1% | 211 | 79 |
| Red | 61-90 | 54% | 21% | 3% | 1% | 137 | 49 |
| Deep Red | 91-150 | 39% | 15% | 1% | 0% | 89 | 29 |
| Late round | 151+ | 28% | 10% | 1% | 0% | 61 | 19 |
Figure 4: Career outcome distributions by draft zone. Picks 1-15 produce franchise-length careers (700+ GP) more than half the time. By the middle rounds, the majority never play in the NHL.
The biggest drop in draft equity is not from Round 1 to Round 2. It's from elite to merely first-round.
A top-5 pick has a 51% chance of becoming a top-6 caliber player. A pick from 6-15 has a 19% chance. That's a 2.7x difference across just 10 slots. The cliff is steeper than most people realize, and it's steeper than the trade value charts that teams use to price draft capital. Average career points fall from 571 (picks 1-5) to 274 (picks 6-15) to 186 (picks 16-30). The gradient is not linear. It's a cliff, and it happens inside the first round.
By the second round (picks 31-60), two-thirds of players will never become established NHLers. By the middle rounds, you're buying a lottery ticket with a 10-15% payout rate.
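A minimal sketch of the zone approach, assuming records arrive as `(pick, career_gp)` pairs: the zone boundaries are copied from the table above, while the function names and the toy records are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper pick of each zone, copied from the zone table.
ZONE_UPPER_BOUNDS = [5, 15, 30, 60, 90, 150]
ZONE_NAMES = ["Green", "Green-Yellow", "Yellow", "Yellow-Red",
              "Red", "Deep Red", "Late round"]


def zone_for_pick(pick):
    """Map an overall pick number to its probability zone."""
    return ZONE_NAMES[bisect_left(ZONE_UPPER_BOUNDS, pick)]


def established_rate_by_zone(records, gp_threshold=200):
    """records: iterable of (pick, career_gp). Returns {zone: share reaching the threshold}."""
    totals, hits = defaultdict(int), defaultdict(int)
    for pick, career_gp in records:
        zone = zone_for_pick(pick)
        totals[zone] += 1
        hits[zone] += career_gp >= gp_threshold
    return {z: hits[z] / totals[z] for z in ZONE_NAMES if totals[z]}


# Hypothetical records, for illustration only.
toy_records = [(1, 900), (4, 150), (12, 450), (14, 0), (45, 210), (200, 0)]
rates = established_rate_by_zone(toy_records)
print(rates)

# The cliff quoted above, as arithmetic on the table's top-6 caliber rates.
cliff_ratio = 0.51 / 0.19
print(f"picks 1-5 vs picks 6-15: {cliff_ratio:.1f}x")
```

Keeping the boundaries in one list makes it easy to test alternative zone definitions without touching the aggregation logic.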
Sensitivity and Specificity: The Numbers GMs Should Know
At pick 30 (end of the first round), the draft's sensitivity for the 200 GP criterion is 40%. That means 60% of all future established NHLers were drafted outside the first round. The first round captures less than half of the eventual talent.
The positive predictive value at pick 30 is 71%. If your player was selected in the first round, there's a 71% chance they'll become an established NHLer. That's solid.
But the false negative count is staggering: 608 players drafted after pick 30 who went on to reach 200+ career GP. Six hundred and eight. The draft's biggest weakness isn't who it selects early. It's who it overlooks.
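These quantities all come from the same two-by-two table at a chosen cut-point. The sketch below computes them for a made-up ten-player draft with a cut at pick 4; the data and the `screen_at_cut` helper are illustrative assumptions, and the real analysis would use pick 30 on the full sample.

```python
def screen_at_cut(picks, outcomes, cut):
    """Treat 'drafted at or before `cut`' as a positive screen and score it."""
    tp = sum(1 for p, y in zip(picks, outcomes) if p <= cut and y == 1)
    fp = sum(1 for p, y in zip(picks, outcomes) if p <= cut and y == 0)
    fn = sum(1 for p, y in zip(picks, outcomes) if p > cut and y == 1)
    tn = sum(1 for p, y in zip(picks, outcomes) if p > cut and y == 0)
    return {
        "sensitivity": tp / (tp + fn),  # share of eventual successes taken by the cut
        "specificity": tn / (tn + fp),  # share of non-successes left after the cut
        "ppv": tp / (tp + fp),          # share of early picks who succeeded
        "false_negatives": fn,          # successes drafted after the cut
    }


# Hypothetical toy draft: pick number and whether the player reached 200+ GP.
picks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
reached_200_gp = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

result = screen_at_cut(picks, reached_200_gp, cut=4)
print(result)
```

Note that sensitivity and PPV answer different questions: the first is measured against all eventual successes, the second against all early picks, which is why a cut can have a solid PPV and still miss most of the talent.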
The Undrafted Blind Spot
Everything above evaluates the draft only on the players it actually assessed. But a screening instrument’s accuracy should also consider qualified candidates it failed to screen entirely.
Figure 5: Including undrafted players who reached each career threshold drops the AUC substantially. The draft's biggest weakness is who it fails to evaluate entirely.
Approximately 20% of NHL rosters consist of undrafted players. Martin St. Louis. Ed Belfour. Curtis Joseph. Dino Ciccarelli. These are not fringe cases. They are among the best players in NHL history, and the draft assigned them a score of zero.
As a sensitivity analysis, we add an estimated undrafted cohort that reached each career threshold and treat them as false negatives with a draft score of zero. Under this assumption, the AUC for the 200+ GP criterion drops from 0.760 to 0.673. Scoring every undrafted success at zero is intentionally conservative for the draft; it is meant to quantify the blind spot, not to estimate a precise ‘true AUC.’
The exact magnitude depends on the undrafted-count estimate, but the direction is unambiguous: excluding undrafted players inflates apparent draft accuracy. The draft’s biggest structural weakness is not only who it ranks incorrectly; it is who it fails to evaluate at all.
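The mechanics of that sensitivity analysis can be sketched as follows, assuming "a draft score of zero" is implemented by placing each undrafted success one slot below the last drafted player. The toy draft, the helper names, and that implementation choice are assumptions; the article's own code may handle the cohort differently.

```python
def auc(scores, outcomes):
    """Pairwise AUC: share of (success, failure) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_with_undrafted(picks, outcomes, n_undrafted_successes):
    """Recompute AUC after appending undrafted successes ranked below every drafted player."""
    worst_slot = max(picks) + 1  # assumed stand-in for "a draft score of zero"
    all_picks = list(picks) + [worst_slot] * n_undrafted_successes
    all_outcomes = list(outcomes) + [1] * n_undrafted_successes
    return auc([-p for p in all_picks], all_outcomes)


# Hypothetical toy draft: pick number and whether the player reached 200+ GP.
picks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
reached_200_gp = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

drafted_only = auc_with_undrafted(picks, reached_200_gp, 0)
with_undrafted = auc_with_undrafted(picks, reached_200_gp, 2)
print(f"drafted only: {drafted_only:.3f}, with 2 undrafted successes: {with_undrafted:.3f}")
```

Every added undrafted success loses every pairwise comparison against a drafted non-success, so the AUC can only fall as the assumed cohort grows.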
What Kind of Player Do You Get?
Figure 2: Probability of career outcome by draft pick range. The steepest cliff is from picks 1-5 to picks 6-15, not from Round 1 to Round 2.
Games played alone doesn't tell you if a pick became a useful player or just a warm body. The career outcome distributions add the quality dimension:
Picks 1-15: 59% reach 700+ career GP. The median outcome is a long, productive NHL career. Production intensity: 22.4 points per available year. Average points-per-game among players who stuck: 0.55 ppg. These are legitimate top-of-the-lineup contributors.
Picks 16-30: 31% reach 700+ GP. Still a strong bet, but you're now more likely to get a 400-600 GP middle-six contributor than a franchise player. Production intensity drops to 10.5 points per year.
Picks 31-60: The break point. 31% never play a single NHL game. 13% reach 700+ GP. The second round is where scouting accuracy separates from noise. Production intensity: 4.7 points per year.
Picks 91-150: 59% never play in the NHL. Among those who do, the average points-per-game is 0.33, a bottom-six profile. But this is also where the biggest draft steals live.
Picks 151+: 70% never play. But Joe Pavelski (pick 205, 1,068 career points), Patric Hornqvist (pick 230, 543 points), and Dustin Byfuglien (pick 245, 525 points) all came from this wasteland.
The Costliest Misses
The false negatives with the highest career production, all drafted after pick 60:
| Pick | Year | Player | Career GP | Career Pts | Pts/GP |
|---|---|---|---|---|---|
| 104 | 2011 | John Gaudreau | 763 | 743 | 0.97 |
| 77 | 2013 | Jake Guentzel | 662 | 638 | 0.96 |
| 79 | 2014 | Brayden Point | 701 | 675 | 0.96 |
| 178 | 2010 | Mark Stone | 749 | 694 | 0.93 |
| 71 | 2006 | Brad Marchand | 1,152 | 1,034 | 0.90 |
| 66 | 2016 | Adam Fox | 467 | 399 | 0.85 |
| 205 | 2003 | Joe Pavelski | 1,332 | 1,068 | 0.80 |
| 129 | 2007 | Jamie Benn | 1,233 | 982 | 0.80 |
| 62 | 2005 | Kris Letang | 1,218 | 799 | 0.66 |
Thirty-three players in the dataset were drafted after pick 60 and went on to play 400+ games at 0.60+ points per game. Thirty-three legitimate top-six NHL players that the first two rounds of the draft missed.
On the other side: 15 top-15 picks in this sample never played a single NHL game. Not one. Kyle Beach (11th overall, 2008) and A.J. Thelen (12th, 2004) are two of them. The most expensive selections in the draft sometimes produce literally zero return.
The Cost of a Miss Depends on Where You Miss
Not all screening errors are equal. In most occupational screening contexts, a false positive and a false negative carry different organizational costs. The same asymmetry applies here, and it's extreme.
Missing Brayden Point at pick 79 is not the same as over-drafting a depth forward at pick 24. A top-10 miss consumes one of the most valuable assets in professional sports: a high draft pick in a league with a hard salary cap. The opportunity cost includes not just the missed player but the cap-cycle timing, the franchise surplus value that a star on an entry-level contract generates, and the years of organizational direction built around a player who doesn't become what you expected. When the Chicago Blackhawks selected Kyle Beach 11th overall in 2008 and he never played an NHL game, the cost wasn't just one roster spot. It was a franchise-shaping asset that returned nothing.
A fifth-round miss, by contrast, is almost noise. The expected value of pick 150 is low enough that a miss barely registers organizationally.
The draft is not just a prediction problem. It is a resource-allocation problem under asymmetric error costs. The screening framework can tell you where the system fails. Decision analysis tells you how much each failure costs. The first-round false positive is organizationally devastating in a way that no number of late-round misses can match, and the late-round false negative (the Pavelskis and Marchands who slip through) represents surplus value that could reshape a franchise if captured systematically.
This has direct implications for how teams should structure their scouting investment. The rational allocation would concentrate the most rigorous evaluation resources not on the consensus top-5 picks (where discrimination is already strong), but on the picks 15-60 range where the screening system is weakest and the organizational cost of errors is still substantial.
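A simple way to see how asymmetric error costs change a decision is to price each error type and find the cut-point with the lowest total. The sketch below does that on a made-up ten-player draft. The flat per-error costs are hypothetical and deliberately crude: the argument above is that real costs also vary with where in the draft the error occurs.

```python
def total_cost(picks, outcomes, cut, cost_fp, cost_fn):
    """Cost of treating picks at or before `cut` as 'invest' and the rest as 'pass'."""
    fp = sum(1 for p, y in zip(picks, outcomes) if p <= cut and y == 0)
    fn = sum(1 for p, y in zip(picks, outcomes) if p > cut and y == 1)
    return fp * cost_fp + fn * cost_fn


def cheapest_cut(picks, outcomes, cost_fp, cost_fn):
    """Cut-point (0 = invest in nobody) with the lowest total cost; earliest cut wins ties."""
    cuts = range(0, max(picks) + 1)
    return min(cuts, key=lambda c: total_cost(picks, outcomes, c, cost_fp, cost_fn))


# Hypothetical toy draft: pick number and whether the player reached 200+ GP.
picks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
reached_200_gp = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

# Assumed cost ratios, for illustration only.
cut_when_busts_hurt = cheapest_cut(picks, reached_200_gp, cost_fp=3, cost_fn=1)
cut_when_misses_hurt = cheapest_cut(picks, reached_200_gp, cost_fp=1, cost_fn=3)
print(cut_when_busts_hurt, cut_when_misses_hurt)
```

When false positives are three times as costly, the cheapest policy invests only in the first two picks; reverse the ratio and it stretches to pick 9, using the same ranking and the same outcomes.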
Has the Draft Gotten Better?
The analytics revolution supposedly transformed NHL scouting. More data, better models, smarter front offices. Did it actually improve the draft's accuracy as a screening system?
We compared the 2000-2007 era (AUC = 0.741) to the 2008-2015 era (AUC = 0.768). The difference is +0.026, which goes in the right direction. But a 10,000-iteration bootstrap test returned a 95% confidence interval of [-0.013, +0.065] and a p-value of 0.095.
The improvement is not statistically significant.
The AUC trended upward across every criterion (+0.024 to +0.026 depending on the threshold), and the direction is consistent, which is suggestive. But with the data we have, we cannot conclude that the analytics era measurably improved scouting accuracy. The draft has always been about this good, and it still is.
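The era comparison can be sketched as a percentile bootstrap on the difference between two AUCs. The code below uses simulated eras, not the draft data, and keeps everything in the standard library. The `simulate_era` generator, the default iteration count, and the two-sided p-value convention (twice the smaller tail share around zero) are assumptions; the article does not state exactly how its p-value was computed.

```python
import random


def auc(scores, outcomes):
    """Rank-based (Mann-Whitney) AUC with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, outcomes) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def era_auc(picks, outcomes):
    """Earlier pick = higher score, so negate the pick number."""
    return auc([-p for p in picks], outcomes)


def simulate_era(n_picks, slope, rng):
    """Hypothetical era: success odds fall linearly with pick number."""
    picks = [rng.randint(1, 210) for _ in range(n_picks)]
    outcomes = [int(rng.random() < max(0.02, 0.9 - slope * p)) for p in picks]
    return picks, outcomes


def bootstrap_auc_diff(era_a, era_b, n_boot=1000, seed=0):
    """Return (observed AUC_b - AUC_a, 95% percentile CI, two-sided p-value)."""
    rng = random.Random(seed)

    def resampled_auc(era):
        picks, outcomes = era
        n = len(picks)
        while True:
            idx = [rng.randrange(n) for _ in range(n)]
            y = [outcomes[i] for i in idx]
            if 0 < sum(y) < n:  # AUC needs at least one success and one non-success
                return era_auc([picks[i] for i in idx], y)

    observed = era_auc(*era_b) - era_auc(*era_a)
    diffs = sorted(resampled_auc(era_b) - resampled_auc(era_a) for _ in range(n_boot))
    ci = (diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1])
    share_not_positive = sum(d <= 0 for d in diffs) / n_boot
    p_value = min(1.0, 2 * min(share_not_positive, 1 - share_not_positive))
    return observed, ci, p_value


rng = random.Random(42)
early_era = simulate_era(300, slope=0.0045, rng=rng)
late_era = simulate_era(300, slope=0.0050, rng=rng)

observed, (lo, hi), p_value = bootstrap_auc_diff(early_era, late_era, n_boot=500)
print(f"diff = {observed:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}], p = {p_value:.3f}")
```

Each era is resampled independently because the two sets of drafts share no players. A confidence interval that spans zero, as in the result reported above, means the data cannot rule out no improvement at all.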
Positions: Forwards, Defensemen, and Goalies
The draft evaluates forwards most accurately (AUC = 0.773 for 200 GP), defensemen slightly less so (0.752), and goalies worst of all (0.715).
Every hockey person already believes goalies are the hardest position to draft. Now we have the number. The AUC gap between forwards and goalies (0.058) is substantial. Drafting a goalie in the first round is a meaningfully riskier proposition than drafting a forward, and the screening system's accuracy reflects that.
What the Criterion Measures (and What It Doesn't)
The biggest caveat, and it needs to be stated clearly: the draft partially creates the outcomes it appears to predict.
A first-round pick gets years of developmental patience, top-tier coaching, multiple AHL seasons, and repeated NHL call-ups. A seventh-round pick gets one training camp and maybe one AHL contract before being released. The draft doesn't just evaluate talent. It allocates opportunity. And more opportunity produces better outcomes regardless of underlying ability.
This is analogous to evaluating a selection process where candidates who scored higher also received more developmental investment before the job performance criterion was measured. The screening decision appears more accurate than it actually is because it influenced the conditions under which the criterion was assessed. Draft position may partly measure scouting confidence, but it also becomes an intervention that changes the downstream criterion.
This means the draft's true discriminative ability is likely lower than the AUC suggests. How much lower is impossible to quantify without a controlled experiment that will never happen (you can't randomly assign draft positions to test causation). But the direction of the bias is clear: the AUC is inflated by the system's self-reinforcing structure.
Beyond this self-fulfilling prophecy, the criterion itself has limitations that should be named explicitly. In selection science, these fall into two categories.
Criterion contamination: Career games played and points are not pure measures of hockey talent. They are contaminated by organizational opportunity, development environment, injuries, team context, depth-chart blockage, and usage patterns. A talented player buried behind established veterans on a deep roster accumulates fewer games than an equally talented player given top-line minutes on a rebuilding team. The criterion measures career outcomes, not ability in isolation.
Criterion deficiency: Games and points also miss important dimensions of hockey value: defensive contribution, penalty-kill utility, playoff performance, leadership, and role-specific value (a shutdown defenseman's worth is invisible in points-per-game). A player with 300 GP and 0.35 ppg who anchors a penalty kill for a decade may be more valuable than a 400 GP, 0.50 ppg winger who plays sheltered minutes, but the criterion treats the second player as the better outcome.
These limitations don't invalidate the analysis. Career GP and points are the best available proxies for "did this player have a meaningful NHL career," and they are the criteria used in virtually every draft evaluation study. But they are proxies, not ground truth, and the AUC values should be interpreted with that understanding.
A Triage Tool, Not an Oracle
The NHL Draft is a fair to good screening instrument. It discriminates future stars from everyone else well (AUC 0.86) and reliably separates the top tier of prospects from the rest. Its Green Zone (picks 1-5) delivers top-6 caliber players about half the time (51%).
But it has real weaknesses. It misses approximately 12% of eventual established NHLers entirely (the undrafted blind spot). Its false-negative rate after the first round is enormous. Its accuracy has not measurably improved despite two decades of analytics adoption. It discriminates reasonably well but appears poorly calibrated for the confidence organizations place in individual picks. And the criterion it predicts is contaminated by the opportunities the draft itself allocates.
None of this means the draft is broken. An AUC of 0.76 in a domain this noisy, projecting teenagers into a professional sport, is genuinely impressive. But the right way to think about the draft is as a triage tool, not an oracle. Screening systems are built to prioritize finite resources, not to deliver perfect truth. The draft sorts prospects into probability bands that should inform investment decisions, development timelines, and organizational patience. It does that reasonably well.
What it cannot do is tell you with high confidence that pick 8 is meaningfully different from pick 14, that a second-round forward is less likely to succeed than a first-round defenseman, or that the system has identified all the players worth investing in. The screening decision is real. The screening decision is imperfect. And now we have an estimate of how imperfect it is, measured the same way we measure other high-stakes selection systems.
A first-round pick is a 59-82% bet on an established career, not a guarantee. A second-round pick is closer to a one-in-three bet. And a late-round pick is a long shot that occasionally produces Joe Pavelski.
The test is a triage tool. Treat it like one.
Data: 4,006 NHL draft picks, 2000-2016, from Hockey-Reference.com. Career statistics through 2025-26 season. Occupational screening comparison values from Schmidt & Hunter (1998), Cascio & Aguinis (2019), and Roth et al. (2005). AUC computed using scikit-learn. Bootstrap confidence intervals based on 10,000 iterations. All analysis code available on request.
This is the first in what may become a series applying occupational health science methodology to sports evaluation systems. The same framework applies to any domain where a screening decision is used to predict future performance: the NBA Draft, the NFL Combine, baseball's amateur draft, and beyond. Next: What is one draft slot actually worth?