What happens when you evaluate the NHL Draft the way you'd evaluate an occupational screening standard.
- Draft AUC for identifying established NHLers (200+ GP): 0.76, "fair to good," comparable to a general cognitive ability test
- AUC for identifying stars: 0.86, good at the top, leaky everywhere else
- Including undrafted players drops the AUC from 0.76 to 0.67; about 20% of NHL rosters were never drafted
- The biggest cliff in draft equity is from picks 1-5 to picks 6-15, not from Round 1 to Round 2
- Analytics-era improvement in draft accuracy: not statistically significant (p = 0.095)
- The draft is a triage tool, not an oracle
1. The Framework
2. What the Draft Is Trying to Do
3. The AUC: How Accurate Is the Screening Decision?
4. How Does the Draft Compare?
5. Discrimination Is Not Calibration
6. The Draft Cliff
7. The Undrafted Blind Spot
8. The Costliest Misses
9. The Cost of a Miss Depends on Where You Miss
10. Has the Draft Gotten Better?
11. What the Criterion Measures (and What It Doesn't)
12. A Triage Tool, Not an Oracle
The Framework
In occupational health science, there is a well-established toolkit for evaluating whether a screening system actually predicts job performance. When an organization needs to know if its selection process identifies who will succeed, the tools are the same regardless of the domain: ROC curves, sensitivity and specificity, positive and negative predictive values, and cut-score analysis that translates continuous scores into actionable decisions.
The core question is always the same. You have a screening system. You have a job performance criterion. How well does the screening system predict the criterion?
The NHL Entry Draft is a screening system.
Every June, 32 teams spend millions of dollars on scouting, analytics, Combine testing, interviews, and video analysis to rank approximately 200 young hockey players. The output is a composite decision: your draft position. Pick 1 means the entire evaluation apparatus concluded you're the best prospect in the world. Pick 210 means it barely thinks you're worth a roster spot.
The actual "tests" in this system are the individual inputs: the NHL Combine's physical assessments, junior league statistics, scouting reports, interview evaluations. Those individual test results are not fully public. But the draft position itself represents the screening decision that results from all those inputs combined. It's the overall hiring recommendation produced by the full assessment battery, not a single test in isolation.
The criterion is simple: did you make it?
Nobody has ever evaluated the draft using the methodology used to evaluate occupational screening standards. The hockey analytics community has built ML models to predict draft outcomes, computed draft pick value curves, and published success rates by round. But nobody has treated the draft as a screening instrument and asked: what's the AUC? What are the sensitivity and specificity at key cut-points? What are the false positive and false negative rates? And how does it compare to screening systems used across other high-stakes selection contexts?
We pulled every NHL draft pick from 2000 to 2016 (4,006 picks across 17 drafts, all with at least 10 years to develop), linked each to their career NHL statistics, and ran the same analysis you'd run on any occupational screening standard.
The answer is more interesting than a single number.
What the Draft Is Trying to Do
Before evaluating a screening system, you need to define the criterion. In occupational screening, this is the job performance measure: did the candidate succeed? In hockey, there's no single right answer, so we tested several.
- Criterion A: Played at least 1 NHL game (any NHL career)
- Criterion B: Played 100+ career games (got a real shot)
- Criterion C: Played 200+ career games (established NHLer)
- Criterion D: Played 400+ career games (veteran)
- Criterion E: Career 0.60+ points per game with 200+ GP (top-6 caliber forward/elite defenseman)
- Criterion F: Career 0.80+ points per game with 200+ GP (star)
The base rates matter. Of the 4,006 drafted players in the sample, 47% eventually played at least one NHL game. 23% reached 200 games. Only 1.2% became stars by the 0.80 ppg threshold. The draft is screening for increasingly rare outcomes.
The AUC: How Accurate Is the Screening Decision?
The ROC curve plots the trade-off between sensitivity (correctly identifying future NHLers by drafting them early) and specificity (correctly identifying future busts by drafting them late) across every possible cut-point. The area under this curve (AUC) is the single best summary of a screening system's discriminative ability. An AUC of 0.50 means the system is no better than random. An AUC of 1.00 means it's perfect.
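Concretely, the computation is a few lines. The sketch below uses synthetic data and an illustrative success curve, not the real dataset; the actual analysis ran scikit-learn's AUC routine on the 4,006-pick sample. The one wrinkle worth showing: a lower pick number is a *stronger* screening endorsement, so draft position has to be negated before scoring.

```python
# Minimal sketch of the ROC/AUC computation on synthetic data.
# The exponential success curve is illustrative only; the real
# analysis used the actual 4,006-pick dataset.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

picks = rng.integers(1, 211, size=4006)      # overall draft position
p_success = np.exp(-picks / 60.0)            # toy success probability
made_200gp = rng.random(4006) < p_success    # binary criterion

# Lower pick = stronger endorsement, so negate the pick:
# roc_auc_score expects higher scores for the positive class.
auc = roc_auc_score(made_200gp, -picks)
print(f"AUC: {auc:.3f}")
```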
Here's what the NHL Draft scores:
| Criterion | AUC | Base Rate |
|---|---|---|
| Any NHL game | 0.744 | 47.5% |
| 100+ GP | 0.765 | 28.5% |
| 200+ GP (established) | 0.760 | 23.2% |
| 400+ GP (veteran) | 0.777 | 16.9% |
| Top-6 caliber | 0.858 | 4.0% |
| Star (0.80+ ppg) | 0.864 | 1.2% |
| 200+ career points | 0.811 | 11.3% |
| 200+ career goals | 0.874 | 3.8% |
Figure 1: ROC curves for the NHL Draft across four career outcome criteria. Higher AUC indicates better discrimination. The draft is most accurate at identifying stars (0.864) and weakest at separating marginal NHLers from non-NHLers (0.744).
Two findings jump out immediately.
First, the draft's AUC for identifying established NHLers is 0.76. In the screening literature, that's "fair to good." Not great, not terrible. It means the draft correctly ranks a random future NHLer above a random future non-NHLer about 76% of the time.
Second, and more interesting: the AUC increases as the criterion gets harder. The draft is better at identifying stars (0.86) than regular NHLers (0.76). This makes intuitive sense. Connor McDavid, Sidney Crosby, and Alexander Ovechkin were obvious at draft time. The difference between a future 4th-line grinder and a career AHLer is much harder to see. The screening system is most accurate where it matters most, and weakest in the middle of the distribution where the decisions are hardest.
A necessary caution here: the star criterion has a base rate of 1.2%. When prevalence is this low, AUC can appear excellent even when the positive predictive value remains modest. An AUC of 0.86 means the draft ranks stars ahead of non-stars 86% of the time, but it does not mean that a high pick has an 86% chance of becoming a star. The distinction between discrimination (rank ordering) and prediction (probability estimation) matters, and we'll return to it.
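The arithmetic behind that caution is worth a quick sketch. With a hypothetical sensitivity/specificity pair consistent with "good" discrimination, Bayes' rule shows how far the hit rate falls at a 1.2% base rate:

```python
# Why high AUC at a 1.2% base rate does not imply a high hit rate.
# The sensitivity/specificity pair is hypothetical, chosen only to be
# consistent with "good" discrimination.
base_rate = 0.012      # stars per drafted player
sensitivity = 0.80     # hypothetical: stars caught by the early-pick cut
specificity = 0.80     # hypothetical: non-stars correctly outside it

# Positive predictive value via Bayes' rule:
# P(star | early pick) = sens*prev / (sens*prev + (1-spec)*(1-prev))
tp = sensitivity * base_rate
fp = (1 - specificity) * (1 - base_rate)
ppv = tp / (tp + fp)
print(f"PPV: {ppv:.1%}")  # -> PPV: 4.6%
```

Even with discrimination this strong, fewer than one early pick in twenty becomes a star under these assumptions. Rarity, not scouting failure, drives most of that gap.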
How Does the Draft Compare?
Here's where the cross-disciplinary framing earns its keep. The draft's AUC can be placed alongside published values for screening instruments across other high-stakes selection contexts:
| Screening Instrument | AUC | Source |
|---|---|---|
| NHL Draft (200+ career goals) | 0.874 | This analysis |
| NHL Draft (star caliber) | 0.864 | This analysis |
| NHL Draft (200+ career points) | 0.811 | This analysis |
| NHL Draft (400+ GP) | 0.777 | This analysis |
| NHL Draft (100+ GP) | 0.765 | This analysis |
| NHL Draft (200+ GP) | 0.760 | This analysis |
| NHL Draft (any NHL game) | 0.744 | This analysis |
| General cognitive ability test | 0.740 | Schmidt & Hunter, 1998 |
| Structured interview (job performance) | 0.710 | Schmidt & Hunter, 1998 |
| Typical occupational screening standard | 0.700 | Cascio & Aguinis, 2019 |
| College GPA predicting job performance | 0.650 | Roth et al., 2005 |
| Reference checks | 0.570 | Schmidt & Hunter, 1998 |
| Unstructured interview | 0.560 | Schmidt & Hunter, 1998 |
Figure 3: The NHL Draft's AUC compared to published occupational screening standards. The draft's ability to identify established NHLers (0.760) is comparable to a general cognitive ability test (0.740).
The NHL draft's ability to identify established NHLers (AUC = 0.76) sits right between a general cognitive ability test (0.74) and many published occupational screening standards. For identifying stars, it exceeds most selection instruments in the literature.
Put differently: the most expensive, resource-intensive talent evaluation process in hockey achieves the discriminative power of a well-designed standardized assessment. Teams spend millions to reach an accuracy level that is typical for high-stakes personnel selection across many industries.
That's not a criticism. An AUC of 0.76 in a domain as noisy as projecting 17-year-old athletes is genuinely impressive. But it contextualizes the confidence we should place in any individual draft pick.
Discrimination Is Not Calibration
AUC answers one question: does the draft rank future NHLers ahead of future non-NHLers? That's discrimination. It tells you whether the ordering is right. But it says nothing about whether teams are well-calibrated about the actual probability of success at each draft slot.
Two screening systems can have identical AUCs but very different calibration. One might assign realistic confidence levels to each score band. The other might behave as though high scores are near-certainties when the actual success rates are materially lower.
The draft appears to have a calibration problem. Look at how organizations treat top-10 picks: multiple assets traded to move up, franchise expectations on arrival, contracts structured around assumed stardom. But the data shows that even the Green Zone (picks 1-5) produces a top-6 caliber player only 51% of the time. The 6-15 range delivers a star just 5% of the time. The organizational confidence implied by the resources spent to acquire these picks often exceeds what the historical probabilities justify.
The draft may be decent at ranking prospects. It is likely much worse at pricing uncertainty.
This distinction matters for every front office. If the discrimination is real but the calibration is off, the correct response isn't to draft differently. It's to invest differently around draft picks, building in more hedging, more contingency planning, and less binary conviction that pick 8 is fundamentally different from pick 14.
The Draft Cliff
Most occupational screening standards use a single cut-score: you either pass or you fail. Above the threshold, you're in. Below it, you're out. It's clean, it's defensible, and it's how most selection systems work.
But a single cut-score throws away information. A candidate who barely passes and a candidate who crushes the test both get the same outcome: "pass." And a candidate one point below the line and a candidate 50 points below get the same outcome: "fail." The binary decision ignores the gradient.
An alternative is to replace the single threshold with probability zones: ranges of the continuous score that correspond to meaningfully different success rates. Instead of pass/fail, you get a probability profile. This approach preserves the information in the continuous score and gives decision-makers a richer picture of what to expect.
Applied to the draft:
| Zone | Picks | NHL (1+ GP) | Established (200+ GP) | Top-6 Caliber | Star | Avg Career GP | Avg Career Pts |
|---|---|---|---|---|---|---|---|
| Green | 1-5 | 100% | 94% | 51% | 25% | 850 | 571 |
| Green-Yellow | 6-15 | 96% | 75% | 19% | 5% | 554 | 274 |
| Yellow | 16-30 | 91% | 59% | 13% | 2% | 417 | 186 |
| Yellow-Red | 31-60 | 68% | 32% | 3% | 1% | 211 | 79 |
| Red | 61-90 | 54% | 21% | 3% | 1% | 137 | 49 |
| Deep Red | 91-150 | 39% | 15% | 1% | 0% | 89 | 29 |
| Late rounds | 151+ | 28% | 10% | 1% | 0% | 61 | 19 |
Figure 4: Career outcome distributions by draft zone. Picks 1-15 produce franchise-length careers (700+ GP) more than half the time. By the middle rounds, the majority never play in the NHL.
The biggest drop in draft equity is not from Round 1 to Round 2. It's from elite to merely first-round.
A top-5 pick has a 51% chance of becoming a top-6 caliber player. A pick from 6-15 has a 19% chance. That's a 2.7x difference across just 10 slots. The cliff is steeper than most people realize, and it's steeper than the trade value charts that teams use to price draft capital. Average career points fall from 571 (picks 1-5) to 274 (picks 6-15) to 186 (picks 16-30). The gradient is not linear. It's a cliff, and it happens inside the first round.
By the second round (picks 31-60), two-thirds of players will never become established NHLers. By the middle rounds, you're buying a lottery ticket with a 10-15% payout rate.
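The zone mapping itself is a few lines of code. This sketch uses the boundaries from the table above; the function name and the label for picks past 150 are mine, for illustration:

```python
# Map a continuous screening score (overall draft position) to a
# probability zone instead of a binary pass/fail. Boundaries follow
# the zone table; picks past 150 get a generic label here.
import bisect

ZONE_UPPER = [5, 15, 30, 60, 90, 150]   # inclusive upper pick per zone
ZONE_NAME = ["Green", "Green-Yellow", "Yellow",
             "Yellow-Red", "Red", "Deep Red", "Beyond 150"]

def draft_zone(pick: int) -> str:
    """Return the probability zone for an overall draft position."""
    return ZONE_NAME[bisect.bisect_left(ZONE_UPPER, pick)]

print(draft_zone(3))    # -> Green
print(draft_zone(24))   # -> Yellow
print(draft_zone(112))  # -> Deep Red
```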
Sensitivity and Specificity: The Numbers GMs Should Know
At pick 30 (end of the first round), the draft's sensitivity for the 200 GP criterion is 40%. That means 60% of all future established NHLers were drafted outside the first round. The first round captures less than half of the eventual talent.
The positive predictive value at pick 30 is 71%. If your player was selected in the first round, there's a 71% chance they'll become an established NHLer. That's solid.
But the false negative count is staggering: 608 players drafted after pick 30 who went on to reach 200+ career GP. Six hundred and eight. The draft's biggest weakness isn't who it selects early. It's who it overlooks.
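The arithmetic here is simple 2x2 bookkeeping: treat "drafted at pick 30 or earlier" as a positive screening result and read sensitivity and PPV off the confusion table. The counts below are hypothetical round numbers chosen to be consistent with the quoted 40% sensitivity and 71% PPV, not the study's exact tallies:

```python
# Cut-score arithmetic at pick 30. Counts are hypothetical round
# numbers consistent with the quoted figures, not the exact dataset.
def cut_point_stats(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)  # share of future NHLers inside the cut
    ppv = tp / (tp + fp)          # hit rate among players inside the cut
    return sensitivity, ppv

# Hypothetical: 400 established NHLers drafted in round 1, 600 drafted
# later, 160 round-1 picks who never reached 200 GP.
sens, ppv = cut_point_stats(tp=400, fp=160, fn=600, tn=2800)
print(f"sensitivity={sens:.0%}, PPV={ppv:.0%}")  # -> sensitivity=40%, PPV=71%
```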
The Undrafted Blind Spot
Everything above evaluates the draft only on the players it actually assessed. But a screening system's accuracy should also account for qualified candidates it failed to screen entirely.
Figure 5: Including undrafted players who reached each career threshold drops the AUC substantially. The draft's biggest weakness is who it fails to evaluate entirely.
Approximately 20% of NHL rosters consist of undrafted players. Martin St. Louis (1,033 career points, Hart Trophy, Art Ross). Ed Belfour (484 career wins, Hockey Hall of Fame). Curtis Joseph (454 wins). Dino Ciccarelli (608 goals). These are not fringe cases. They are among the best players in NHL history, and the scouting system assigned them a score of zero.
When we add an estimated 120 undrafted players who reached the 200 GP threshold from this cohort to the analysis as false negatives with a test score of zero, the AUC drops from 0.760 to 0.673. That's a substantial fall, moving the draft from "fair to good" to "fair" as a talent identification system.
If 12% of your successful outcomes came from candidates your selection process would have rejected entirely, you have a false-negative problem. The draft's blind spot for late-developing, non-traditional, or overlooked talent is its most significant structural weakness.
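Mechanically, the adjustment treats each undrafted success as a false negative with a score below every drafted player, then recomputes the AUC. A sketch on synthetic data (the real analysis added an estimated 120 undrafted 200-GP players to the actual dataset):

```python
# Sketch of the undrafted adjustment: append players the draft never
# scored as positives with the weakest possible score. Data is
# synthetic; only the mechanism matters here.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
picks = rng.integers(1, 211, size=4006)
made_it = rng.random(4006) < np.exp(-picks / 60.0)

auc_drafted = roc_auc_score(made_it, -picks)

# 120 undrafted successes get a score below every drafted player's.
scores = np.concatenate([-picks, np.full(120, -999)])
labels = np.concatenate([made_it, np.ones(120, dtype=bool)])
auc_all = roc_auc_score(labels, scores)

print(f"drafted-only AUC: {auc_drafted:.3f}, with undrafted: {auc_all:.3f}")
```

Because every added positive ranks below every negative, each one contributes pure discordance, so the AUC can only fall.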
What Kind of Player Do You Get?
Figure 2: Probability of career outcome by draft pick range. The steepest cliff is from picks 1-5 to picks 6-15, not from Round 1 to Round 2.
Games played alone doesn't tell you if a pick became a useful player or just a warm body. The career outcome distributions add the quality dimension:
Picks 1-15: 59% reach 700+ career GP. The median outcome is a long, productive NHL career. Production intensity: 22.4 points per available year. Average points-per-game among players who stuck: 0.55 ppg. These are legitimate top-of-the-lineup contributors.
Picks 16-30: 31% reach 700+ GP. Still a strong bet, but you're now more likely to get a 400-600 GP middle-six contributor than a franchise player. Production intensity drops to 10.5 points per year.
Picks 31-60: The break point. 31% never play a single NHL game. 13% reach 700+ GP. The second round is where scouting accuracy separates from noise. Production intensity: 4.7 points per year.
Picks 91-150: 59% never play in the NHL. Among those who do, the average points-per-game is 0.33, a bottom-six profile. But this is also where the biggest draft steals live.
Picks 151+: 70% never play. But Joe Pavelski (pick 205, 1,068 career points), Patric Hornqvist (pick 230, 543 points), and Dustin Byfuglien (pick 245, 525 points) all came from this wasteland.
The Costliest Misses
The false negatives with the highest career production, all drafted after pick 60:
| Pick | Year | Player | Career GP | Career Pts | Pts/GP |
|---|---|---|---|---|---|
| 104 | 2011 | John Gaudreau | 763 | 743 | 0.97 |
| 77 | 2013 | Jake Guentzel | 662 | 638 | 0.96 |
| 79 | 2014 | Brayden Point | 701 | 675 | 0.96 |
| 178 | 2010 | Mark Stone | 749 | 694 | 0.93 |
| 71 | 2006 | Brad Marchand | 1,152 | 1,034 | 0.90 |
| 66 | 2016 | Adam Fox | 467 | 399 | 0.85 |
| 205 | 2003 | Joe Pavelski | 1,332 | 1,068 | 0.80 |
| 129 | 2007 | Jamie Benn | 1,233 | 982 | 0.80 |
| 62 | 2005 | Kris Letang | 1,218 | 799 | 0.66 |
Thirty-three players in the dataset were drafted after pick 60 and went on to play 400+ games at 0.60+ points per game. Thirty-three legitimate top-six NHL players that the first two rounds of the draft missed.
On the other side: 15 top-15 picks in this sample never played a single NHL game. Not one. Kyle Beach (11th overall, 2008), Hugh Jessiman (12th, 2003), A.J. Thelen (12th, 2004). The most expensive selections in the draft sometimes produce literally zero return.
The Cost of a Miss Depends on Where You Miss
Not all screening errors are equal. In most occupational screening contexts, a false positive and a false negative carry different organizational costs. The same asymmetry applies here, and it's extreme.
Missing Brayden Point at pick 79 is not the same as over-drafting a depth forward at pick 24. A top-10 miss consumes one of the most valuable assets in professional sports: a high draft pick in a league with a hard salary cap. The opportunity cost includes not just the missed player but the cap-cycle timing, the franchise surplus value that a star on an entry-level contract generates, and the years of organizational direction built around a player who doesn't become what you expected. When the Arizona Coyotes selected Kyle Beach 11th overall in 2008 and he never played an NHL game, the cost wasn't just one roster spot. It was a franchise-shaping asset that returned nothing.
A fifth-round miss, by contrast, is almost noise. The expected value of pick 150 is low enough that a miss barely registers organizationally.
The draft is not just a prediction problem. It is a resource-allocation problem under asymmetric error costs. The screening framework can tell you where the system fails. Decision analysis tells you how much each failure costs. The first-round false positive is organizationally devastating in a way that no number of late-round misses can match, and the late-round false negative (the Pavelskis and Marchands who slip through) represents surplus value that could reshape a franchise if captured systematically.
This has direct implications for how teams should structure their scouting investment. The rational allocation would concentrate the most rigorous evaluation resources not on the consensus top-5 picks (where discrimination is already strong), but on the picks 15-60 range where the screening system is weakest and the organizational cost of errors is still substantial.
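The asymmetry can be put in rough numbers using the zone averages from Figure 4. If a total bust returns roughly zero, the expected cost of a miss scales with the zone's average output:

```python
# Toy expected-value view of the error asymmetry, using average career
# points by zone from Figure 4. A bust returns ~0, so the expected
# cost of a miss is roughly the zone average itself.
avg_points = {"1-5": 571, "6-15": 274, "16-30": 186,
              "31-60": 79, "61-90": 49, "91-150": 29, "151+": 19}

cost_ratio = avg_points["1-5"] / avg_points["91-150"]
print(f"A top-5 miss costs ~{cost_ratio:.0f}x a mid-round miss")  # ~20x
```

This ignores the cap-cycle and surplus-value effects described above, which only widen the gap.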
Has the Draft Gotten Better?
The analytics revolution supposedly transformed NHL scouting. More data, better models, smarter front offices. Did it actually improve the draft's accuracy as a screening system?
We compared the 2000-2007 era (AUC = 0.741) to the 2008-2015 era (AUC = 0.768). The difference is +0.026, which goes in the right direction. But a 10,000-iteration bootstrap test returned a 95% confidence interval of [-0.013, +0.065] and a p-value of 0.095.
The improvement is not statistically significant.
The AUC trended upward across every criterion (+0.024 to +0.026 depending on the threshold), and the direction is consistent, which is suggestive. But with the data we have, we cannot conclude that the analytics era measurably improved scouting accuracy. The draft has always been about this good, and it still is.
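The bootstrap machinery is straightforward to sketch. Synthetic cohorts stand in for the two eras below; the real test resampled the actual players for 10,000 iterations:

```python
# Sketch of the era-comparison bootstrap. The cohorts and decay
# parameters are synthetic stand-ins; only the procedure matters.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

def make_cohort(n, decay):
    picks = rng.integers(1, 211, size=n)
    made = rng.random(n) < np.exp(-picks / decay)
    return picks, made

picks_a, made_a = make_cohort(2000, decay=60.0)  # "early era" toy data
picks_b, made_b = make_cohort(2000, decay=50.0)  # "late era" toy data

diffs = []
for _ in range(1000):  # 10,000 in the real analysis
    i = rng.integers(0, 2000, 2000)
    j = rng.integers(0, 2000, 2000)
    # Skip the (vanishingly rare) resample with only one class present
    if 0 < made_a[i].sum() < 2000 and 0 < made_b[j].sum() < 2000:
        diffs.append(roc_auc_score(made_b[j], -picks_b[j])
                     - roc_auc_score(made_a[i], -picks_a[i]))

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% CI for the AUC difference: [{lo:+.3f}, {hi:+.3f}]")
```

If the interval straddles zero, as the real one does ([-0.013, +0.065]), the observed improvement is consistent with sampling noise.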
Positions: Forwards, Defensemen, and Goalies
The draft evaluates forwards most accurately (AUC = 0.773 for 200 GP), defensemen slightly less so (0.752), and goalies worst of all (0.715).
Every hockey person already believes goalies are the hardest position to draft. Now we have the number. The AUC gap between forwards and goalies (0.058) is substantial. Drafting a goalie in the first round is a meaningfully riskier proposition than drafting a forward, and the screening system's accuracy reflects that.
What the Criterion Measures (and What It Doesn't)
The biggest caveat, and it needs to be stated clearly: the draft partially creates the outcomes it appears to predict.
A first-round pick gets years of developmental patience, top-tier coaching, multiple AHL seasons, and repeated NHL call-ups. A seventh-round pick gets one training camp and maybe one AHL contract before being released. The draft doesn't just evaluate talent. It allocates opportunity. And more opportunity produces better outcomes regardless of underlying ability.
This is analogous to evaluating a selection process where candidates who scored higher also received more developmental investment before the job performance criterion was measured. The screening decision appears more accurate than it actually is because it influenced the conditions under which the criterion was assessed. Draft position may partly measure scouting confidence, but it also becomes an intervention that changes the downstream criterion.
This means the draft's true discriminative ability is likely lower than the AUC suggests. How much lower is impossible to quantify without a controlled experiment that will never happen (you can't randomly assign draft positions to test causation). But the direction of the bias is clear: the AUC is inflated by the system's self-reinforcing structure.
Beyond this self-fulfilling prophecy, the criterion itself has limitations that should be named explicitly. In selection science, these fall into two categories.
Criterion contamination: Career games played and points are not pure measures of hockey talent. They are contaminated by organizational opportunity, development environment, injuries, team context, depth-chart blockage, and usage patterns. A talented player buried behind established veterans on a deep roster accumulates fewer games than an equally talented player given top-line minutes on a rebuilding team. The criterion measures career outcomes, not ability in isolation.
Criterion deficiency: Games and points also miss important dimensions of hockey value. Defensive contribution, penalty-kill utility, playoff performance, leadership, and role-specific value (a shutdown defenseman's worth is invisible in points-per-game). A player with 300 GP and 0.35 ppg who anchors a penalty kill for a decade may be more valuable than a 400 GP, 0.50 ppg winger who plays sheltered minutes, but the criterion treats the second player as the better outcome.
These limitations don't invalidate the analysis. Career GP and points are the best available proxies for "did this player have a meaningful NHL career," and they are the criteria used in virtually every draft evaluation study. But they are proxies, not ground truth, and the AUC values should be interpreted with that understanding.
A Triage Tool, Not an Oracle
The NHL Draft is a fair to good screening instrument. It correctly identifies generational stars with high accuracy (AUC 0.86) and reliably separates the top tier of prospects from the rest. Its Green Zone (picks 1-5) delivers franchise-caliber players more than half the time.
But it has real weaknesses. It misses approximately 12% of eventual established NHLers entirely (the undrafted blind spot). Its false-negative rate after the first round is enormous. Its accuracy has not measurably improved despite two decades of analytics adoption. It discriminates reasonably well but appears poorly calibrated for the confidence organizations place in individual picks. And the criterion it predicts is contaminated by the opportunities the draft itself allocates.
None of this means the draft is broken. An AUC of 0.76 in a domain this noisy, projecting teenagers into a professional sport, is genuinely impressive. But the right way to think about the draft is as a triage tool, not an oracle. Screening systems are built to prioritize finite resources, not to deliver perfect truth. The draft sorts prospects into probability bands that should inform investment decisions, development timelines, and organizational patience. It does that reasonably well.
What it cannot do is tell you with high confidence that pick 8 is meaningfully different from pick 14, that a second-round forward is less likely to succeed than a first-round defenseman, or that the system has identified all the players worth investing in. The screening decision is real. The screening decision is imperfect. And now, for the first time, we know exactly how imperfect it is, measured the same way we measure every other high-stakes selection system.
A first-round pick is a 59-94% bet on an established career, depending on the slot, not a guarantee. A second-round pick is a coin flip. And a late-round pick is a long shot that occasionally produces Joe Pavelski.
The test is a triage tool. Treat it like one.
Data: 4,006 NHL draft picks, 2000-2016, from Hockey-Reference.com. Career statistics through 2025-26 season. Occupational screening comparison values from Schmidt & Hunter (1998), Cascio & Aguinis (2019), and Roth et al. (2005). AUC computed using scikit-learn. Bootstrap confidence intervals based on 10,000 iterations. All analysis code available on request.
This is the first in what may become a series applying occupational health science methodology to sports evaluation systems. The same framework applies to any domain where a screening decision is used to predict future performance: the NBA Draft, the NFL Combine, baseball's amateur draft, and beyond. Next: What is one draft slot actually worth?