Intro
NHL Equivalency (NHLe) is designed to answer the question "what is a point in league X worth in the NHL?" The motivation behind NHLe is straightforward: objectively compare production in different leagues on an apples-to-apples basis.
NHLe is descriptive by default but, as is the case with many descriptive tools, the prescriptive benefits immediately follow. For example, a team deciding whether to draft Ivar Stenberg or Ethan Belchetz can compare Stenberg's 33 points in 43 SHL games (0.77 P/GP) to Ethan Belchetz's 59 points in 57 games (1.04 P/GP) on an apples-to-apples basis to decide who scored at a "better" rate, and to what degree, and use that to make a better-informed drafting decision.
That apples-to-apples comparison between scoring rates in different leagues is one of the core features in our prospect projection model, which also uses height, weight, age, and one new feature that we will discuss shortly, to project the likelihood of a player becoming an NHLer (playing 200+ NHL games) and becoming an NHL star (top 15% of WAR per game among defensemen, or top 20% among forwards).
The first editions of our NHLe and prospect projection models were released in 2021 (link to old methodology here), and we've been posting our prospect projections every year. Five years later, we're now able to take a good look at what we got wrong and what we got right.
Today, we're announcing four substantial updates to our models based on everything we've learned.
- NHLe is now temporal. Each league has a different equivalency in each season; a KHL point in 2026 is not worth what it was in 2010.
- Prospect projections now incorporate pre-draft NHL Central Scouting Service (CSS) rankings as a feature in the prospect projection model.
- Height and weight now come from CSS rather than Elite Prospects.
- International play is no longer included in either model.
We have made some other changes, which we will dive into shortly. But if you're asking yourself "Why do these probabilities look so different from what the old model showed?", it is likely due to one of the four updates above.
The residuals (what we've learned)
During my time as a data scientist at Snowflake, my boss told me that the most important thing to do when building, evaluating, and maintaining models was to "check the residuals". This has always stuck with me.
Formally, a residual is the difference between a projected value and the actual observed value; his advice was to look at the cases where that difference is largest. (In this case, a residual is the difference between a player's projected chance of making the NHL or becoming a star and the actual observed outcome.) Informally, his idea is that when you're building and maintaining a model, you should look at where it goes wrong to see how to improve it.
However, it's important to study residuals in context. A decently high residual is, on its own, not a bad thing, while a fairly low residual can still be troublesome.
For a few extreme hypotheticals to illustrate how residuals should be studied under context, let's first imagine that the next Wayne Gretzky were drafted in the 7th round, and we ranked him in the middle of the 1st round with a 10% star probability and 50% NHLer probability. This would be a massive win for us because we would have ranked him far higher than consensus, even if the mathematical residual on the star probability were quite high. Our residual would be far smaller than the residual that anybody else is putting out, despite being quite high.
Now, conversely, imagine that our model gave Connor McDavid an 80% star probability and 90% NHLer probability back in 2015, and ranked him 3rd behind Jack Eichel and Mitch Marner. We still would've been "mostly right" in terms of probabilities but it would've been a pretty glaring miss for the model. Our residual would be far larger than the residual that anybody else is putting out despite being quite low.
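To put numbers on these two hypotheticals, here is a minimal sketch that computes the raw residual on the star probability in each case. The function name is illustrative only, and an observed outcome of 1 simply encodes "became a star".

```python
def residual(observed_outcome: int, projected_probability: float) -> float:
    """Residual = observed outcome (1 or 0) minus the projected probability."""
    return observed_outcome - projected_probability

# Hypothetical late-round superstar: 10% star probability, became a star.
late_round_star = residual(1, 0.10)

# Hypothetical McDavid projection: 80% star probability, became a star.
mcdavid_case = residual(1, 0.80)

print(round(late_round_star, 2), round(mcdavid_case, 2))  # prints: 0.9 0.2
```

The first residual is far larger than the second, yet the first projection is the one that would have beaten consensus. The raw number only means something once it sits next to what everybody else would have projected.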
One thing we've done in order to put residuals in context for our own evaluation & development purposes is compute the baseline probability for star/NHLer probability based strictly on a player's draft slot using a logistic regression model with a kink at pick 150 to handle the long tail of late picks. This can be found at /draft/baseline. We will be using these numbers (along with actual draft slot vs our ranking) to contextualize our residuals.
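As a rough illustration of what a logistic model with a kink at pick 150 can look like, the sketch below adds a hinge term so that the slope of the log-odds is allowed to change for late picks. It assumes raw pick number as the input, which may not match the fitted baseline, and the coefficients are placeholders chosen only so the example runs; they are not the values behind /draft/baseline.

```python
import math

KINK_PICK = 150  # kink location described in the text

def baseline_features(pick: int):
    """Return (pick, hinge); the hinge is zero up to the kink and grows after it."""
    return float(pick), float(max(0, pick - KINK_PICK))

def baseline_probability(pick: int, intercept: float, slope: float, extra_slope: float) -> float:
    """Logistic curve whose log-odds are piecewise linear in draft slot."""
    x, hinge = baseline_features(pick)
    log_odds = intercept + slope * x + extra_slope * hinge
    return 1.0 / (1.0 + math.exp(-log_odds))

# Placeholder coefficients (assumptions, not fitted values).
coefs = dict(intercept=2.0, slope=-0.03, extra_slope=0.025)
p_first_overall = baseline_probability(1, **coefs)
p_pick_200 = baseline_probability(200, **coefs)
```

With a positive `extra_slope`, the curve flattens after pick 150 instead of continuing to fall at the same rate, which is one way to handle the long tail of late picks.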
Our case studies into residuals were the biggest driving force behind all of the model changes we made.
Where we've got it wrong (more so than consensus)
If you've been in the trenches of hockey Twitter, you may have seen some of my old Tweets dug up where I missed hard on prospects. Some examples include Jackson Blake, Wyatt Johnston, Juraj Slafkovsky, and Nikita Artamonov. Let's dig into each of these.
Jackson Blake
On draft day in 2021, Carolina selected Jackson Blake 109th overall. I Tweeted my model's outputs which ranked him somewhere in the 300s and gave him a 0% star probability and 1% NHLer probability. Fast forward to today, he's already closing in on 200 games played and will be a very solid player for the Hurricanes, albeit probably not a star.
In the Tweet, the prospect card shown listed Blake 5'10 at 148 lbs. These numbers came directly from Elite Prospects a few weeks before the draft. CSS listed him at 155 lbs, so it's unlikely that the Elite Prospects numbers were some severe underestimate; at worst they were a few pounds off. The bigger problem with the Elite Prospects numbers was that everybody else's numbers that were fed to the model during training were a severe overestimate (through no fault of theirs, and 100% fault of mine).
Elite Prospects lists the height and weight of players today. It does not list their height and weight at draft day. When I trained the original draft year model (which used NHLe, age, height, and weight), I used the NHL equivalency and age that we had up to a player's draft day. But I used the height and weight that we had at the point in time when I scraped that data (2021). So, when I fed "Victor Hedman" to the old draft year model, I fed it his draft year points and age, but his height in 2021. There are two problems with this:
- Miscalibration: The model sees the height and weight that NHL players are today, and assumes they were that weight on draft day. As a result, prospects who meet the "average" value would be those who were the size of the average NHL player on draft day, which are actually quite large prospects.
- Selection bias: NHL players might get their weights updated after they make the show more often than players who never make it or get close.
Using actual draft day heights and weights solves the problem for the draft year model. For the D+n projection models, we use Elite Prospects height and weight. This is justified by our findings that most growth is finished by age 19, and that selection bias was not actually as bad as expected; size increases are similar for players who were drafted or made the NHL and players who were not drafted/did not make the NHL. (In 20 years, this will no longer be a problem because we will have actual height/weight measurements at D+1, D+2 to baseline against, but for now our approach is fine.)
The updated model would have ranked Blake 86th in his draft, with a 0.7% probability of becoming a star and 4% probability of becoming an NHLer. A typical player drafted in his slot would have a 2% chance of becoming a star and 13.2% chance at becoming an NHLer. So, we would have ranked him a bit higher than he was actually drafted (with the caveat that we would have not ranked many players this draft due to COVID circumstances and insufficient sample sizes; Wyatt Johnston would not have been ranked, as we discuss in his section). But we were lower on Blake's NHLer/star probability than his draft slot alone would have suggested. With the caveat that this was generally considered a weak draft, the updated model's output seems quite reasonable.
Juraj Slafkovsky
On draft day in 2022, Montreal selected Juraj Slafkovsky 1st overall. He was widely regarded, at the time, as one of the weaker first overall selections in recent memory. (2022 as a whole was considered a weak draft, and Slafkovsky was not a consensus #1; there was much debate about who Montreal would take).
At the time, I Tweeted my model's outputs for Slafkovsky, which had a strikingly low star probability (2%), NHLer probability (23%), and actually suggested he be taken early in the 2nd round. I explicitly acknowledged in the Tweet that the model underrated him due to his excellent scoring in the Olympics, but that (understandably) has not prevented the dunks now that he has proven himself to be, at the very least, a solid NHLer.
Taking a step back from models and just looking at his scoring rate in the league where he played the most, the inescapable conclusion is that, based on Slafkovsky's non-international scoring profile alone, he simply was a long shot to make the NHL. He scored 10 points in 31 Liiga games (0.32 P/GP) in his draft year.
To put that mark into perspective, we'll look at forwards who played at least 5 Liiga games in their draft year between 2005-2006 and 2017-2018 and scored under 0.4 P/GP. There are 26 of them, and two of them have played at least 200 NHL games: Joonas Donskoi and Kasperi Kapanen. I am one of the biggest Joonas Donskoi fans that you'll ever meet (I'm a Sharks fan who attended game 3 of the 2016 Stanley Cup Finals in person), but even I must acknowledge that if he is the very best NHL outcome among your age/league/scoring cohort, that is a red flag for a 1st overall pick coming out of said cohort.
This shows that age/league/scoring cohort alone is insufficient to predict every player's career outcome, and that a model built on it will have misses. We already knew that; the day that I dropped the old model, I noted that it would have "missed" on certain players like Ryan McDonagh, who played high school hockey in his draft year and didn't completely destroy the league points-wise, and then went on to score at low rates in the NCAA and AHL before becoming one of the most underrated two-way blueliners of his era.
But the problem was how we missed with Slafkovsky. Clearly, there was something that just about everybody saw in him that was not reflected in his Liiga scoring rates. In this case, that something special probably could have been quantified by his scoring rates in international play (which I've chosen to exclude for separate reasons, discussed in the Wyatt Johnston section), but the bigger problem was that everybody saw something in him, and the model didn't account for that at all.
One thing to note is that NHLe is descriptive, but prospect projections are predictive. Just as I wouldn't bake a subjective metric like Hart Trophy votes into my descriptive WAR model, I wasn't going to bake Slafkovsky's public perception into the NHLe model which describes how well he scored. But just as it may one day make sense to incorporate betting odds into my predictive NHL game projection model, it makes perfect sense to bake scouting reports into my prospect projection model.
The updated model would have ranked Slafkovsky 5th in his draft, with a 22% probability of becoming a star and 94% probability of becoming an NHLer. A typical player drafted in his slot would have a 79% chance of becoming a star and 99% chance of becoming an NHLer.
Wyatt Johnston
Dallas selected Wyatt Johnston 23rd overall in 2021. I ranked him in the hundreds, giving him a 1% chance at becoming a star and 4% chance at becoming an NHLer. He is certainly an NHLer and very likely on track to be a star.
My ranking was partially due to his weak scoring in his D-1 year (30 points in 53 OHL games), but more so due to his scoring 4 points in 7 games in the WJC-18 in his draft year; this was the only organized hockey he played that year. The WJC-18 is not an especially difficult tournament, and we see many players tear it up.
The "fix" for players like Wyatt Johnston could have been to increase the minimum games threshold to something substantially higher, but the fix actually came higher upstream, in the NHLe model; I excluded international tournament play entirely. The reason for this was not due to Wyatt Johnston, but instead due to general selection bias that I had observed within our NHLe model.
Generally speaking, North American teams in international tournaments tend to be stronger than European teams, which means that star North American players can be relegated to bottom-6 or bottom pair duty, while merely decent European players can be elevated to top line or top pair roles. As a result, when adjusting for player quality, European players would score at substantially higher rates than North Americans in international tournaments. I identified this as a potential issue when building our NHLe model when I noticed the strongest "paths" for the KHL to make the NHL went through the WJC-18 or WJC-20 as an initial connector. The direct transitions between e.g. the KHL and WJC-20 were much more flattering to the KHL than the direct transitions between the WJC-20 and e.g. the OHL.
I had a feeling that my previous model was "too high" on European leagues relative to North American ones, and this appears to have been the reason why. Removing the WJC as a league in the NHLe model entirely, and thus cutting off all connections through it, substantially shrank the values for European leagues relative to North American ones.
The updated model would not have ranked Johnston. In the same sense that it does not rank goalies, overagers, or injured players who did not play a sufficient sample (still 5 games, we opted to ultimately just not change this), this should be interpreted as "no comment." Not endorsing the selection, not criticizing it either.
Nikita Artamonov
In the 2024 draft, I ranked Nikita Artamonov 3rd behind only Macklin Celebrini and Ivan Demidov, with 85% star probability and 99% NHLer probability. That was miles above where anybody else had him; the Hurricanes ultimately took him 50th overall. It's too early to tell where things are headed for sure with Artamonov, but they don't look good; he just scored only 10 points in 53 KHL games, and was actually demoted to the VHL. Our updated DY+2 model now gives him only a 2% chance to be an NHLer. We don't know for sure if this was a miss, but it looks like it probably was.
In fairness to us, Artamonov is a legitimately strange case. He made solid strides after his draft year, scoring 39 points in 63 KHL games. People were digging up my Tweet calling him a steal; one Tweeter said "How's it feel having posted this and seeing Artamonov playing the way he is right now? Talk about hitting the nail on the head. lol" That was quite premature, just like the fans who dunked on me for Brennan Othmann's strong D+1 season in the OHL, and there's a reason I didn't take any victory laps based on it. But it shows he appeared to be heading in a positive direction.
This year, his D+2 year, in the same league, his KHL scoring rate completely fell off a cliff (0.19 P/GP); under a third of what it was in the year prior (0.62) and under half of what it was in his draft year (0.43 P/GP). This isn't the case of a player who came to the NHL and struggled, showing evidence that I overrated his league compared to the NHL; this is a guy who fell off a cliff in his own league. So he probably isn't the best example of me overrating the KHL and getting things wrong, but he is still an important player to drill into.
Regarding Artamonov's draft year and his very respectable 23 points in 54 KHL games, the same "back of the napkin" check we did for Slafkovsky puts Artamonov in substantially better company. Kirill Kaprizov scored only 8 points in 31 KHL games (0.26 P/GP) in his draft year. Artemi Panarin scored 9 in 20 (0.45 P/GP). Evgeny Kuznetsov scored 8 in 35 games (0.23 P/GP). Nikita Kucherov scored only 2 points in 9 games (0.22 P/GP). There were plenty of players in that age/league/scoring cohort who flamed out, but there was also some very solid company in that bracket.
What makes all of those players different from Artamonov? They played in a totally different KHL. Artamonov played in the KHL after the Russian invasion of Ukraine, which caused many players to migrate away from the KHL, reducing the talent there. As a result, we built a temporal NHLe model. It uses a 4-year rolling window specifically so that, for the 2026 season the model is being built for and for every season beyond it, the window only includes data starting at the 2022-2023 season, i.e. data from after the Russian invasion impacted the KHL's league quality.
Interestingly, our analysis showed that talent in the KHL and most other top European pro leagues had begun declining even before the invasion, but the invasion was what motivated the clear theoretical need for a temporal NHLe model that produces different coefficients for the same league in different years. The final NHLe scoring rates, and hence NHLer/star probabilities, for an Artemi Panarin and a Nikita Artamonov, who scored at essentially the same raw rate in what was technically the "same league" on paper, need to be substantially different, and in our new model, they are.
The updated model would have ranked Artamonov 23rd, with a 5% chance at becoming a star and 53% chance at becoming an NHLer.
Where we've got it right
Unfortunately, it gets dug up less often when the model "hits" and ranks players more accurately than NHL teams and CSS have. Here are a few cases where that happened, and where the updated model would rank them. (Spoiler: The updated model doesn't do as well with the "steals" because it aligns closer with consensus.)
Logan Stankoven
In 2021, Dallas selected Logan Stankoven 47th overall. Our model ranked him 12th, with 16% star probability and 43% NHLer probability.
Stankoven was a small forward who played an even smaller sample of games in 2021. Specifically, he played 6 WHL games and scored 10 points, and he played 7 WJC-18 games and scored 8 points. This is quite solid scoring. But he was listed at 5'8 170 lbs, and NHL Central Scouting ranked him 31st among NA skaters alone, indicating he would be a reach in the 1st round.
The updated model would rank him 20th in his draft, with 5% star probability and 50% NHLer probability. (Again, there is the caveat that many 2021 players, like Stankoven's former teammate Wyatt Johnston, would have been unranked by us in 2021.) A typical player drafted in his slot would have 31% NHLer probability and 5% star probability. So, the updated model would not be as bullish on him as the original, in large part due to his low CSS ranking. This will be a common theme; if CSS doesn't vouch for a guy, the model is probably not going to wholeheartedly do so either.
Lane Hutson
In 2022, Montreal selected Lane Hutson 62nd overall. Our model ranked him 18th, giving him an 8% chance at becoming a star and 44% chance of becoming an NHLer. A typical player drafted at his position would have a 4% chance at becoming a star and 24% chance of becoming an NHLer.
Like Stankoven, Hutson was also tiny, listed at 5'8" 148 lbs at the time by Elite Prospects, or else his projection would've been substantially higher.
The updated model would still rank Hutson 18th in his draft class, but with a substantially lower chance of becoming a star (4%) and of becoming an NHLer (31%). This is the unfortunate truth of including CSS rankings as a model input; the updated model is not going to have quite as much conviction as the original in a player who is poorly ranked by CSS.
Chase Stillman, Zach Dean, and (probably) Fabian Lysell, Brennan Othmann, and Nolan Allan
All of these players were ranked below 100 in the 2021 draft according to the original model. All of them were drafted in the 1st round.
Five years later, Othmann has played the most games of the pack at 44, and has scored 5 points in those games. Nolan Allan is a defenseman who has played 43 games and scored 8 points, but those games all came on a disastrous 2024-2025 Blackhawks team; he did not play a single NHL game in 2025-2026 despite being mostly healthy and playing for the AHL organizations of the Sharks and Blackhawks, the NHL's two worst teams in the year prior. Lysell has played 12 NHL games and scored 3 points. Zach Dean has played 9 games and scored 0 points. Chase Stillman has never played an NHL game. Among the players on this list who played in the NHL, every single one of them has a negative career WAR (which was part of the original model criteria). While it's possible that one or even a few of these players end up playing 200 NHL games, it is fairly unlikely, and we're far enough along to call this a tentative win.
There are many more players like this for 2022, 2023, and so on, but I don't want to declare anybody "likely bust" yet until they've at least played D+5. There are other players from the 2021 draft who are in "likely bust" territory who went substantially higher than my model ranked them (e.g. Tyler Boucher), but these specific 5 players I've listed are players who the model was extremely bearish on.
The updated model actually would have been kinder to some of these players, but would still have ranked all of them lower than they went, with the exception of Zach Dean (who would have been ranked 30th, exactly where he was drafted) and Chase Stillman (who would not be ranked since we no longer model Denmark U20 for the 2020-2021 season).
Tradeoffs
Ultimately, we have made tradeoffs here. The new model corrects some of the most glaring mistakes that the old one made, but it doesn't stand out quite as much as the old one did in the wins it took. However, bumping Stankoven (who went 47th) down from 12th to 20th, and keeping the rank of Lane Hutson (who went 62nd) unchanged at 18th while slightly reducing his raw probabilities, is a worthy tradeoff to avoid our big misses on the Slafkovskys of the world.
Implementation
We will now walk through the methodology, keeping things concise and holding the full "under-the-hood" explanation to a minimum for the sake of brevity. Our previous article describes the construction of the network equivalency model, which uses "direct paths" from one league to another, and then aggregates across numerous multi-hop paths to obtain a single coefficient.
For example, when constructing the KHL -> NHL conversion factor, we may use a conversion factor computed from direct transitions from KHL -> AHL, multiply that by a conversion factor computed from direct transitions from AHL -> NHL, and treat the product of these two conversion factors as the "path NHLe" for this specific path from KHL -> AHL -> NHL. There may also be a direct path from KHL -> NHL, as well as a path from KHL -> SHL -> NHL.
We use numerous paths and compute a weighted average of the "path NHLe", weighting them differently based on the number of players who transitioned within each pair of leagues within the path and the number of leagues in between. This is the same general "network NHLe" we used for our previous model, which was inspired by CJ Turtoro. One key detail, adopted from CJ's methodology and discussed in our previous article, is that we only use each "first connector" once. So, if we use the KHL -> AHL -> NHL path for the KHL, we cannot use any other path which contains the direct KHL -> AHL edge again. This is to avoid building a model which places too much weight on one particular transition pattern.
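The path aggregation described above can be sketched as follows. This is a simplified illustration, not the production code: the conversion factors and weights are made-up numbers, the weights are passed in directly rather than derived from transition counts and path length, and the sketch assumes paths arrive in priority order so that the first path to claim a first connector keeps it.

```python
import math

def path_nhle(edge_factors):
    """Multiply the direct conversion factors along one path."""
    return math.prod(edge_factors)

def network_nhle(paths):
    """Weighted average of path NHLe values, using each first connector once.

    Each path is (edges, factors, weight), where edges is a list of
    (from_league, to_league) pairs in order.
    """
    used_first_edges = set()
    total, total_weight = 0.0, 0.0
    for edges, factors, weight in paths:
        if used_first_edges.intersection(edges):
            continue  # path contains a first connector that is already used
        used_first_edges.add(edges[0])
        total += weight * path_nhle(factors)
        total_weight += weight
    return total / total_weight

# Hypothetical factors and weights, for illustration only.
paths = [
    ([("KHL", "NHL")], [0.5], 3.0),
    ([("KHL", "AHL"), ("AHL", "NHL")], [1.5, 0.5], 2.0),
    ([("KHL", "AHL"), ("AHL", "SHL"), ("SHL", "NHL")], [1.5, 0.8, 0.6], 1.0),
]
khl_to_nhl = network_nhle(paths)
```

The third path is skipped because it reuses the KHL -> AHL edge, so the result is the weighted average of the first two path values only.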
Building the NHL Equivalency model
We made a few small changes to the parameters in our original path-based model, such as using the harmonic mean rather than raw sums to compare leagues and create direct conversion factors, and using all eligible paths rather than cutting off at a certain number. We chose not to use cross-validation to select model parameters here, as all of the parameter values we tried produced extremely similar average error in NHLe-predicted points vs observed points in out-of-sample testing. Instead, we went with values that were defensible and intuitive.
Biggest change: Y1 -> Y2 transitions
In each of the past 4 seasons, fewer than 25 players played at least 5 games in a non-Russian league and the KHL in the same season. Before the Russian invasion, this number was more than twice as high. The KHL is one of the most important leagues to model correctly, so we needed an architecture which would allow us to model the KHL with a sufficient sample size. Thus, we shifted to incorporate Y1 -> Y2 transitions, and use only these.
There are pros and cons to this approach. The biggest pro is that you get a far bigger sample size that includes full bodies of work rather than weird seasons which may have selection bias (players changing leagues mid-season often means they are either too good for their current league, or not good enough).
The biggest con of using Y1 -> Y2 transitions is that by default, you will bake age curves into your model. In cases where all of your transitions are between two pro leagues, this probably isn't a huge deal. But if you're looking at a junior league like the OHL or USHL, you're bound to overrate that league because the vast majority of transitions involving those leagues come where the player plays in a junior league at a young age, undergoes substantial physical development in the offseason and generally improves as a player, and then goes and plays in a higher tier league.
To handle this, we first created a smoothed general age multiplier based on age transitions (within league) across all leagues. We found that from age 18->19, for example, player scoring rates are projected to increase by 29%. The age 25->26 transition is where the multiplier first drops below 1 (i.e. the same player in the same league is expected to score at a slightly higher rate at age 25 than 24, but at a slightly lower rate at age 26 than 25).
Then, when feeding transitions to our model, we do not feed the raw transitions. Rather, we multiply the year 1 scoring rate by the age multiplier for that player's age, and then compare that age-adjusted scoring rate to year 2. This is the way that we chose to try and faithfully make an apples-to-apples comparison and strictly measure what was caused by discrepancies in league quality.
For example, if a player plays in the OHL in his age 18 season and scores 1 point per game, and then plays in the NHL in his age 19 season and scores 0.5 NHL points per game, this will be one direct transition in the OHL->NHL edge, and the pair will be treated as 1.29 OHL P/GP and 0.5 NHL P/GP, with the 1.29 figure coming from multiplying his true observed 1 point per game by the 1.29 multiplier (a 29% increase) for the age 18->19 transition.
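The same worked example, written as a minimal sketch. Only the 18 -> 19 multiplier (1.29) comes from the text; the lookup table and function name are illustrative, and other ages would need values from the fitted age curve.

```python
# Age multiplier keyed by the player's age in year 1.
# Only the 18 -> 19 value is given in the text; other ages are omitted on purpose.
AGE_MULTIPLIER = {18: 1.29}

def age_adjusted_pair(y1_ppg: float, y1_age: int, y2_ppg: float):
    """Return (age-adjusted year-1 P/GP, year-2 P/GP) for one Y1 -> Y2 transition."""
    return y1_ppg * AGE_MULTIPLIER[y1_age], y2_ppg

# OHL at age 18 (1.0 P/GP), then NHL at age 19 (0.5 P/GP).
ohl_rate, nhl_rate = age_adjusted_pair(y1_ppg=1.0, y1_age=18, y2_ppg=0.5)
print(ohl_rate, nhl_rate)  # prints: 1.29 0.5
```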
Temporal model
We have thoroughly discussed the need for a model that produces different coefficients per season. We have touched on the KHL specifically, but there has also been talk that the OHL and other major junior leagues have declined. This was a focal point used to push back against folks who noted that Michael Misa's draft year raw OHL point total had not been matched since the likes of Mitch Marner, Sam Gagner, or Patrick Kane.
We chose to simply fit an unweighted, 4-year rolling window model. The model is fit over 4 year samples, with no particular weighting on different seasons within the sample. For example, the 2025-2026 season is fit on the 2025-2026, 2024-2025, 2023-2024, and 2022-2023 seasons.
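A small sketch of how the seasons in one window can be enumerated. The function name and the convention of identifying a season by its starting year are assumptions made for the example.

```python
def window_seasons(target_start_year: int, window: int = 4):
    """List the seasons ('YYYY-YYYY') used to fit the model for a target season."""
    years = range(target_start_year, target_start_year - window, -1)
    return [f"{year}-{year + 1}" for year in years]

seasons_2025_26 = window_seasons(2025)
print(seasons_2025_26)
# prints: ['2025-2026', '2024-2025', '2023-2024', '2022-2023']
```

Because the window is unweighted, every transition from these four seasons counts equally when the coefficients for the target season are fit.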
Excluding 2013
Not much else to this. 2013 was excluded from the computation of our NHLe model since it was the lockout season where values would be very skewed. This was also done in our original model.
Building the prospect projection model
Defining an NHLer, star, and bust
For defining an NHLer, we removed the "positive WAR" threshold. This is simply because currently, I have the cop out in arguments where I can say "Actually this player didn't ‘disprove’ my model, he has negative WAR!" and I just don't like this being something we need to bring up in an argument.
We decided to re-compute the Pareto Principle for WAR, segmented by forwards and defensemen. We found that 20% of forwards contribute to 80% of forward WAR, while 15% of defensemen contribute to 85% of defense WAR, and thus set the star cutoff to the top 15% of defensemen and top 20% of forwards by career WAR/82GP. This works out to a cutoff of 1.8 WAR/82GP for a star forward and 1.23 WAR/82GP for a star defenseman. The previous baseline was unified around 1.5 WAR/82GP, so the updated model, holding all else equal, will be more bullish on defensemen and more bearish on forwards than the previous edition. This was mainly done for presentation purposes, and also partly because I do acknowledge that my WAR model may be a bit too kind to forwards and too harsh on defensemen, but it should be noted that a star forward is still more impactful than a star defenseman by the objective measures we have.
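The position-specific cutoffs can be expressed as a simple check. The two thresholds come from the text; the function names, the position codes, and the choice to treat a player sitting exactly on the cutoff as a star are assumptions.

```python
# Star cutoffs in career WAR per 82 games, by position.
STAR_CUTOFF_WAR_PER_82 = {"F": 1.8, "D": 1.23}

def war_per_82(career_war: float, games_played: int) -> float:
    """Scale career WAR to an 82-game rate."""
    return career_war / games_played * 82

def is_star(position: str, career_war: float, games_played: int) -> bool:
    """True when the player's WAR/82GP meets the cutoff for his position."""
    return war_per_82(career_war, games_played) >= STAR_CUTOFF_WAR_PER_82[position]

# A player at roughly the old unified baseline (about 1.5 WAR/82GP):
as_defenseman = is_star("D", career_war=6.0, games_played=328)
as_forward = is_star("F", career_war=6.0, games_played=328)
```

A player at about 1.5 WAR/82GP now clears the bar as a defenseman but not as a forward, which is the shift in bullishness described above.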
For labeling any player, we have a decision tree. First and foremost, we can only label players (bust, star, or NHLer) if they were drafted in 2021 or before. (This helps prevent class imbalance that would otherwise arise if we fed e.g. star Macklin Celebrini to the model but not any busts from 2024).
For defining a bust, we have a decision tree. If a player has not played 200 NHL games yet, they are a bust, unless they have played at least 50 NHL games so far, played at least 10 games in the 2025-2026 season, and their draft year is greater than or equal to 2016.
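The labeling rules above translate into a short function. This is a sketch of the rules as stated; the function and argument names are illustrative, and how players who meet the exemption are treated downstream is not covered here.

```python
LAST_LABELABLE_DRAFT_YEAR = 2021

def is_labelable(draft_year: int) -> bool:
    """Only players drafted in 2021 or before can receive a label."""
    return draft_year <= LAST_LABELABLE_DRAFT_YEAR

def is_bust(nhl_games: int, games_2025_26: int, draft_year: int) -> bool:
    """Apply the bust rule: under 200 NHL games, unless still trending toward it."""
    if nhl_games >= 200:
        return False
    still_in_progress = (
        nhl_games >= 50 and games_2025_26 >= 10 and draft_year >= 2016
    )
    return not still_in_progress

# Illustrative inputs, not real players.
examples = [
    is_bust(nhl_games=250, games_2025_26=0, draft_year=2010),   # False: reached 200
    is_bust(nhl_games=120, games_2025_26=40, draft_year=2018),  # False: exempt
    is_bust(nhl_games=120, games_2025_26=40, draft_year=2014),  # True: drafted too early
    is_bust(nhl_games=120, games_2025_26=5, draft_year=2018),   # True: not active enough
]
```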
Including CSS rank in the model
We include CSS rank in the model as a variable. We use the natural logarithm of rank to emphasize that the gap between e.g. 1 and 2 is much bigger than the gap between e.g. 100 and 101, and we include "EU" as a separate binary variable encoding whether the player was ranked by CSS among NA skaters or EU skaters.
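A minimal sketch of that encoding, using only the standard library. The function and key names are illustrative.

```python
import math

def css_features(css_rank: int, ranked_among_eu: bool) -> dict:
    """Encode a CSS ranking as log(rank) plus a binary EU-list indicator."""
    return {"log_rank": math.log(css_rank), "eu": 1 if ranked_among_eu else 0}

# The log transform makes the 1 -> 2 gap far larger than the 100 -> 101 gap.
gap_top = css_features(2, False)["log_rank"] - css_features(1, False)["log_rank"]
gap_deep = css_features(101, False)["log_rank"] - css_features(100, False)["log_rank"]
```

Here `gap_top` is about 0.69 while `gap_deep` is about 0.01, so moving from 1st to 2nd shifts the feature roughly 70 times as much as moving from 100th to 101st.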
I would have much preferred to incorporate e.g. Bob McKenzie's rankings or any other source which does not have the awkward NA/EU split, but unfortunately CSS is the only source for which we reliably have data going back as far as 2008.
General model design
The model is a logistic regression model. This suits our case, where each of the variables included (height, age, weight, NHLe, rank) is expected to move the probability in one consistent direction as it changes.
Evaluation metrics (Draft-Year Model)
To evaluate the model, we use log loss and AUC, the same evaluation metrics we used before, and we conduct the evaluation out of sample (training the model on every season outside of the one it is evaluated on). This time, though, we chose to only evaluate the model against players who were ranked by CSS. This decision is made so that we do not "juice" the values by saying "we basically predicted a bunch of busts, and that is what we saw, so the model must be right."
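For readers who want to see the mechanics, here is a dependency-free sketch of the two metrics and of a leave-one-season-out split. These are textbook definitions of log loss and AUC rather than our evaluation code, and the function names are illustrative.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def auc(y_true, y_prob):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def leave_one_season_out(seasons):
    """Yield (held_out_season, train_indices, test_indices) for each season."""
    for held_out in sorted(set(seasons)):
        train = [i for i, s in enumerate(seasons) if s != held_out]
        test = [i for i, s in enumerate(seasons) if s == held_out]
        yield held_out, train, test

splits = list(leave_one_season_out([2019, 2019, 2020]))
```

Each held-out season is scored by a model that never saw it, and the per-season predictions can then be pooled before computing log loss and AUC.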
We then tested across 3 separate models, all of which include height, weight, and age. The first uses only CSS rank (and a binary variable to encode whether the rank is across EU or NA skaters) in addition to size/age. The second uses only NHLe in addition to size/age. And the third uses NHLe AND rank. The results are below:
| Target | Position | Metric | NHLe + Rank | NHLe only | Rank only |
|---|---|---|---|---|---|
| NHLer | Forward | AUC | 0.8599 | 0.8292 | 0.8493 |
| NHLer | Forward | Log loss | 0.3124 | 0.3558 | 0.3289 |
| Star | Forward | AUC | 0.9128 | 0.8882 | 0.8922 |
| Star | Forward | Log loss | 0.1199 | 0.1272 | 0.1260 |
| NHLer | Defense | AUC | 0.8578 | 0.8033 | 0.8481 |
| NHLer | Defense | Log loss | 0.2741 | 0.3337 | 0.2780 |
| Star | Defense | AUC | 0.9046 | 0.8700 | 0.8846 |
| Star | Defense | Log loss | 0.0852 | 0.0980 | 0.0868 |
Across every evaluation metric, NHLe on its own is beaten by draft rank on its own. There is some selection bias at play here (players who are ranked higher are considered better and therefore will get more opportunities), so this may be slightly unfair to the base NHLe model, but it is still noteworthy, and it supports the argument that NHLe should not be used without context.
More importantly, though, across every evaluation metric, the combination of NHLe + CSS rank is the top performer. This tells us where pure NHLe models are at: good enough to give a strong signal, but still beaten by the signal provided by CSS scouting as a whole. However, this particular NHLe model, when used in conjunction with CSS scouting rankings, is better than those rankings alone for predicting NHL players and stars.
Conclusion
Humans alone still beat math alone, as it turns out. But we've learned from our previous mistakes and greatly improved our math, enough so that our "centaur" model, informed by both human evaluation AND math, beats math or human evaluation on their own.