RAPM: ISOLATING PLAYER IMPACT

How we separate individual contribution from team effects using Regularized Adjusted Plus-Minus

By Patrick Bacon (TopDownHockey)
Adapted from work originally published in 2020 & 2021

As I discussed in my Wins Above Replacement (WAR) write-up, our WAR model evaluates players across six components. I use RAPM regression to isolate individual impact for four of those six:

Even Strength Offense
Even Strength Defense
Power Play Offense
Penalty Kill Defense
Penalties (no regression)
Finishing (no regression)

The regression isolates a player's impact by accounting for the external factors that surround them. These factors differ depending on the component I am evaluating.

1. Even Strength Factors

For even strength offense and defense, I account for the following components:

All teammates and opponents

Every skater on the ice during each shift is included in the model.

Power Play Expiry Shifts

Whether a shift started on-the-fly as the result of an expired power play. This is by far the most important piece of external context that can shape a player's result for a given shift. Ignoring this context is extremely unfair to penalty killers like Esa Lindell who start a large percentage of their shifts as the result of expiring enemy power plays, where "power play influence" is still present.

Zone Starts

Whether a shift started with a faceoff in the offensive, defensive, or neutral zone, or started on-the-fly.

Score of the game

Current score differential affects how teams play and their results.

Home ice advantage

Which team is at home.

Number of skaters

The number of skaters on the ice for each team.

Back-to-backs

Which teams are playing the second halves of back-to-backs.

2. Power Play & Penalty Kill Factors

For power play offense and penalty kill defense, I account for the exact same components as even strength, with one exception: instead of a variable denoting whether a shift started as the result of an expired power play, I use a variable denoting whether the power play began on-the-fly as the result of an expired penalty where the previous game state was even strength.
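To make those contextual factors concrete, here is a rough sketch of how they might be laid out as dummy columns. The column names and score/skater-count buckets are illustrative placeholders, not the ones in my actual code; I'm using Python here purely for readability.

# Illustrative column names for the contextual dummy variables; the actual
# names and buckets in my dataframe differ.
EV_CONTEXT_COLUMNS = [
    "off_zone_start",        # shift began with an offensive zone faceoff
    "def_zone_start",        # shift began with a defensive zone faceoff
    "neu_zone_start",        # shift began with a neutral zone faceoff
    "otf_start",             # shift began on-the-fly
    "pp_expiry_start",       # shift began on-the-fly off an expired power play
    "score_down_3_plus", "score_down_2", "score_down_1",    # score state
    "score_tied", "score_up_1", "score_up_2", "score_up_3_plus",
    "is_home",               # home-ice advantage
    "skaters_5v5", "skaters_4v4", "skaters_3v3",             # skater counts
    "back_to_back_for",      # offensive team on the second half of a back-to-back
    "back_to_back_against",  # defensive team on the second half of a back-to-back
]

# Power play offense / penalty kill defense: the same idea, but the
# power-play-expiry flag is swapped for a flag marking power plays that began
# on-the-fly after a penalty expired while the game was at even strength.
PP_CONTEXT_COLUMNS = [c for c in EV_CONTEXT_COLUMNS if c != "pp_expiry_start"]
PP_CONTEXT_COLUMNS.append("penalty_expiry_start")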

Building the RAPM Dataframe

It's very easy for me to say "I account for all of these factors." It would also be easy for me to say "Once you account for camera angles, I'm better looking than Brad Pitt." Both of these would rightfully be met with a heavy degree of skepticism from those who hear them, which is why I've decided to be as transparent as possible about my process of isolating skater impact.

I begin the process by building a dataframe which contains a row for each shift from each team's perspective, and one column for every skater and contextual factor, coded as a dummy variable with a value of one or zero. I define "shifts" as all instances of play where the skaters on the ice, goaltenders, score, and period do not change. When any one of these variables changes, a new shift begins.

Here's a partial view of what this dataframe for constructing even strength RAPM looks like:

Partial view of the RAPM dataframe — roughly 600,000 rows and 1,800 variables per season

This is a massive dataframe that holds roughly 600,000 rows and 1,800 variables, depending on the season. In this example, not a single one of the visible dummy variables is present.
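To show roughly how a frame like this gets assembled, here is a simplified Python/pandas sketch. It is not my actual code: the input shift table, its column names, and the handful of context flags shown are hypothetical stand-ins, and the real version carries many more columns.

import pandas as pd

def build_rapm_frame(shifts: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the design matrix: a row per shift (per perspective), with 0/1
    dummies for every skater and contextual factor. Assumes `shifts` has
    hypothetical columns like off_skaters / def_skaters (lists of names),
    length_seconds, xgf, zone_start, is_home, and pp_expiry."""
    rows = []
    for shift in shifts.itertuples():
        row = {
            "shift_length": shift.length_seconds,
            # Target variable: expected goals for, scaled to a per-60-minute rate.
            "xGF_60": shift.xgf * 3600 / shift.length_seconds,
            "is_home": int(shift.is_home),
            "off_zone_start": int(shift.zone_start == "off"),
            "def_zone_start": int(shift.zone_start == "def"),
            "neu_zone_start": int(shift.zone_start == "neu"),
            "pp_expiry_start": int(shift.pp_expiry),
        }
        # One dummy column per skater, split by offensive and defensive role.
        for player in shift.off_skaters:
            row[f"off_{player}"] = 1
        for player in shift.def_skaters:
            row[f"def_{player}"] = 1
        rows.append(row)
    # Skaters who were not on the ice for a given shift get 0 in their column.
    return pd.DataFrame(rows).fillna(0)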

A Single Shift Example

To give a closer example, let's look at just one shift that occurred in the third period of the second game of the 2018–2019 season. Here's what you need to know about this shift:

  • The Washington Capitals played at home against the Boston Bruins.
  • The Washington Capitals held a lead of three or more goals.
  • This shift lasted 24 seconds.
  • This shift began with a faceoff in Boston's offensive zone.
  • The Bruins took an unblocked shot with a 10.6% probability of scoring.
  • The Capitals took an unblocked shot with a 2.08% probability of scoring.
  • Boston's five skaters: Brandon Carlo, Joakim Nordstrom, Chris Wagner, Noel Acciari, and Zdeno Chara.
  • Washington's skaters: Andre Burakovsky, Brooks Orpik, Chandler Stephenson, Lars Eller, and Madison Bowey.

If we built a dataframe for RAPM using only this shift, here is what it would look like:

RAPM dataframe for a single shift — two rows (one for each team's perspective)

As you can see, there are two rows for one shift. The top row is from the perspective of the home team and the bottom row is from the perspective of the away team. If we move across to the end of the dataframe, we can see the skaters marked with ones and zeroes for defense:

Defensive dummy variables — skaters marked with 1s and 0s
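Spelled out in the same illustrative column-naming scheme as the sketch above (not the exact columns from my dataframe), the two rows for this shift boil down to something like this:

# Home (Washington) perspective: Washington on offense, Boston on defense.
# Washington's unblocked shot carried 0.0208 expected goals over a 24-second
# shift, so the target is 0.0208 * 3600 / 24 = 3.12 xGF/60.
home_row = {
    "xGF_60": 3.12, "shift_length": 24, "is_home": 1,
    "def_zone_start": 1,       # the faceoff was in Boston's offensive zone
    "score_up_3_plus": 1,      # Washington led by three or more
    "off_Andre.Burakovsky": 1, "off_Brooks.Orpik": 1, "off_Chandler.Stephenson": 1,
    "off_Lars.Eller": 1, "off_Madison.Bowey": 1,
    "def_Brandon.Carlo": 1, "def_Joakim.Nordstrom": 1, "def_Chris.Wagner": 1,
    "def_Noel.Acciari": 1, "def_Zdeno.Chara": 1,
}

# Away (Boston) perspective: Boston on offense, Washington on defense.
# Boston's unblocked shot carried 0.106 expected goals: 0.106 * 150 = 15.9 xGF/60.
away_row = {
    "xGF_60": 15.9, "shift_length": 24, "is_home": 0,
    "off_zone_start": 1,
    "score_down_3_plus": 1,
    "off_Brandon.Carlo": 1, "off_Joakim.Nordstrom": 1, "off_Chris.Wagner": 1,
    "off_Noel.Acciari": 1, "off_Zdeno.Chara": 1,
    "def_Andre.Burakovsky": 1, "def_Brooks.Orpik": 1, "def_Chandler.Stephenson": 1,
    "def_Lars.Eller": 1, "def_Madison.Bowey": 1,
}
# Every other skater and context column in the full dataframe is simply 0.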

Why Regularization is Essential

The regression is run using every shift from the entire season. xGF/60 serves as the target variable and every other variable in the dataframe serves as a predictor. This is not a typical linear regression, however; rather, it is a weighted ridge regression, where the length of each shift is used as a weight and the "ridge" serves to shrink coefficients towards zero.
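Here is a rough sketch of that fit, using scikit-learn as a stand-in for my actual code and the hypothetical dataframe from the earlier sketch:

from sklearn.linear_model import Ridge

rapm_frame = build_rapm_frame(all_season_shifts)  # all_season_shifts is hypothetical

# Target is xGF/60; every other column (skater and context dummies) is a predictor.
y = rapm_frame["xGF_60"]
weights = rapm_frame["shift_length"]
X = rapm_frame.drop(columns=["xGF_60", "shift_length"])

# Weighted ridge regression: each shift is weighted by its length, and the L2
# penalty shrinks every coefficient towards zero.
penalty = 1000.0  # placeholder; the real value comes from cross-validation below
ridge = Ridge(alpha=penalty)
ridge.fit(X, y, sample_weight=weights)

# Each skater's coefficient is their estimated isolated impact on xGF/60.
rapm_estimates = dict(zip(X.columns, ridge.coef_))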

Why do we need this ridge, which shrinks coefficients towards zero? Because without it, we get wildly inaccurate values. Here is a snippet of what RAPM results look like for the 2018–2019 season if I were to run a standard weighted linear regression without the ridge:

A complete mess — APM without regularization produces wildly inaccurate values

This is pure adjusted plus-minus without the regularization, which is what NBA analysts used for some time before they discovered the magic of regularization. As you can see, the results are a bit of a mess: the top players are all nobodies who probably had strong on-ice numbers in very limited minutes, and the impacts are all far too high; not even Wayne Gretzky in his prime would improve his team's hourly expected goal differential by over three. I do not believe anything on planet earth is more effective than the above chart at displaying the need for regularization.

L2 (Ridge) Regularization

With regularization, we add a "penalty" to the regression which shrinks coefficients towards zero, reducing the mean squared error of our estimates. The method of regularization I've chosen is L2 (Tikhonov) Regularization, where every coefficient is shrunk towards zero but never to exactly zero, since I want a coefficient estimate for each player. L1 (Lasso) Regularization would not work here, since it would drop some variables entirely.
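Written in the same plain style as the trend formulas later in this piece, the weighted ridge regression chooses the set of coefficients that minimizes:

Sum over all shifts of [ (shift length) × (observed xGF/60 − predicted xGF/60)² ] + lambda × (sum of every squared coefficient)

With lambda equal to zero, this is exactly the unregularized adjusted plus-minus shown above; the larger lambda gets, the harder every coefficient is pulled towards zero.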

The penalty that I use comes in the form of a lambda value, which I obtain through cross validation. This plot shows us where the optimal lambda value is:

Cross-validation plot for selecting the optimal lambda (regularization strength)
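Here is a sketch of that selection step, again using scikit-learn as a stand-in for my actual cross-validation code and reusing X, y, and weights from the earlier sketch:

import numpy as np
from sklearn.linear_model import RidgeCV

# Try a grid of candidate penalty strengths and keep whichever one produces the
# lowest cross-validated error, still weighting each shift by its length.
# (RidgeCV's default is an efficient leave-one-out scheme; a k-fold scheme
# works the same way in principle.)
candidate_lambdas = np.logspace(-3, 4, 50)
cv_model = RidgeCV(alphas=candidate_lambdas)
cv_model.fit(X, y, sample_weight=weights)

chosen_lambda = cv_model.alpha_  # this replaces the placeholder penalty above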

Once cross validation is run and the optimal lambda value is obtained, we run the regression once again, using this lambda value as the penalty on every coefficient. The results, run on the exact same data set, look far more intuitive:

Regularized RAPM results — reasonable values that pass the eye test

Prior-Informed RAPM

However, I still felt that the outputs were generally a tad too low, and that there was too much variation in year-to-year player impact. I did some research on NBA RAPM and found that this was not an uncommon sentiment; many basketball analysts also felt that one year of RAPM was somewhat unreliable, which is why they generally incorporate prior knowledge into their regressions.

I actually made an account on an NBA analytics forum and spoke with Daniel Myers, the creator of the NBA's Box Plus-Minus, who helped guide me through the process of creating prior-informed RAPM.

How Prior-Informed RAPM Works

First, prior information for each shift is subtracted from the target variable. For example, if Connor McDavid is on offense and my prior knowledge tells us that his xGF/60 impact should be 0.5, I subtract 0.5 from the observed xGF/60 for that shift. Then I run the full regression as before, first obtaining a lambda value through cross-validation and then running the actual regression. Once this is complete, I add his prior value back to his estimated coefficient from the regression.
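Here is a rough sketch of that subtract-then-add-back step. The helper run_weighted_ridge is hypothetical (think of it as wrapping the cross-validated, weighted ridge fit sketched earlier), and priors maps each skater dummy column to that player's prior impact on xGF/60:

def prior_informed_rapm(rapm_frame, priors):
    """Sketch only: subtract priors from the target, regress, add them back."""
    frame = rapm_frame.copy()

    # 1. For every shift, subtract each on-ice player's prior from the target.
    #    (The dummy column is 1 only when that player is on the ice in that role.)
    for column, prior_value in priors.items():
        frame["xGF_60"] -= frame[column] * prior_value

    # 2. Run the usual cross-validated, weighted ridge regression on the
    #    adjusted target.
    coefficients = run_weighted_ridge(frame)

    # 3. Add each player's prior back onto their estimated coefficient.
    return {col: coef + priors.get(col, 0.0) for col, coef in coefficients.items()}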

Calculating the Prior

I initially tested using a player's full estimated impact from the prior season as a prior and found that on a year-to-year basis, player impact estimates went from being too malleable to not being malleable enough. It was practically impossible for players who had a strong impact in year one to have a poor impact in year three or vice versa.

I tested some things out and ultimately found a compromise between incorporating no prior information and too much prior information by calculating the linear trend between vanilla RAPM in year one and year two, and then using that trend to calculate a "predicted RAPM" which was used as a player's proper prior.

Linear Trend Examples

I calculated linear trends for both offense and defense and for both forwards and defensemen, as these components are not all equally repeatable:

Forward Offense

0.008151 + (Prior) × 0.446297

Forward Defense

-0.003181 + (Prior) × 0.280373

In other words, forward defense is less repeatable than offense. So if a forward's RAPM xGF/60 and xGA/60 in year one were both 0.5, then our hypothetical high-event forward's priors for year two would be 0.231 xGF/60 and 0.137 xGA/60.
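In code form, those two trends and the worked example above are simply:

# Direct transcription of the forward trends above.
def forward_offense_prior(previous_rapm_xgf60):
    return 0.008151 + previous_rapm_xgf60 * 0.446297

def forward_defense_prior(previous_rapm_xga60):
    return -0.003181 + previous_rapm_xga60 * 0.280373

forward_offense_prior(0.5)   # ~0.231 xGF/60
forward_defense_prior(0.5)   # ~0.137 xGA/60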

The Daisy Chain Method

I calculated prior-informed RAPM using a method known as a "daisy chain" where I chained a player's impact in each season to their impact in the next season. For example, if a hypothetical forward had an xGF/60 of 0.5 in 2013–2014, I would use that to calculate their prior for 2014–2015. Then, I would use the prior-informed RAPM from 2014–2015 to calculate the prior for 2015–2016, and so on and so forth through 2019–2020. I began the process in 2007–2008.
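As a sketch of the chaining itself, with load_shifts and trend_prior as hypothetical helpers (the latter standing in for the position- and component-specific trends above) and the other functions from the earlier sketches:

results = {}
previous_rapm = None

for season in range(2007, 2020):             # 2007-08 through 2019-20
    frame = build_rapm_frame(load_shifts(season))
    if previous_rapm is None:
        # First link in the chain: no prior information, so plain (vanilla) RAPM.
        results[season] = run_weighted_ridge(frame)
    else:
        # Each later season's priors come from the previous season's output.
        priors = {player: trend_prior(player, value)
                  for player, value in previous_rapm.items()}
        results[season] = prior_informed_rapm(frame, priors)
    previous_rapm = results[season]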

Source Code & Transparency

I have made the source code for calculating vanilla even strength RAPM for the 2018–2019 season available on my GitHub.

I wish to be as transparent and honest about this process as possible, so if there is anything that you do not understand or any details that you are uneasy about, please do not hesitate to reach out to me on Twitter. If anything here is a black box, then that is a failure on my behalf, as none of it should be.