I just returned home from a week-long summer program called the Wharton Moneyball Academy at the University of Pennsylvania. The course focused on statistics as applied to sports, with baseball being the main topic. For my final project, I decided to apply a well-known baseball formula to hockey to see how it would work, and I figured I would share my findings with you. I used RStudio to collect and organize my data and then I graphed it in LoggerPro. Read below for more information.
Bill James is an extremely well-known baseball writer and statistician. He has written dozens of books about baseball statistics, and one of them introduced a formula called the Pythagorean expected win formula. The formula is:
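In its classic form, which uses an exponent of 2 (an assumption here, since later variants tune that exponent), the formula can be written as:

```latex
\text{Win\%} = \frac{(\text{Runs Scored})^2}{(\text{Runs Scored})^2 + (\text{Runs Allowed})^2}
```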
Essentially, this means that if you plug a team’s runs scored and runs allowed into this formula, you get the percentage of games the team should have won, and that percentage should match up closely with the actual standings. I decided to adapt this to hockey by exchanging runs scored for goals scored and runs allowed for goals allowed. I applied this formula to all 30 teams for 5 different seasons, starting with 2010-2011. For time’s sake (and so I don’t bore you all to death), I will only show you some of my graphs and findings.
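For anyone who wants to try the hockey version themselves, here is a minimal sketch in Python (my own choice for illustration; the project itself was done in RStudio). The function name pythagorean_pct and the default exponent of 2 are assumptions, not something taken from the original analysis.

```python
def pythagorean_pct(goals_for, goals_against, exponent=2.0):
    """Expected share of games won, based on goals scored and goals allowed.

    The exponent of 2 is the classic Bill James value; it is assumed here
    and may not be the best fit for hockey.
    """
    if goals_for < 0 or goals_against < 0:
        raise ValueError("goal totals cannot be negative")
    scored = goals_for ** exponent
    allowed = goals_against ** exponent
    if scored + allowed == 0:
        raise ValueError("need at least one goal scored or allowed")
    return scored / (scored + allowed)


# A team that scores exactly as many goals as it allows is expected to win half its games.
print(pythagorean_pct(200, 200))  # 0.5

# A hypothetical team that outscores its opponents is expected to win more than half.
print(pythagorean_pct(250, 200) > 0.5)  # True
```

The goal totals in the usage lines are made up purely to show how the function behaves.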
Correlation For 2014-2015
As you can see, the correlation between the Bill James percentage and the actual win percentage is very strong at a whopping 99%, and the findings from every other season were extremely similar. This formula works, people. Oh, and just for fun, if you look at the point in the very bottom left corner, that would be the Buffalo Sabres. Also, the point to the right of those three points that seem to be on top of one another is the Panthers.
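If you want to check a correlation like this yourself, the sketch below computes the Pearson correlation coefficient r between two lists of numbers in plain Python. The function name pearson_r is hypothetical, and the figure quoted above may be r or r squared, so treat this as one reasonable way to measure it rather than the exact calculation behind the graph.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy numbers only (not real team data): ys is exactly 2 * xs, so r is 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

With real data, xs would hold each team's Bill James percentage and ys its actual win percentage for the same season.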
Next, I decided to recalculate what the standings would be if they were based on the Bill James percentage by ordering the results from highest to lowest. I plotted that against the actual standings from the end of the season to see what I would find.
So once again, the correlation is pretty strong at about 91%. The one distinct outlier is Anaheim, a team that ended the season ranked #3, yet its ranking according to Bill James would be #17. This may be because most of the top teams scored far more goals than they allowed, while the Anaheim Ducks scored 236 goals and allowed 226, a difference of only 10 goals. On the other side of things, Calgary was actually ranked #16 in the league, but Bill James would have the team at spot #8. This is because the team scored 241 goals and only let in 216. The Panthers are the team right on the line where x=20.
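To see why the formula flips these two teams, here is a small Python sketch that ranks teams by their Bill James percentage using the goal totals quoted above. The function name rank_by_expected_pct and the exponent of 2 are assumptions for illustration, and with only two teams it shows the relative order, not the league-wide spots.

```python
def rank_by_expected_pct(goal_totals):
    """Order teams from highest to lowest expected win percentage.

    goal_totals maps a team name to a (goals_for, goals_against) pair.
    Uses the classic exponent of 2, which is an assumption here.
    """
    expected = {
        team: gf ** 2 / (gf ** 2 + ga ** 2)
        for team, (gf, ga) in goal_totals.items()
    }
    return sorted(expected.items(), key=lambda item: item[1], reverse=True)


# 2014-2015 goal totals as quoted in this post.
totals = {"Anaheim": (236, 226), "Calgary": (241, 216)}

for place, (team, pct) in enumerate(rank_by_expected_pct(totals), start=1):
    print(place, team, round(pct, 3))
# 1 Calgary 0.555
# 2 Anaheim 0.522
```

Calgary's bigger gap between goals scored and goals allowed is what pushes it ahead of Anaheim here, even though Anaheim finished higher in the real standings.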
From Year to Year
To take this test one step further, I plotted the Bill James Percentage from the 2013-2014 season against the one from the 2014-2015 season.
There is still a 56% correlation between the two, but it is not nearly as strong as it was for my last two graphs. This is because, with the exception of some teams that remain particularly good (e.g., the St. Louis Blues) or particularly bad (e.g., the Buffalo Sabres), most teams will move around a bit in the standings from year to year as they improve or worsen. New players come and old players go, shaping a team into a relatively different one from the year before. Probably the best case study is your favorite team and mine, the Florida Panthers. With a 25-point increase from last season to this one, the Panthers are a clear outlier on the graph. You can find them to the left of the graph around the middle of the y-axis. The Coyotes are the team at the bottom middle of the graph, because they did a lot worse this year than they did last year.
I did the same for actual win percentage and found that the results were similar:
In fact, the correlation is actually weaker on this graph. Once again, the Panthers are the main outlier, and you can find them in roughly the same spot as on the previous graph. The number of games a team wins in one season is usually not a strong predictor of the number of games it wins in the next.
What Does This Mean For The Panthers?
Well, the good news is that a team’s performance one year does not strongly correlate with its performance the next year. The Panthers are clearly on the rise, while other teams are sinking. Yet, it is true that the ratio between goals scored and goals allowed is extremely important. For a team to be successful, it needs to have both good defense and good offense. Teams like the Toronto Maple Leafs or the Philadelphia Flyers had relatively decent offense but very poor defense. The Panthers are a well-rounded team that leans more toward defense than offense, but if they can get that fixed next season (which I think is a good possibility), the team will be a real force to be reckoned with. Playoffs, here we come!