Advanced stats/ Moneypuck

Welcome to the main forum of our site. Anything and everything to do with the Vancouver Canucks is discussed and debated here.

Moderator: Referees

ESQ
MVP
Posts: 4477
Joined: Wed Jul 13, 2011 6:34 pm

Re: Advanced stats/ Moneypuck

Post by ESQ »

Topper wrote:First is data verification. ...
Second, look at correlation coefficients to identify data linkages. ...

Another problem In the case of the chances for/against model, the obvious problem was the home plate area for defining chances.
Very interesting, Topper. You obviously have a far higher-level understanding of stats than I do.

It seems to me from reading your post that your main problem with Charron's system is the first point: data verification. Without that, you can't calculate the standard deviation, variability, etc., and so you have no way to judge how effective or reliable the numbers are. Is that correct? Same with not knowing the criteria for tracking: if all shots are treated equally, then the usefulness of the stat is diminished.

I have to admit I don't understand what a correlation coefficient is :oops:

But do you agree that these stats, even without the underlying raw data, are still more useful than "traditional" stats in assessing a player's value? And is there a publicly available "advanced stat" that meets the criteria for reliability?
Topper
CC Legend
Posts: 18097
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Post by Topper »

dbr, there are several data validation issues with Cam Charron's work.

Obviously his arbitrary home plate area causes its own biases. A point shot is not a scoring chance, neither is a shot banked off a defender or goalie from behind the net.

If Raymond had bobbled Henrik's pass on his goal last night instead of getting a shot off and scoring, would that have been a scoring chance?

Next is the wild variation in the chances for and chances against on a game-to-game basis. I noted at the time they were originally posted that one game was 3 chances for the Sedins, the next game was 11 chances for. This is where a simple t-test is useful to look at the variability of the data.

It isn't much different than many of us arbitrarily waiting until the 20 game mark (roughly the 1/4 pole of the season) before posting assessments of player performances. You need a meaningful sample size to iron out some of the variability. Stats does offer up some simple ways to quantify that variability and tell us if we have a sufficient sample size to iron out that variability.

If the data is properly validated, any biases are noted, and the results are taken in the context of those biases, some of these methods can be very useful.
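A rough sketch of what "quantifying that variability" could look like. This is not Charron's method, and it uses a plain standard deviation and standard error rather than a formal t-test. Only the 3 and the 11 come from the numbers quoted above; the other game totals and the `games_needed` helper are made up for illustration.

```python
import math
import statistics

# Chances-for per game. Only the 3 and the 11 come from the thread;
# the other eight values are invented purely for illustration.
chances_for = [3, 11, 6, 8, 5, 9, 4, 7, 10, 7]

n = len(chances_for)
mean = statistics.mean(chances_for)
sd = statistics.stdev(chances_for)   # sample standard deviation
sem = sd / math.sqrt(n)              # standard error of the mean

# Rough 95% interval using the normal approximation (about 2 standard errors).
low, high = mean - 2 * sem, mean + 2 * sem


def games_needed(sd, margin):
    """Games needed for a rough 95% margin of error of `margin` chances per game."""
    return math.ceil((2 * sd / margin) ** 2)


print(f"mean={mean:.1f} sd={sd:.2f} rough 95% interval=({low:.1f}, {high:.1f})")
print("games needed for +/-1.0 chances per game:", games_needed(sd, 1.0))
print("games needed for +/-0.5 chances per game:", games_needed(sd, 0.5))
```

The point of the last two lines is that the required sample grows with the square of the precision you want, which is why a handful of games tells you very little.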
Over the Internet, you can pretend to be anyone or anything.

I'm amazed that so many people choose to be complete twats.
Chef Boi RD
MVP
Posts: 28881
Joined: Mon Aug 09, 2004 6:36 pm
Location: Vancouver

Re: Advanced stats/ Moneypuck

Post by Chef Boi RD »

“A single death is a tragedy; a million deaths is a statistic.” - Stalin
“Tyler Myers is my guy... I was talking to Scotty Bowman last night and he was bringing up his name, and saying he’s a big guy and big guys need big minutes to play, he is playing great for ya… and I agree with him… He’s been exceptional” - Bruce Boudreau
Larry Goodenough
CC 1st Team All-Star
Posts: 728
Joined: Wed Oct 19, 2005 10:43 am

Re: Advanced stats/ Moneypuck

Post by Larry Goodenough »

Topper wrote:dbr, there are several data validation issues with Cam Charron's work.

Obviously his arbitrary home plate area causes its own biases. A point shot is not a scoring chance, neither is a shot banked off a defender or goalie from behind the net.

If Raymond had bobbled Henrik's pass on his goal last night instead of getting a shot off and scoring, would that have been a scoring chance?

Next is the wild variation in the chances for and chances against on a game-to-game basis. I noted at the time they were originally posted that one game was 3 chances for the Sedins, the next game was 11 chances for. This is where a simple t-test is useful to look at the variability of the data.

It isn't much different than many of us arbitrarily waiting until the 20 game mark (roughly the 1/4 pole of the season) before posting assessments of player performances. You need a meaningful sample size to iron out some of the variability. Stats does offer up some simple ways to quantify that variability and tell us if we have a sufficient sample size to iron out that variability.

If the data is properly validated, any biases are noted, and the results are taken in the context of those biases, some of these methods can be very useful.
Well, I'm still trying to figure out a lot of the issues you suggest are out there.

It's not just Drance/Charron but the whole group of bloggers who form the "nation" network, who use similar criteria to track scoring chances. I think it was mentioned that upwards of 13 teams are being tracked using the same basic system (home plate). However, Drance/Charron state it's not just shots from within home plate that count as chances: they will also count a screened shot from the point that has the potential to go in, or a shot from outside home plate that involved significant movement of the puck before the shot. They're pretty open about the subjectivity of it. However, I did see one chance blog when they played the Jets. Charron and the Jets blogger both tracked chances and the difference in their counts was only 1 or 2.

As for the other data used, they take it from behindthenet or arctichockey. I believe that info is taken from the NHL.com game tracking. The biases of home team scorers are often recognized, but are often thought to come out in the wash over a larger sample.
Topper
CC Legend
Posts: 18097
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Post by Topper »

Larry Goodenough wrote: They're pretty open about the subjectivity of it.
That subjectivity is a huge problem. There needs to be strict guidelines on data collection. There is already too much subjectivity in the league collected stats. Not too bad when it comes to shots, but when it comes to stats like hits, it is widely accepted that some rinks have homer advantages in their off ice officials.
Larry Goodenough wrote:As for the other data used, they take from behindthenet or arctichockey. I believe that info is taken from the NHL.com game tracking. The biases of home team scorers is often recognized, but is often thought to come out in the wash over a larger sample
There is the basis for much of the argument: sample size. Back when this was first brought up, I did a quick and dirty, back-of-the-envelope figure that it would take at least 2/3 to 3/4 of a season before the outliers were massaged out.

That is why data verification, like a T-test, is important.
wienerdog
CC 2nd Team All-Star
Posts: 371
Joined: Fri Jul 15, 2011 4:47 pm

Re: Advanced stats/ Moneypuck

Post by wienerdog »

Topper wrote:
Larry Goodenough wrote: They're pretty open about the subjectivity of it.
That subjectivity is a huge problem. There needs to be strict guidelines on data collection. There is already too much subjectivity in the league collected stats. Not too bad when it comes to shots, but when it comes to stats like hits, it is widely accepted that some rinks have homer advantages in their off ice officials.
Is that subjectivity that huge a problem though?

Truly objective data collection in more dynamic sports like hockey or soccer is known to be more difficult than in a sport like baseball, where the play has a more structured execution.

The pure objectivity of stats would be of paramount importance for sports geeks wanting to talk comparative analysis of teams across the NHL, but for a GM more interested in the performances of his specific players, it's certainly less critical.

A wise GM would acknowledge the fact that a certain measure of subjectivity will influence the data set. Call it the "grain of salt" component to the numbers. As you already pointed out, tops, large sample sets are needed to reflect accurate trends regardless - the subjectivity of certain stat collectors would also tend to "wash out" over those bigger data pools.

A wiser GM with some serious resources would go farther than that.

More specifically, if I were Gillis, I wouldn't rely on NHL collected data anyway. As far as our own team is concerned, I'd only be interested in data sets collected by my own people, and I'd even want those sets customized into sub-sets of more specified pools of information.

For example, what is considered a trackable scoring chance by a Sedin banking a puck in off the back of a goaltender - a play that both brothers have executed before - might not register as a bona-fide "chance" when it's a fluky out-of-the-ordinary bounce committed by a fourth liner like Weise. That is to say, a stat system that could actually recognize that a Sedin is looking for that play while a Dale Weise isn't would be even more valuable.

Call it a filtered pre-interpretation of the numbers before they even get to Gillis' desk.

Hey, in an ideal world, you have a different stats geek tracking every single different player on the ice. Even better if each one was autistic and therefore was unlikely to form an emotional attachment to his data target.

Love to be a fly on the wall in that hiring process. :drink:
Topper
CC Legend
Posts: 18097
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Post by Topper »

Sample size will not necessarily "wash out" subjective data. To suggest it does also suggests that the subjective data is irrelevant, and if that is the case, why bother with it?

As I have said, statistics has several methods for testing the relevance of collected data. These folks should use those methods.

They should also remember the principles of KISS and not overly complicate their stats. If you look at Bill James' work in baseball, he does not go out of his way to complicate his work and therein lies the elegance of his work.

With the GIS model I suggested, I would include a ranking system of scoring chances, but the ranking system would be guided by strict examples, thus limiting the sampler bias when collecting data. From LG's comments, it appears Cam Charron has a much more liberal practice. As LG noted, there was a difference of 1 or 2 chances in a game between samplers. That could easily be a difference of 10-20% or more. Hardly insignificant.
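To put a number on that 10-20% figure, here is a small sketch. The per-game totals are hypothetical, since the thread only gives the size of the gap (1 or 2 chances), not the counts themselves, and `percent_disagreement` is just an illustrative helper.

```python
def percent_disagreement(count_a, count_b):
    """Gap between two trackers' counts as a percentage of their average count."""
    average = (count_a + count_b) / 2
    return abs(count_a - count_b) / average * 100


# Hypothetical single-game totals for two trackers watching the same game.
for a, b in [(10, 11), (10, 12), (20, 21)]:
    print(f"{a} vs {b}: {percent_disagreement(a, b):.1f}% apart")
```

The same gap of 1 or 2 chances matters a lot less when the game total is 20 than when it is 10, so the per-game count has to be known before the disagreement can be judged.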

Like I said, the GIS model I suggested could be set up for a few tens of thousands of dollars and once done could be given to pro and amateur scouting staff. With a few hours of training and some video work, consistency between samplers could be brought about. If a team was really keen, they could have a small handful of data input monkeys watching video of each game and inputting data. Combine that with league TOI and game summary data and you are off to the races.

I am all for the use of advanced statistics; however, unless the data is collected properly and validated, folks are just fooling themselves.
Last edited by Topper on Mon Mar 26, 2012 11:16 pm, edited 1 time in total.
wienerdog
CC 2nd Team All-Star
Posts: 371
Joined: Fri Jul 15, 2011 4:47 pm

Re: Advanced stats/ Moneypuck

Post by wienerdog »

Topper wrote:Sample size will not necessarily "wash out" subjective data. To suggest it does also suggests that the subjective data is irrelevant, and if that is the case, why bother with it?

As I have said, statistics has several methods for testing the relevance of collected data. These folks should use those methods.

They should also remember the principles of KISS and not overly complicate their stats. If you look at Bill James' work in baseball, he does not go out of his way to complicate his work and therein lies the elegance of his work.

With the GIS model I suggested, I would include a ranking system of scoring chances, but the ranking system would be guided by strict examples thus limiting the sampler bias when collecting data. It appears by LG comments, Cam Charron has a much more liberal practice. As LG noted, a difference of 1 or two chances in a game between samplers. That could easily be a difference of 10-20% or more. Hardly significant.

Like I said, the GIS model I suggested could be set up for a few tens of thousands of dollars and once done could be given to pro and amateur scouting staff. With a few hours of training and some video work, consistency between samplers could be brought about. If a team was really keen, they could have a small handful of data input monkeys watching video of each game and inputting data. Combine that with league TOI and game summary data and you are off to the races.

I am all for the use of advanced statistics, however unless the data is collected properly and validated, folks are just fooling themselves.
Seems like we're pretty much on the same page.

I put "wash out" in quotes because when one is using sets that are obviously so subjective (league stats), one has to engage in some kind of (further subjective) averaging mechanism if one is going to rely on that type of data collection. I agreed that wasn't the way for franchises to go with advanced stats. That's what the Charrons of the world are using because that is the resource that is available to them.

Instead, you are suggesting what I would also see as the best solution: employ one's own "data division", literally a few guys who are charged with tracking the relevant data of every single game. You're limiting any subjectivity to only that which might be intrinsic to the data stream that your own guys are collecting. The only weaknesses I see in this system are

1) cost (not an issue for Aquaman - well, pre-divorce at least...), and
2) security (if you want to leave the Organization, data monkey, it's the Long Dirt Nap for you). :mex:

As for Bill James vs hockey stats: I'm all for simplicity and elegance, and the sooner any pool of information can cut to the bone the more effective it will prove to be. All I have to offer up is what I said before: to me, baseball and hockey are wildly different animals on a visceral level, let alone a statistical one. It would seem pure common sense to me that while one statistical genius could engineer an elegant model of data collection and analysis for baseball, that same genius could tie himself into knots trying to apply that same simple elegance to a beast like ice hockey. For hockey, I have a hard time picturing a data set that is relevant yet not complex or complicated; it seems to be inherent to the nature of the game. The interpretation of that set is a different matter entirely.

Having said all of this, I'm not even anything close to approaching a "stats guy". I have an intense love of the strategy used to build a winning team (maybe even more so than the tactics on the ice), and thus these discussions are like candy to me. I love this shit even if I don't understand all of the nuances in it.
Topper
CC Legend
Posts: 18097
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Post by Topper »

Maybe I should add, for completeness, that I don't see much of a problem dealing with league-collected stats. Yes, they will vary with homerism in some rinks, but that homerism should be easily spotted in the data and the bias should be able to be accounted for. A process of data leveling shouldn't be too difficult to engineer.
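One way such data leveling might be sketched, purely as an assumption about what it could look like (it borrows the idea of a baseball-style park factor, and every number below is invented): compare how many hits get recorded in a team's home games with how many get recorded in that same team's road games, then scale the home-rink counts by the ratio.

```python
def rink_factor(home_game_events, road_game_events):
    """Events per game recorded in a team's home games divided by the rate
    recorded in the same team's road games. 1.0 means no apparent rink bias."""
    home_rate = sum(home_game_events) / len(home_game_events)
    road_rate = sum(road_game_events) / len(road_game_events)
    return home_rate / road_rate


def level(raw_count, factor):
    """Scale a count recorded in a given rink back toward a neutral scorer."""
    return raw_count / factor


# Hypothetical combined hit totals (both teams) in one club's games.
home_games = [60, 58, 62, 60]
road_games = [48, 50, 46, 48]

factor = rink_factor(home_games, road_games)
print(f"rink factor: {factor:.2f}")
print(f"30 recorded hits leveled: {level(30, factor):.1f}")
```

A real version would need enough games per rink for the factor itself to be stable, which runs straight back into the sample size problem.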

Also, when I talk of the elegance of Bill James' work, I have to stress that he did not overwork the data. He did not get into complex calculations comparing variables to come up with a new measure. That is what I mean by keep it simple, stupid.

Cost, as I pointed out, is not an issue. A few tens of thousands of dollars in a scouting program making decisions worth several millions of dollars is not an exorbitant investment.

Security is moot, as other teams will quickly adopt similar systems given the relative low cost of entry.
Vpete
CC 2nd Team All-Star
Posts: 320
Joined: Sun Jul 10, 2011 3:01 pm

Re: Advanced stats/ Moneypuck

Post by Vpete »

Topper you have hit the nail so hard it's gone through the board.

While Charron, Drance, Desjardins and Johnson, among many others, use stats and techniques to quantify what they believe, there is a whole other alchemy involved in Advanced Stats.

First off some of them are incredibly arrogant and use their work to support outcomes they predicted long ago- I have yet to see one say they were off. I guess that's a privilege of being one of the first 'out' on this front. They are also very possessive and argumentative in regards to challenges of their work, Tyler Dellow and Desjardins being the leaders by far.

So what makes it so special? Well start with Corsi- it's a way of looking at the game and drawing some simple conclusions about what a player does on the ice but does it tell a more complete story than observation or does the information support the observations? Almost a chicken and the egg scenario.

Look at Desjardins' site and look at Quality of Competition- what determines it?
There are numerous ways we could go about this. We could average out the points-per-game of opposing players (as Jonathan Willis has suggested, and which works reasonably well, particularly when you have no ice time information), but I think the best place to start is with what I called "Relative +/-" or "Rating" in part 1. Relative +/- adjusts a player's on-ice +/- relative to his team's +/- while he was off the ice. In general, it corrects for the boost players get from playing on a good offensive team and vice-versa.

If we average that rating across all of a player's opponents, weighting for how much time they played against one another, then we have an estimate of how good a player's opponents were relative to their teams. In a general sense, first line players have the best ratings, so players who play against the first line should see the highest opponent rating. That average opponent rating is the "Quality of Competition" faced by a given player.
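Reading that description literally, the calculation is a time-weighted average of opponents' ratings. A minimal sketch follows; the function name, the tuple layout and the numbers are assumptions for illustration, not Desjardins' actual code or data.

```python
def quality_of_competition(opponents):
    """Time-weighted average of opponent ratings.

    opponents: list of (rating, minutes_head_to_head) tuples, where rating
    is the opponent's relative +/- as described in the quoted text.
    """
    total_minutes = sum(minutes for _, minutes in opponents)
    if total_minutes == 0:
        return 0.0
    weighted = sum(rating * minutes for rating, minutes in opponents)
    return weighted / total_minutes


# Hypothetical opponents: a first-liner, an average player, a fourth-liner.
faced = [(1.5, 40.0), (0.0, 20.0), (-1.0, 20.0)]
print(f"quality of competition: {quality_of_competition(faced):.2f}")
```

Note that everything here inherits whatever noise is in the underlying +/- rating, which is the objection raised just below.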
Ask them what they think of using plus/minus as a stat and then see that it's completely valid for them in this scenario. Of the top 30 scorers in the NHL, 9 are in the minus, or 30%. So how does Desjardins 'adjust' the +/- as he says above?
We can make a small improvement on +/- by subtracting the +/- when a player is off the ice from it. That is, if a player was +1 goal per 60 minutes on the ice and his team was even when he was off, he ends up appearing the same as a guy who was even on the ice while his teammates were -1 per 60 minutes. It's not perfect, but it does make an adjustment for how good a player's teammates were. This statistic has several names - relative +/-, On-Ice/Off-Ice +/-, or simply "Rating", as I've called it on the stats page.
Here's the list: http://www.behindthenet.ca/2008/new_5_o ... team=&pos=
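The adjustment described in that quote is a single subtraction. As a sketch using the two players from Desjardins' own example (the function and variable names are mine):

```python
def relative_plus_minus(on_ice_per60, off_ice_per60):
    """On-ice goal differential per 60 minutes minus the team's
    differential per 60 minutes while the player is off the ice."""
    return on_ice_per60 - off_ice_per60


# The two cases from the quoted text.
player_a = relative_plus_minus(1.0, 0.0)    # +1 on the ice, team even without him
player_b = relative_plus_minus(0.0, -1.0)   # even on the ice, team -1 without him

print(player_a, player_b)
```

Both players come out with the same rating, which is exactly the behaviour the quote describes and also why the number says nothing about how either one got there.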

Where I have a problem with all this is that hockey is a game with unique variables unlike other sports. If Datsyuk is a +/- there is no way to determine how he got there by Desjardins' stats or anyone else's. In his list it shows Rene Bourque as 3rd for 'rating'. How many turnovers does Datsyuk create? How many points from turnovers does he earn? How often did his teammates convert on his work? How many scoring chances does he create? What makes Datsyuk one of the best players in the game is how he handles the variables within the game, and none of those stats tell you how. They only confirm that he does, and that Rene Bourque does too.

Look at stats like shots attempted. Wouldn't it be nice to know how often a team recovers missed shots in their defensive zone and in the offensive zone? If you are a puck possession team like the Canucks it's pretty important especially if you adhere to strict zone start regimens.

I think the advanced stats are incredibly interesting and tell a good and relevant story for teams about players and past performance, but I believe there are better ways to do it. The Canucks have their own team who do advanced stats and it sure as hell would be interesting to know what they track, as I bet it's far different from what we see.
Brick Top: Do you know what "nemesis" means? A righteous infliction of retribution manifested by an appropriate agent. Personified in this case by an 'orrible cunt... me.
Topper
CC Legend
Posts: 18097
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Post by Topper »

Vpete wrote:First off some of them are incredibly arrogant and use their work to support outcomes they predicted....

....... Almost a chicken and the egg scenario.
Pretty much what drew me to posting this.

Rather than combing the stats for patterns, letting the numbers tell a story and, best of all, using them to project performance, some of these folks come up with a story of a player's performance and then concoct a stat calculation to support their story.

That is where "Lies, Damn Lies and Statistics" comes from.

Using some calculated value to justify why a team or player is where he is, is silly grandstanding.

The value is in combing the data, spotting trends and using those trends to project a player's performance.
ESQ
MVP
Posts: 4477
Joined: Wed Jul 13, 2011 6:34 pm

Re: Advanced stats/ Moneypuck

Post by ESQ »

Vpete wrote:First off some of them are incredibly arrogant and use their work to support outcomes they predicted....

....... Almost a chicken and the egg scenario.
Topper wrote: The value is in combing the data, spotting trends and using those trends to project a player's performance.
That is something that seems to be missing. Every guy posts an article ex post facto legitimizing his predictions, but I haven't seen them publish the predictions beforehand. Like the guy who said the Leafs' fall proves his shooting % analysis: their shooting % was way too high when they were winning, it was unsustainable, and now that their shooting % isn't as high they're not winning. That's all well and good, but did you publish that prediction when they were winning?

If anybody finds a good author who is making reasonably accurate predictions, I'd love to see that.
Rayxor
CC 2nd Team All-Star
Posts: 387
Joined: Mon Jul 11, 2011 8:05 am
Location: Tiger country

Re: Advanced stats/ Moneypuck

Post by Rayxor »

ESQ wrote: If anybody finds a good author who is making reasonably accurate predictions, I'd love to see that.
He would be the guy that wins his hockey pool every year. If anyone does have some analysis that is an accurate predictor, I'm sure he is keeping quiet about it and earning a living in sports betting.
dbr
CC Legend
Posts: 3093
Joined: Sat Jul 09, 2011 5:37 pm

Re: Advanced stats/ Moneypuck

Post by dbr »

I think a lot of people predicted that the Minnesota Wild were due for a serious correction back when they were 1st overall in December (heck I left a comment on ProHockeyTalk that was.. not received well), although maybe not the kind of spectacular fall we've seen since.

Same with the Edmonton Oilers when they were getting .950 goaltending in the first couple weeks of the season.

But both of those predictions are based a bit more on shot attempts, shooting and save percentages rather than scoring chance data. I don't know who out there is using that to predict the final standings before the season starts, although I guess it is one way to try to work out which players and teams are 'better' than the results they got previously. Trouble is, lots of players and teams remain 'better' than their results for a long, long time.
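For a sense of why a couple of weeks of .950 goaltending invites that kind of prediction, here is a back-of-the-envelope sketch. The .915 "true" save percentage and the 200 shots faced are assumptions picked for illustration, and treating every shot as an independent coin flip is a simplification.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of an observed proportion under a simple binomial model."""
    return math.sqrt(p * (1 - p) / n)


true_sv = 0.915      # assumed underlying save percentage (hypothetical)
shots_faced = 200    # assumed shots over the first couple of weeks (hypothetical)
observed_sv = 0.950  # the figure mentioned above

se = proportion_standard_error(true_sv, shots_faced)
z = (observed_sv - true_sv) / se

print(f"standard error over {shots_faced} shots: {se:.4f}")
print(f"observed .950 is {z:.2f} standard errors above the assumed true level")
```

Under those assumptions a .950 stretch is unusual but well within what luck alone can produce over a small sample, so expecting it to come back down is a reasonable bet.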
Topper
CC Legend
Posts: 18097
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Post by Topper »

I recently read a series of articles/primers by Blake Murphy at Nucks Misconduct where he looks at performances after 4 games using advanced statistics.

He does so in very couched terms and echoes many of the issues I noted last year.

1) advanced stats are a supplement to watching games, not a replacement and this is especially true with a small sample size. This isn't a pitcher-batter situation, hockey is far more dynamic.

2) in a 48 game season all sample sizes will likely be too small. I noted last year that a quick and dirty calculation suggested 2/3 to 3/4 of a season may be needed to reduce the influence of outliers.

3) when looking at results from a small sample size it is not possible to compare players, and it is best to look at trends of performance.
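The arithmetic behind point 2 can be sketched as follows, assuming the 2/3 to 3/4 estimate was made against a full 82 game schedule (the thread does not say so explicitly):

```python
import math

FULL_SEASON_GAMES = 82    # standard NHL regular season
SHORT_SEASON_GAMES = 48   # the shortened season discussed above

# Assumption: the 2/3 to 3/4 estimate refers to a full 82 game schedule.
low_games = math.ceil(FULL_SEASON_GAMES * 2 / 3)
high_games = math.ceil(FULL_SEASON_GAMES * 3 / 4)

print(f"games needed by that estimate: {low_games} to {high_games}")
print(f"games available: {SHORT_SEASON_GAMES}")
print("enough games?", SHORT_SEASON_GAMES >= low_games)
```

On that reading the entire shortened season falls short of even the low end of the estimate.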

Maybe someone was listening last year. I'll keep reading Blake's work and see where he goes with it.