Advanced stats/ Moneypuck

Welcome to the main forum of our site. Anything and everything to do with the Vancouver Canucks is discussed and debated here.


Re: Advanced stats/ Moneypuck

Postby ESQ » Fri Mar 23, 2012 3:27 pm

Topper wrote:First is data verification. ...
Second, look at correlation coefficients to identify data linkages. ...

Another problem: in the case of the chances for/against model, the obvious problem was the home plate area used to define chances.


Very interesting, Topper; you obviously have a far higher-level understanding of stats than I do.

It seems to me, reading your post, that your main problem with Charron's system is the first point: data verification. Without knowing that, you can't calculate the standard deviation, variability, etc., so you have no way to judge how effective or reliable the numbers are. Is that correct? Same with not knowing the criteria for tracking: if all shots are treated equally, then the usefulness of the stat is diminished.

I have to admit I don't understand what a correlation coefficient is :oops:
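For reference: a correlation coefficient (Pearson's r) measures how closely two sets of numbers move together along a straight line. It runs from -1 (one goes up as the other goes down) through 0 (no linear relationship) to +1 (they rise and fall together). Below is a minimal Python sketch; the function name `pearson_r` and the per-game numbers are made up purely for illustration and do not come from any of the stats sites discussed here. On Python 3.10+, `statistics.correlation` computes the same value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two move together, relative to how much each moves on its own
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-game numbers, for illustration only
shot_attempts = [40, 52, 35, 61, 47]
goals_for = [2, 3, 1, 4, 3]
print(round(pearson_r(shot_attempts, goals_for), 2))
```

A strong r between two measures says they moved together in this sample; it does not say one causes the other, and with only a handful of games it can swing a lot.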

But do you agree that these stats, even without the underlying raw data, are still more useful than "traditional" stats in assessing a player's value? And is there a publicly available "advanced stat" that meets the criteria for reliability?
ESQ
CC 1st Team All-Star
 
Posts: 704
Joined: Wed Jul 13, 2011 6:34 pm

Re: Advanced stats/ Moneypuck

Postby Topper » Fri Mar 23, 2012 4:04 pm

dbr, there are several data validation issues with Cam Charron's work.

Obviously his arbitrary home plate area causes its own biases. A point shot is not a scoring chance, and neither is a shot banked off a defender or goalie from behind the net.

If Raymond had bobbled Henrik's pass on his goal last night instead of getting a shot off and scoring, would that have been a scoring chance?

Next is the wild variation in chances for and chances against from game to game. I noted at the time they were originally posted that in one game the Sedins had 3 chances for, and in the next game they had 11. This is where a simple t-test is useful for looking at the variability of the data.
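As a rough sketch of what that looks like, the snippet below summarizes per-game chance counts (mean, sample standard deviation, standard error) and computes a one-sample t statistic against a reference mean. Only the 3 and 11 come from the games mentioned above; the other counts, the reference mean of 6 and the function name are hypothetical. If SciPy is available, `scipy.stats.ttest_1samp` computes the same statistic along with a p-value.

```python
import math
import statistics


def variability_summary(chances_per_game, reference_mean):
    """Mean, sample SD, standard error and a one-sample t statistic testing
    whether the per-game mean differs from reference_mean."""
    n = len(chances_per_game)
    mean = statistics.mean(chances_per_game)
    sd = statistics.stdev(chances_per_game)  # sample SD (divides by n - 1)
    se = sd / math.sqrt(n)                   # standard error of the mean
    t = (mean - reference_mean) / se
    return {"n": n, "mean": mean, "sd": sd, "se": se, "t": t}


# 3 and 11 are the two games mentioned above; the rest are hypothetical
sedin_chances_for = [3, 11, 6, 8, 5, 9, 7, 4]
print(variability_summary(sedin_chances_for, reference_mean=6.0))
```

A small t value, as with these made-up numbers, means the sample can't distinguish the average from the reference; a standard deviation that is large relative to the mean is exactly the game-to-game noise described above.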

It isn't much different from many of us arbitrarily waiting until the 20-game mark (roughly the quarter pole of the season) before posting assessments of player performance. You need a meaningful sample size to iron out some of the variability, and statistics offers some simple ways to quantify that variability and tell us whether the sample is large enough to do so.
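The "is the sample big enough?" question can be sketched with the usual normal-approximation formula n = (z * s / E)^2, where s is the per-game standard deviation, E is the margin of error you can live with, and z is about 1.96 for roughly 95% confidence. This assumes games are independent and comparable, which is a simplification; the function name and the standard deviation of 2.7 used below are hypothetical.

```python
import math


def games_needed(per_game_sd, margin, z=1.96):
    """Approximate number of games before the ~95% margin of error on a
    per-game average shrinks to `margin` (normal approximation)."""
    return math.ceil((z * per_game_sd / margin) ** 2)


# Hypothetical: per-game SD of 2.7 chances, average wanted to within +/- 1 chance
print(games_needed(per_game_sd=2.7, margin=1.0))
```

The square in the formula is the catch: halving the acceptable margin of error quadruples the number of games needed.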

If the data is properly validated, any biases are noted, and the results are read in the context of those biases, some of these methods can be very useful.
Over the Internet, you can pretend to be anyone or anything.

I'm amazed that so many people choose to be complete twats.
Topper
CC Legend
 
Posts: 4987
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Postby RoyalDude » Fri Mar 23, 2012 7:01 pm

“A single death is a tragedy; a million deaths is a statistic.” - Stalin
"I just want to say one word to you. Just one word. Are you listening? - Plastics." - The Graduate
RoyalDude
CC Legend
 
Posts: 4569
Joined: Mon Aug 09, 2004 6:36 pm
Location: Vancouver

Re: Advanced stats/ Moneypuck

Postby Larry Goodenough » Fri Mar 23, 2012 9:10 pm

Topper wrote:dbr, there are several data validation issues with Cam Charron's work. ...


Well, I'm still trying to figure out a lot of the issues you suggest are out there.

It's not just Drance/Charron, but a whole network of bloggers, the "Nation" network, who use similar criteria to track scoring chances. I think it was mentioned that upwards of 13 teams are being tracked using the same basic system (home plate). However, Drance/Charron state that it's not just shots from within home plate that count as chances: they will also count a screened shot from the point that has potential to go in, or a shot from outside home plate that followed significant puck movement. They're pretty open about the subjectivity of it. That said, I did see one chance blog from a game against the Jets where Charron and the Jets blogger both tracked chances, and their counts differed by only 1 or 2.

As for the other data used, they take it from behindthenet or arctichockey. I believe that info comes from NHL.com game tracking. The biases of home team scorers are often recognized, but are often thought to come out in the wash over a larger sample.
Larry Goodenough
CC 1st Team All-Star
 
Posts: 727
Joined: Wed Oct 19, 2005 10:43 am

Re: Advanced stats/ Moneypuck

Postby Topper » Sat Mar 24, 2012 6:31 am

Larry Goodenough wrote: They're pretty open about the subjectivity of it.

That subjectivity is a huge problem. There need to be strict guidelines on data collection. There is already too much subjectivity in the league-collected stats. It's not too bad when it comes to shots, but for stats like hits it is widely accepted that some rinks have homer advantages in their off-ice officials.

Larry Goodenough wrote:As for the other data used, they take it from behindthenet or arctichockey. I believe that info comes from NHL.com game tracking. The biases of home team scorers are often recognized, but are often thought to come out in the wash over a larger sample.

There is the basis for much of the argument: sample size. Back when this was first brought up, I did a quick and dirty, back-of-the-envelope calculation suggesting it would take at least 2/3 to 3/4 of a season before the outliers were massaged out.

That is why data verification, such as a t-test, is important.
Topper
CC Legend
 
Posts: 4987
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Postby wienerdog » Mon Mar 26, 2012 1:44 pm

Topper wrote:
Larry Goodenough wrote: They're pretty open about the subjectivity of it.

That subjectivity is a huge problem. There need to be strict guidelines on data collection. There is already too much subjectivity in the league-collected stats. It's not too bad when it comes to shots, but for stats like hits it is widely accepted that some rinks have homer advantages in their off-ice officials.


Is that subjectivity that huge a problem though?

Truly objective data collection in more dynamic sports like hockey or soccer is known to be more difficult than in a sport like baseball, where the play has a more structured execution.

The pure objectivity of stats would be of paramount importance for sports geeks wanting to do comparative analysis of teams across the NHL, but for a GM more interested in the performances of his own players, it's certainly less critical.

A wise GM would acknowledge the fact that a certain measure of subjectivity will influence the data set. Call it the "grain of salt" component to the numbers. As you already pointed out, tops, large sample sets are needed to reflect accurate trends regardless - the subjectivity of certain stat collectors would also tend to "wash out" over those bigger data pools.

A wiser GM with some serious resources would go farther than that.

More specifically, if I were Gillis, I wouldn't rely on NHL-collected data anyway. As far as our own team is concerned, I'd only be interested in data sets collected by my own people, and I'd even want those sets customized into sub-sets of more specific pools of information.

For example, a Sedin banking a puck in off the back of a goaltender (a play both brothers have executed before) might count as a trackable scoring chance, while the same play might not register as a bona fide "chance" when it's a fluky, out-of-the-ordinary bounce by a fourth liner like Weise. That is to say, a stat system that could actually recognize that a Sedin is looking for that play while a Dale Weise isn't would be even more valuable.

Call it a filtered pre-interpretation of the numbers before they even get to Gillis' desk.

Hey, in an ideal world, you have a different stats geek tracking every single different player on the ice. Even better if each one was autistic and therefore was unlikely to form an emotional attachment to his data target.

Love to be a fly on the wall in that hiring process. :drink:
wienerdog
CC 2nd Team All-Star
 
Posts: 370
Joined: Fri Jul 15, 2011 4:47 pm

Re: Advanced stats/ Moneypuck

Postby Topper » Mon Mar 26, 2012 2:13 pm

Sample size will not necessarily "wash out" subjective data. To suggest it does also suggests that the subjective data is irrelevant, and if that is the case, why bother with it?
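A small simulation sketch of the difference: random scorer noise shrinks as games accumulate, but a consistent lean in one direction does not. Here a made-up scorer records one extra event every game on top of random noise; the bias, noise level and function name are assumptions chosen for illustration.

```python
import random


def average_recorded_error(n_games, bias, noise_sd, seed=0):
    """Mean of (recorded - true) over n_games, where each game's recorded count
    is the true count plus a constant scorer bias plus random noise."""
    rng = random.Random(seed)
    errors = [bias + rng.gauss(0.0, noise_sd) for _ in range(n_games)]
    return sum(errors) / n_games


# Hypothetical scorer who logs one extra event per game, plus random noise
for n in (10, 100, 10_000):
    print(n, round(average_recorded_error(n, bias=1.0, noise_sd=2.0), 2))
```

As the number of games grows, the average error settles on the bias of 1.0 rather than on zero; more games only make the wrong answer more precise.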

As I have said, statistics has several methods for testing the relevance of collected data. These folks should use those methods.

They should also remember the principles of KISS and not overly complicate their stats. If you look at Bill James' work in baseball, he does not go out of his way to complicate his work and therein lies the elegance of his work.

With the GIS model I suggested, I would include a ranking system for scoring chances, but the ranking system would be guided by strict examples, limiting sampler bias when collecting data. From LG's comments, it appears Cam Charron has a much more liberal practice. As LG noted, there was a difference of one or two chances in a game between samplers. That could easily be a difference of 10-20% or more. Hardly insignificant.
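The 10-20% figure is simple arithmetic if a game has roughly 10 tracked chances, which is the assumption used in this sketch (the post does not state a per-game baseline). The helper name `tracker_disagreement` is hypothetical.

```python
def tracker_disagreement(reference_count, other_count):
    """Gap between two trackers' counts for the same game, as a fraction of
    the reference tracker's count."""
    if reference_count <= 0:
        raise ValueError("reference count must be positive")
    return abs(other_count - reference_count) / reference_count


# Assumed baseline of roughly 10 tracked chances in a game
for gap in (1, 2):
    print(f"{gap} chance(s) apart: {tracker_disagreement(10, 10 + gap):.0%}")
```

In a lower-event game the same one- or two-chance gap is a larger share, which is why it can easily be "or more".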

Like I said, the GIS model I suggested could be set up for a few tens of thousands of dollars and, once done, could be given to pro and amateur scouting staff. With a few hours of training and some video work, consistency between samplers could be achieved. If a team was really keen, they could have a small handful of data input monkeys watching video of each game and inputting data. Combine that with league TOI and game summary data and you are off to the races.
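One standard way to check consistency between samplers once they're trained is Cohen's kappa, which compares how often two trackers agree on the same events with how often they would agree by chance alone. This is a suggestion, not part of the GIS model described above; the labels and the sampler names are hypothetical.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two samplers labelling the same sequence of events.

    1.0 means perfect agreement; 0.0 means no better than chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each sampler labelled at random with their own base rates
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both samplers used one identical label for every event
    return (observed - expected) / (1.0 - expected)


# Hypothetical: two samplers judging the same six plays
sampler_1 = ["chance", "chance", "no", "no", "chance", "no"]
sampler_2 = ["chance", "no", "no", "no", "chance", "no"]
print(round(cohens_kappa(sampler_1, sampler_2), 2))
```

In this example the samplers agree on 5 of 6 plays, but half of that agreement would be expected by chance, so kappa comes out at about 0.67.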

I am all for the use of advanced statistics, however unless the data is collected properly and validated, folks are just fooling themselves.
Last edited by Topper on Mon Mar 26, 2012 11:16 pm, edited 1 time in total.
Topper
CC Legend
 
Posts: 4987
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Postby wienerdog » Mon Mar 26, 2012 3:00 pm

Topper wrote:Sample size will not necessarily "wash out" subjective data. ...


Seems like we're pretty much on the same page.

I put "wash out" in quotes because when one is using data sets that are obviously so subjective (league stats), one has to engage in some kind of (further subjective) averaging mechanism if you're going to rely on that type of data collection. I agree that isn't the way for franchises to go with advanced stats; it's what the Charrons of the world are using because that is the resource available to them.

Instead, you are suggesting what I would also see as the best solution: employ one's own "data division", literally a few guys charged with tracking the relevant data from every single game. You're limiting any subjectivity to only that which is intrinsic to the data stream your own guys are collecting. The only weaknesses I see in this system are

1) cost (not an issue for Aquaman - well, pre-divorce at least...), and
2) security (if you want to leave the Organization, data monkey, it's the Long Dirt Nap for you). :mex:

As for Bill James vs. hockey stats: I'm all for simplicity and elegance, and the sooner any pool of information can cut to the bone, the more effective it will prove to be. All I have to offer is what I said before: to me, baseball and hockey are wildly different animals on a visceral level, let alone a statistical one. It seems pure common sense to me that while one statistical genius could engineer an elegant model of data collection and analysis for baseball, that same genius could tie himself in knots trying to apply the same simple elegance to a beast like ice hockey. For hockey, I have a hard time picturing a data set that is relevant yet not complex; that complexity seems inherent to the nature of the game. The interpretation of that set is a different matter entirely.

Having said all of this, I'm not even close to being a "stats guy". I have an intense love of the strategy used to build a winning team (maybe even more so than the tactics on the ice), and thus these discussions are like candy to me. I love this shit even if I don't understand all of the nuances in it.
wienerdog
CC 2nd Team All-Star
 
Posts: 370
Joined: Fri Jul 15, 2011 4:47 pm

Re: Advanced stats/ Moneypuck

Postby Topper » Mon Mar 26, 2012 11:22 pm

Maybe I should add, for completeness: I don't see much of a problem dealing with league-collected stats. Yes, they will vary with homerism in some rinks, but that homerism should be easy to spot in the data and the bias should be possible to account for. A process of data leveling shouldn't be too difficult to engineer.
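A minimal sketch of one way such data leveling could work, borrowing the idea of park factors from baseball: compare each rink's recorded events per game with the average across rinks and scale raw counts by that ratio. The rink names, rates and function names are hypothetical, and a more careful version would compare the same teams' home and road games so that a team's own style isn't mistaken for scorer bias.

```python
def rink_factors(events_per_game_by_rink):
    """Each rink's recorded events per game divided by the average across rinks."""
    league_avg = sum(events_per_game_by_rink.values()) / len(events_per_game_by_rink)
    return {rink: rate / league_avg for rink, rate in events_per_game_by_rink.items()}


def level(raw_count, rink, factors):
    """Scale a count recorded at `rink` to what an average scorer would have logged."""
    return raw_count / factors[rink]


# Hypothetical hits recorded per game at three rinks
factors = rink_factors({"Rink A": 60.0, "Rink B": 40.0, "Rink C": 50.0})
print(round(level(30, "Rink A", factors), 1))
```

Here 30 hits recorded in a rink that logs 20% more than average level down to about 25.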

Also, when I talk of the elegance of Bill James' work, I have to stress that he did not overwork the data. He did not get into complex calculations comparing variables to come up with a new measure. That is what I mean by Keep It Simple, Stupid.

Cost, as I pointed out, is not an issue. A few tens of thousands of dollars in a scouting program making decisions worth several millions of dollars is not an exorbitant investment.

Security is moot, as other teams will quickly adopt similar systems given the relatively low cost of entry.
Topper
CC Legend
 
Posts: 4987
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Postby Vpete » Tue Mar 27, 2012 7:57 am

Topper you have hit the nail so hard it's gone through the board.

While Charron, Drance, Desjardins and Johnson, among many others, use stats and techniques to quantify what they believe, there is a whole other alchemy involved in advanced stats.

First off, some of them are incredibly arrogant and use their work to support outcomes they predicted long ago; I have yet to see one say they were off. I guess that's a privilege of being one of the first 'out' on this front. They are also very possessive and argumentative when their work is challenged, Tyler Dellow and Desjardins being the leaders by far.

So what makes it so special? Well, start with Corsi: it's a way of looking at the game and drawing some simple conclusions about what a player does on the ice, but does it tell a more complete story than observation, or does the information merely support the observations? It's almost a chicken-and-egg scenario.
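For anyone unfamiliar with it, Corsi is shot attempts for minus shot attempts against while a player is on the ice, where an attempt is any shot on goal, missed shot or blocked shot; Corsi For % is the share of all those attempts that were for. A minimal sketch, with a hypothetical `OnIceAttempts` container and made-up counts:

```python
from dataclasses import dataclass


@dataclass
class OnIceAttempts:
    """Shot attempts with a player on the ice (hypothetical container)."""
    shots_for: int
    missed_for: int
    blocked_for: int       # our attempts that the opponent blocked
    shots_against: int
    missed_against: int
    blocked_against: int   # opponent attempts that we blocked


def corsi(a: OnIceAttempts):
    """Return (Corsi for, Corsi against, Corsi +/-, Corsi For %)."""
    cf = a.shots_for + a.missed_for + a.blocked_for
    ca = a.shots_against + a.missed_against + a.blocked_against
    return cf, ca, cf - ca, cf / (cf + ca)


print(corsi(OnIceAttempts(10, 4, 3, 8, 5, 2)))
```

Note that nothing in the count says who recovered a missed shot or where; it only tallies attempts.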

Look at Desjardins' site and look at Quality of Competition- what determines it?

There are numerous ways we could go about this. We could average out the points-per-game of opposing players (as Jonathan Willis has suggested, and which works reasonably well, particularly when you have no ice time information), but I think the best place to start is with what I called "Relative +/-" or "Rating" in part 1. Relative +/- adjusts a player's on-ice +/- relative to his team's +/- while he was off the ice. In general, it corrects for the boost players get from playing on a good offensive team and vice-versa.

If we average that rating across all of a player's opponents, weighting for how much time they played against one another, then we have an estimate of how good a player's opponents were relative to their teams. In a general sense, first line players have the best ratings, so players who play against the first line should see the highest opponent rating. That average opponent rating is the "Quality of Competition" faced by a given player.
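The quality-of-competition calculation quoted above boils down to a time-weighted average. In this sketch the opponents' ratings, the head-to-head minutes and the function name are hypothetical, not anything taken from Desjardins' site.

```python
def quality_of_competition(opponents):
    """Time-weighted average of opponents' relative +/- ratings.

    `opponents` is a list of (rating, head_to_head_minutes) pairs.
    """
    total_minutes = sum(minutes for _, minutes in opponents)
    if total_minutes <= 0:
        raise ValueError("need some head-to-head ice time")
    weighted = sum(rating * minutes for rating, minutes in opponents)
    return weighted / total_minutes


# Hypothetical opponents: a top-line forward, a middle-six forward, a depth player
print(round(quality_of_competition([(1.5, 12.0), (0.2, 6.0), (-0.8, 2.0)]), 2))
```

The first opponent counts six times as much as the third because the player spent six times as many minutes against him.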


Ask them what they think of using plus/minus as a stat, and then see that it's completely valid for them in this scenario. Of the top 30 scorers in the NHL, 9 are minus players, 30% of the group. So how does Desjardins 'adjust' the +/-, as he says above?

We can make a small improvement on +/- by subtracting the +/- when a player is off the ice from it. That is, if a player was +1 goal per 60 minutes on the ice and his team was even when he was off, he ends up appearing the same as a guy who was even on the ice while his teammates were -1 per 60 minutes. It's not perfect, but it does make an adjustment for how good a player's teammates were. This statistic has several names - relative +/-, On-Ice/Off-Ice +/-, or simply "Rating", as I've called it on the stats page.
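The adjustment quoted above is a single subtraction once both numbers are on a per-60-minute basis. The helper names are hypothetical; the two calls reproduce the example in the quote.

```python
def per60(goal_differential, minutes):
    """Convert a raw goal differential over `minutes` of ice time to a per-60 rate."""
    return goal_differential * 60.0 / minutes


def relative_plus_minus(on_ice_per60, off_ice_per60):
    """On-ice +/- per 60 minus the team's +/- per 60 while the player is off the ice."""
    return on_ice_per60 - off_ice_per60


# The two players from the quoted example end up with the same rating
print(relative_plus_minus(+1.0, 0.0))   # 1.0: +1 on ice, team even without him
print(relative_plus_minus(0.0, -1.0))   # 1.0: even on ice, team -1 without him
```

Because both players land on +1, the rating alone can't tell you which one actually drove play, which is the limitation raised below.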


Here's the list: http://www.behindthenet.ca/2008/new_5_on_5.php?sort=7&section=goals&mingp=40&mintoi=10&team=&pos=

Where I have a problem with all this is that hockey is a game with unique variables unlike other sports. Whether Datsyuk is a plus or a minus, there is no way to determine how he got there from Desjardins' stats or anyone else's. His list shows Rene Bourque 3rd in 'rating'. How many turnovers does Datsyuk create? How many points does he earn from turnovers? How often do his teammates convert on his work? How many scoring chances does he create? What makes Datsyuk one of the best players in the game is how he handles the variables within the game, and none of those stats tell you how; they only show that he does, and that Rene Bourque does too.

Look at stats like shots attempted. Wouldn't it be nice to know how often a team recovers missed shots in the defensive zone and in the offensive zone? If you are a puck possession team like the Canucks, it's pretty important, especially if you adhere to strict zone start regimens.

I think the advanced stats are incredibly interesting and tell a good and relevant story for teams about players and past performance, but I believe there are better ways to do it. The Canucks have their own team who do advanced stats, and it sure as hell would be interesting to know what they track, as I bet it's far different from what we see.
Brick Top: Do you know what "nemesis" means? A righteous infliction of retribution manifested by an appropriate agent. Personified in this case by an 'orrible cunt... me.
Vpete
CC 2nd Team All-Star
 
Posts: 320
Joined: Sun Jul 10, 2011 3:01 pm

Re: Advanced stats/ Moneypuck

Postby Topper » Tue Mar 27, 2012 9:19 am

Vpete wrote:First off some of them are incredibly arrogant and use their work to support outcomes they predicted....

....... Almost a chicken and the egg scenario.

Pretty much what drew me to posting this.

Rather than combing the stats for patterns, letting the numbers tell a story and, best of all, using them to project performance, some of these folks come up with a story of a player's performance and then concoct a stat calculation to support their story.

That is where "Lies, Damn Lies and Statistics" comes from.

Using some calculated value to justify why a team or player is where they are is silly grandstanding.

The value is in combing the data, spotting trends and using those trends to project a player's performance.
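In the keep-it-simple spirit, the most basic version of spotting a trend and projecting from it is a least-squares straight line through a per-game stat. This is an illustrative sketch with hypothetical function names and made-up data; real performance rarely follows a straight line, so treat the projection as a rough guide at best.

```python
def linear_trend(values):
    """Least-squares slope and intercept of `values` against game number 1..n."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two games to fit a trend")
    game_numbers = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in game_numbers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(game_numbers, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(values, game_number):
    """Extend the fitted straight line to a future game number."""
    slope, intercept = linear_trend(values)
    return intercept + slope * game_number


# Hypothetical per-game Corsi For % over a ten-game stretch
recent = [48.0, 51.5, 47.2, 52.8, 50.1, 53.4, 49.9, 54.2, 52.6, 55.0]
print(round(project(recent, 20), 1))
```

The further the projected game is from the data, the more a straight line overstates how long a hot or cold stretch will last.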
Topper
CC Legend
 
Posts: 4987
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.

Re: Advanced stats/ Moneypuck

Postby ESQ » Tue Mar 27, 2012 10:29 am

Vpete wrote:First off some of them are incredibly arrogant and use their work to support outcomes they predicted....

....... Almost a chicken and the egg scenario.

Topper wrote:The value is in combing the data, spotting trends and using those trends to project a players performance.

That is something that seems to be missing. Every guy posts an article ex post facto legitimizing their predictions, but I haven't seen them publish the predictions themselves. Like the guy who said the Leafs' fall proves his shooting % analysis: their shooting % was way too high when they were winning, it was unsustainable, and now that it isn't as high they're not winning. That's all well and good, but did you publish that prediction when they were winning?
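One way to put a call like that on record ahead of time is to ask how far a team's shooting percentage sits above the league rate, measured in standard errors, treating each shot as having the league-average chance of going in. That treatment is a simplifying assumption (shot quality varies), and the goals, shots, league rate and function name below are hypothetical.

```python
import math


def shooting_pct_z(goals, shots, league_pct):
    """Observed shooting % and how many standard errors it sits above the
    league rate, treating every shot as an independent league-average shot."""
    observed = goals / shots
    standard_error = math.sqrt(league_pct * (1.0 - league_pct) / shots)
    return observed, (observed - league_pct) / standard_error


# Hypothetical team: 30 goals on 250 shots against a 9% league rate
pct, z = shooting_pct_z(goals=30, shots=250, league_pct=0.09)
print(f"{pct:.1%} shooting, {z:.2f} standard errors above league average")
```

A 12% shooting percentage on 250 shots against a 9% league rate comes out about 1.66 standard errors high; publishing a number like that while the team is still winning is the kind of prediction being asked for.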

If anybody finds a good author who is making reasonably accurate predictions, I'd love to see that.
ESQ
CC 1st Team All-Star
 
Posts: 704
Joined: Wed Jul 13, 2011 6:34 pm

Re: Advanced stats/ Moneypuck

Postby Rayxor » Tue Mar 27, 2012 11:03 am

ESQ wrote:If anybody finds a good author who is making reasonably accurate predictions, I'd love to see that.

He would be the guy that wins his hockey pool every year. If anyone does have some analysis that is an accurate predictor, I'm sure he is keeping quiet about it and earning a living in sports betting.
Rayxor
CC 2nd Team All-Star
 
Posts: 216
Joined: Mon Jul 11, 2011 8:05 am

Re: Advanced stats/ Moneypuck

Postby dbr » Tue Mar 27, 2012 1:16 pm

I think a lot of people predicted that the Minnesota Wild were due for a serious correction back when they were 1st overall in December (heck, I left a comment on ProHockeyTalk that was... not received well), although maybe not the kind of spectacular fall we've seen since.

Same with the Edmonton Oilers when they were getting .950 goaltending in the first couple weeks of the season.

But both of those predictions are based a bit more on shot attempts, shooting and save percentages rather than scoring chance data. I don't know who out there is using that to predict the final standings before the season starts, although I guess it is one way to try to work out which players and teams are 'better' than the results they got previously. Trouble is, lots of players and teams remain 'better' than their results for a long, long time.
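The shooting and save percentages mentioned here are often added together into a single number known as PDO. League-wide, every goal for one team is a goal against another, so the two percentages sum to 100 across the league, and a team sitting well above 100 (say, on the back of .950 goaltending) is usually read as riding luck that tends to fade. The function name and figures below are hypothetical.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Shooting % plus save %, both in percent; league-wide the sum is 100."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1.0 - goals_against / shots_against)
    return shooting_pct + save_pct


# Hypothetical early-season team: 9% shooting and .950 goaltending
print(round(pdo(goals_for=27, shots_for=300, goals_against=15, shots_against=300), 1))
```

As the post says, though, a number like this flags results that look unsustainable; it doesn't say how long a team can keep beating it.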
dbr
CC Hall of Fan Member
 
Posts: 2554
Joined: Sat Jul 09, 2011 5:37 pm

Re: Advanced stats/ Moneypuck

Postby Topper » Mon Jan 28, 2013 8:06 am

I recently read a series of articles/primers by Blake Murphy at Nucks Misconduct where he looks at performances after 4 games using advanced statistics.

He does so in very cautious terms and echoes many of the issues I noted last year.

1) advanced stats are a supplement to watching games, not a replacement, and this is especially true with a small sample size. This isn't a pitcher-batter situation; hockey is far more dynamic.

2) in a 48-game season, all sample sizes will likely be too small. I noted last year that a quick and dirty calculation suggested 2/3 to 3/4 of a season may be needed to reduce the influence of outliers.

3) when looking at results from a small sample size, it is not possible to compare players; it is best to look at trends in performance.
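For the third point, a simple way to watch a trend without leaning on any single game is a rolling average over the last few games. The window size is a judgment call, and the function name and per-game numbers are hypothetical; with a 48-game season, even a 10-game window covers more than a fifth of the schedule.

```python
from collections import deque


def rolling_average(values, window):
    """Average of the most recent `window` games at each point in the season
    (early entries average however many games exist so far)."""
    if window < 1:
        raise ValueError("window must be at least one game")
    recent = deque(maxlen=window)
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical scoring chances for per game over the first eight games
print(rolling_average([5, 9, 6, 8, 4, 7, 10, 6], window=4))
```

The first few entries rest on fewer games than the window, so they carry the same small-sample caution as the rest of this list.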

Maybe someone was listening last year. I'll keep reading Blake's work and see where he goes with it.
Topper
CC Legend
 
Posts: 4987
Joined: Wed Sep 15, 2004 8:11 pm
Location: Earth, most days.
