Someone on another forum I frequent, who works as a statistician/analyst (or whatever you'd call them) for an MLS team, posted a few interesting things about what they do. I figured I'd post them here since they could help people understand a bit more about "soccernomics," or whatever you want to call it:
"No, % of passes completed is a trash stat. Almost everything available in the public domain is a trash stat."
"Every single event that happens on the field has a specific, unknown (right now) value. We're trying to figure out those values. Our goal is to eventually include every single field event as an input, whereas right now, it's just goals and cards, essentially. From this, we can have some sort of objective, if imperfect, evaluation of a player."
"I guess I should explain why passing % is a bad stat. It's essentially equivalent to batting average in baseball. You know at what frequency a successful event happens, but you don't know what value that successful event has, or if it should be considered 'successful' at all. It's the equivalent of having 4 quarters and 3 dimes and telling people you have 7 coins, rather than saying that you have $1.30."
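To make the coin analogy concrete, here's a rough Python sketch of the difference between counting successes and valuing them. The two passers, their pass logs, and the per-pass "value" points are all made up for illustration; nobody in the quoted post gave actual values, and working those out is exactly the hard part they describe.

```python
# Coin values in cents, mirroring the quarters-and-dimes analogy.
COIN_VALUE_CENTS = {"quarter": 25, "dime": 10}


def count_coins(coins):
    """How many coins there are (the 'passing %' view)."""
    return sum(coins.values())


def total_cents(coins):
    """What the coins are actually worth (the 'value' view)."""
    return sum(COIN_VALUE_CENTS[name] * n for name, n in coins.items())


def completion_rate(passes):
    """Share of passes completed, ignoring what each pass was worth."""
    return sum(1 for completed, _ in passes if completed) / len(passes)


def total_value(passes):
    """Sum of the (assumed) value of completed passes only."""
    return sum(value for completed, value in passes if completed)


wallet = {"quarter": 4, "dime": 3}

# Hypothetical pass logs: (completed?, assumed value in arbitrary points).
safe_passer = [(True, 1)] * 9 + [(False, 0)]        # lots of easy square balls
risky_passer = [(True, 10)] * 6 + [(False, 0)] * 4  # fewer, more dangerous balls

print(count_coins(wallet), total_cents(wallet))
print(completion_rate(safe_passer), total_value(safe_passer))
print(completion_rate(risky_passer), total_value(risky_passer))
```

With these invented numbers the "safe" passer wins on completion rate while the "risky" passer wins on total value, which is the point of the analogy. The sketch also ignores the cost of a failed pass, which a real model would have to price in.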
Pretty interesting IMO, and goes to show we really have no idea what Comolli and the rest of our scouts look at statistically.
I'm not sure that does justice to how difficult a statistical problem football is compared to baseball. As the individual above notes, the goal of the whole approach is to relate specific events to the probability of the team winning.
In baseball:
1) it's easy to divide the game up into events because there are actually discrete "plays" - i.e. each pitch is a discrete event
2) further, it's easy to make a list of reasonably "equivalent" events - e.g. all at-bats with the same number of outs and the same number of men on base - all of these events start in the same place and then the actions of the player determine the state of the game at the end of the event
3) it's possible to code the action of the player independent of what the actual result was in terms of hits/outs (i.e. you can code an at-bat as "line drive to left/center/right; ground ball to left/center/right; fly ball to left/center/right"), which is useful because you'd prefer a player who hits a lot of balls hard but has been unlucky and hit them right at people over a player who has been lucky and had a lot of ground balls go through the infield for hits
4) a batter's success is *somewhat* independent of the actions of his teammates - not 100% true because runners on base can alter an at-bat quite a bit, but you can control for those situations without much trouble (there are a maximum of three runners on base and they can only do a very limited number of things themselves to affect play)
5) there are a relatively small number of teams and tons of games (162 a season) - players all play against one another, making the statistics roughly equivalent
6) and I could go on for some time
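Point 2 is the one that makes baseball so tractable, so here's a minimal Python sketch of it: group at-bats by their starting state (outs, runners on base) and average what happened afterwards. The records below are invented, and using "runs scored in the rest of the inning" as the outcome is my own assumption for illustration, not something from the quoted post.

```python
from collections import defaultdict

# Made-up at-bat records:
# (outs, runners_on_base, runs_scored_in_rest_of_inning)
at_bats = [
    (0, 0, 0), (0, 0, 1),
    (0, 1, 2), (0, 1, 0),
    (1, 2, 1), (1, 2, 3),
    (2, 0, 0), (2, 0, 0),
]


def average_runs_by_state(records):
    """Average the outcome over all at-bats that started in the same state."""
    totals = defaultdict(lambda: [0, 0])  # state -> [run total, at-bat count]
    for outs, runners, runs in records:
        bucket = totals[(outs, runners)]
        bucket[0] += runs
        bucket[1] += 1
    return {state: total / n for state, (total, n) in totals.items()}


expectancy = average_runs_by_state(at_bats)
for state, avg in sorted(expectancy.items()):
    print(state, avg)
```

This only works because many at-bats share exactly the same starting state, so every bucket fills up quickly.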
Compared to football:
1) there are no discrete "plays" - everything is a continuous flow - meaning that to do any statistics, you have to figure out how to divide the game up into discrete events, and every way of doing so is going to be somewhat poor (each pass? what if a player dribbles all the way down the pitch - that's a lot of game time that goes unaccounted for by an "each pass" scheme; each dribble? well, that divides the game up into so many chunks it's impossible to code; etc.)
2) it's hard to create a class of equivalent events. For instance, events don't ever start in the same place. Every pass isn't made from the same location with teammates and opposing players in more or less the same place, etc.
3) the space of things that players can do is much larger - particularly when you consider things players can do off the ball.
4) the success of a player does depend highly on his teammates. How many times have we seen a really great pass... if only a teammate had made the proper run? Does the player who made the pass get positive credit for it despite the fact that it ended up as a loss of possession? How do you code what the player who should have made the run did?
5) there are a somewhat small number of games and many, many leagues - so in many cases you're comparing statistics for two players that were computed against totally different opponents with no overlap whatsoever - i.e. Ligue 1 versus Eredivisie
6) etc
So my take is that although it'd be dumb to totally ignore what statistical approaches can tell us about players, the way football works makes it a totally different beast than baseball, so it would also be dumb to think that because it's a "statistical approach" it's inherently better than trusting the instincts of people who have been around the game...