A couple weeks ago, Bruce Arena caused quite the brouhaha in the world of soccer analytics when, in a press conference following the Galaxy's road win in Portland, he declared among other things, "analytics in soccer, if no one here has figured it out, [don't] mean a whole lot," and "analytics are used... for those who don't know how to analyze the game."
You can watch his statements in full here, and I suggest you do, because I don't want to be accused of taking anything out of context, as I will be dealing with aspects of his claims throughout this article.
Those familiar with my work, both here and at American Soccer Analysis where I have published studies on things like finishing in MLS and correlations with shots against, will know that analytics is something which is near and dear to my heart and something I have long argued the Galaxy should take more seriously.
The level of disdain from Bruce Arena, however, is far from encouraging for people like me who wish to see analytics take hold in MLS. After all, this man happens to be the head coach and general manager of one of the richest and most successful teams in the league, not to mention the most decorated coach the United States has ever produced. When Bruce Arena speaks, the world of US soccer listens, which is why it is so unfortunate that these comments were made, because, in short, they demonstrate a fundamental lack of understanding of the field he is criticizing.
We won the game. That's what you do in soccer games. We were on the road in a venue where, I think, this team does pretty well at home. What are we complaining about? Then some moron will write that they had more shots than us thinking that's important.
I'd like to break down the above portion of Bruce Arena's statement to demonstrate this. In it, Bruce Arena uses the narrow frame of shot ratio to go after the entire field of analytics. In doing so, he not only unfairly conflates all of analytics with a single aspect of it (shot ratios), but he also demonstrates a lack of familiarity with the relevant work on that aspect.
Claiming that analytics fails because the Galaxy won despite getting outshot is not only a straw man, but a lazy one at that.
For one, analytics cannot be tested in a single game: the sample size is far too small. More importantly, much of the analytics work done on shot ratio actually supports his argument that losing the shot battle in this particular game is irrelevant. But first, a brief introduction to the concept.
In analytics, there is a metric known as Total Shot Ratio (TSR = shots for / (shots for + shots against)) which has shown predictive value greater than traditional prediction methods such as points per game and goals. TSR has since been largely superseded by newer metrics such as expected goals, which have shown even greater predictive value, but more on that later. For now, let's stick to shot numbers, as that is the realm in which Bruce Arena has framed this argument.
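To make the definition concrete, here is a minimal sketch of the TSR calculation. The shot counts are made up for illustration; they are not from the Portland match.

```python
# Hypothetical illustration of Total Shot Ratio (TSR).
# The shot counts below are invented for demonstration only.

def total_shot_ratio(shots_for: int, shots_against: int) -> float:
    """TSR = shots for / (shots for + shots against)."""
    return shots_for / (shots_for + shots_against)

# A team outshot 15 to 9 posts a TSR of 0.375, below the
# 0.5 break-even line -- but a single game is far too small
# a sample to conclude anything from.
print(total_shot_ratio(9, 15))  # 0.375
```

A TSR above 0.5 means a team is taking the majority of the shots in its matches; the metric only gains predictive value over a meaningful number of games.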
The biggest issue with Bruce's hot take on the value of shot analysis is that he mischaracterizes what such analysis actually predicts. For instance, according to 11tegen11, "teams that go a goal up create over 10% less chances, and allow over 10% more chances at the same time. The shift in TSR is over 25% in favor of the team trailing the goal." On the flip side, the team trailing by a goal tends to see its conversion rate on shots go down as the team up a goal puts more men behind the ball, while the team up a goal sees its conversion rate go up as space opens up when more attackers are sent forward for the equalizer. When looking at TSR, the nuance of game state is crucial.
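A rough back-of-envelope check shows how those game-state shifts move TSR. This is a simplified illustration, assuming two evenly matched teams with an arbitrary baseline shot rate, not a reconstruction of 11tegen11's actual model.

```python
# Simplified illustration of the game-state effect on TSR.
# Assume two evenly matched teams, then apply the ~10% shifts
# 11tegen11 reports once one team goes a goal up.
baseline = 10.0                    # shots per team at even game state
leading_for = baseline * 0.9       # leading team creates ~10% fewer
leading_against = baseline * 1.1   # ...and allows ~10% more

tsr_leading = leading_for / (leading_for + leading_against)
tsr_trailing = 1 - tsr_leading

print(tsr_leading, tsr_trailing)  # 0.45 0.55
```

Even in this toy version, the leading team's TSR slides from 0.50 to 0.45 while the trailing team's rises to 0.55, a shift in the same direction as the one 11tegen11 measured.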
Considering the Galaxy were the team up a goal and scored on limited chances, much of the work done on TSR actually supports the claim that the shot totals in this game might not tell the whole story. In other words, analytics is being vindicated, not proven wrong, to the extent one can safely draw a conclusion from a single game at all, which, as I stated earlier, you quite obviously cannot.
Now I would like to examine the claim Bruce Arena makes at the end.
I'll be very honest with you. This isn't baseball or football or basketball. We have a very important analytic—it's the score, and that distorts all the other statistics.
Here, Bruce is fully dismissive of any metric other than the score line. Of course, this is rather disingenuous coming from a manager who, after the Galaxy's 4-2 defeat of the New England Revolution this year, said things like "we can play better" and "we need a lot more work together, a lot more coordination." I say this because, in these quotes, Bruce Arena clearly shows that there is importance beyond a score line.
If Bruce Arena sees his team lagging in certain areas in a game they win, he will look to address it, as he knows that this will help improve the team's chances of winning future games. And, ultimately, this is what a great deal of the field of soccer analytics is about: finding things outside the scoreline which show predictive value for future outcomes. It is odd, then, that he would consider those who use statistically important predictors to be morons when he himself employs a similar strategy with his eyes.
When you break it down, the problem with results-based analysis is that there is simply too much luck involved in soccer to use score lines, especially in small sample sizes, as indicators of who played the best soccer. Play on the field is ultimately your best predictor of future play on the field, and, thus, over many games, future results.
In the following video lecture made by my good friend Matthias Kullowatz for a statistics course he was teaching, he demonstrates that, while a correlation between points earned in the first half of the MLS season and points earned in the second half of the MLS season exists, it is a particularly weak one.
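A split-season check like the one in that lecture can be sketched in a few lines. The point totals below are invented for demonstration; real MLS standings data would replace them, and the exact correlation Kullowatz found is in the lecture, not here.

```python
from math import sqrt

# Made-up first-half and second-half point totals for ten
# hypothetical teams; real MLS standings would replace these.
first_half  = [30, 27, 25, 24, 22, 20, 18, 17, 15, 12]
second_half = [24, 20, 28, 18, 25, 15, 22, 14, 19, 16]

def pearson(xs, ys):
    """Pearson correlation coefficient between two samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A positive but far-from-perfect correlation: first-half
# points carry some signal about second-half points, but
# results alone leave a lot unexplained.
print(pearson(first_half, second_half))
```

The takeaway matches the lecture's: points earned early in a season relate to points earned later, but weakly enough that results alone are a noisy predictor.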
There are a number of articles out there, such as this one from 11tegen11 and this one by Michael Caley, which demonstrate that goal scoring, which Bruce Arena highlights as the most important metric, has less predictive value than TSR (which he mocks) and expected goals.
In the 11tegen11 article, TSR showed predictive value much earlier in the season than points per game. As the article states, "it is now proved possible to identify the strength of teams as early as after seven or eight match rounds, with an accuracy comparable to what (goal ratio and points per game) could only achieve at their height in mid-season."
One drawback, however, was that it tended to pick up a lot of noise as the season went along and was no better by season's end.
Expected goals, however, a metric which is newer and more advanced than TSR, ended up proving to be the best metric.
"It picks up information much like the raw shot metrics do in the very early stages, then predicts future performance significantly better at early to mid-season, and also holds predictive capacities for longer. It makes sense to use Expected Goals Ratio from as early as four matches played. Even that early, it is as good a predictor of future performance as Points per Game and Goals Ratio will ever be," the article concludes.
So, while scorelines are not necessarily bad predictors of future goals, they are far from the best predictor available. Bruce's sentiments are demonstrably wrong on this front. Furthermore, his conclusion that people who think shot numbers are important are "morons" is also off base, as TSR is demonstrably better than the metric he himself cites as most important.
I'd like to conclude by commenting on the overall notion that "analytics in soccer ... [don't] mean a whole lot," and the dismissive claim that soccer is not like baseball. This is a criticism I hear all the time and can generally be broken down to the argument that, because soccer is a fluid sport, it cannot be measured.
This particular argument is one that really grinds my gears, and here's why. Obviously, we can't measure everything in soccer. We can, however, use statistics to see how much we CAN measure, and, with some expected goal models being able to explain as much as 79% of the variance in goal differential in a season, it's pretty safe to say that something of profound fundamental value to goal scoring is being measured.
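For readers unfamiliar with "variance explained," that 79% figure is a coefficient of determination (R²), which for a simple linear fit is just the square of the correlation coefficient. A quick arithmetic check shows what correlation it implies; the 0.79 figure is the one cited above, not a number I have verified independently.

```python
from math import sqrt

# For a simple linear fit, variance explained (R^2) is the
# square of the correlation coefficient r, so r = sqrt(R^2).
r_squared = 0.79          # variance in goal differential explained
r = sqrt(r_squared)       # implied correlation between model and reality

print(round(r, 2))  # 0.89
```

A correlation of roughly 0.89 between an expected goals model and actual season-long goal differential is, by the standards of a sport as noisy as soccer, a strong relationship.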
Will the models ever show a perfect one-to-one correlation? Of course not. However, the correlations are absolutely high enough to have meaning and value, which is precisely why the field is becoming increasingly important to clubs across the globe, to the point that Arsene Wenger is referencing expected goals in interviews.
Bruce Arena's comments seem hasty and mean-spirited, and they suggest the manager hasn't done much to educate himself on the field. Just consider the statements. Bruce comes out swinging against TSR, something the field has mostly advanced past, and bases his conclusions on a limited sample size and, judging by how he characterized what a TSR analysis of this game would say, a limited understanding of the work on the subject. He then redirects his poor conclusions about TSR as evidence of the failings of the entire field of analytics.
Worse yet, he made the entire thing into a personal attack on those who use analytics, claiming that people who think shot numbers are important are "moron[s]" (remember, shot ratios are demonstrably better at prediction than his cited most important metric: scorelines) and asserting that people who use analytics "don't know how to analyze the game."
Finally, I should say that the use of analytics is not unassailable. However, if the argument you are making against it shows a lack of research and quickly devolves into anti-intellectual sentiment and personal attacks, you have no credibility on the subject. Even if you're Bruce Arena.