The Numbers Game
Has the data analytics revolution made football better in the last decade? Your answer may depend on whether your own favoured team has become better – or worse.
Take the new orthodoxy of “playing out from the back”: almost every big English Premier League side now dogmatically moves the ball out of defence with an intricate sequence of short passes, often executed in tight spaces. Each touch in the sequence is typically attended by a swarm of opposition forwards, slavering for a possible interception.
For Arsenal, the disaster-addicted team I have been enslaved to since the beautiful autumn of 1990, this doesn’t feel like a good tactic. Of late, Arsenal has evolved the defensive blunder into a sort of tragicomic performance art. The team’s back four are continually finding new and amazing ways to create a goal for the opposition. In light of this, I want Arsenal defenders to move their fascinating blunders far away from our penalty box whenever possible: by playing long, crudely hopeful passes to our wide forwards at the first opportunity.
The data guys disagree with me, of course. These whip-smart miners of football chaos, tapping away on their laptops in the shadows of every big club, have told the head coaches that the old, safety-first habit of playing lofted passes out of defence is more likely to gift possession to opponents than a short-passing transition. Too often, the long ball comes straight back, they say. Of course, the very same data guys have also urged a swarming “gegenpress” on opposition defenders to disrupt their version of “playing out”; so at least they are stressing out other clubs’ fans too. They are both the poison and the cure.
Mikel Arteta, the Arsenal head coach, listens to his data guys, and not at all to me, which is fair enough. They’ve crunched and analysed every outcome of every Arsenal pass over the last five seasons; I haven’t. So they are technically correct, whereas I feel I am morally right in my advice to the team: Get rid of the ball. Hoof it into Row Z if you must. Just give us a sense of hope and salvation, even if it’s false.
But football luddites like me are on the wrong side of history, because the burgeoning science of data analytics is radically improving the quality of football decision-making. It equips head coaches and club owners to buy better players for better prices, to sniff out secret opposition weaknesses, to discover and fix their own players’ secret flaws, to maximise their secret strengths.
It’s widely accepted that this revolution began across the pond, in the weird parallel world of baseball, where the Oakland A’s general manager Billy Beane assembled a cheap team that won 103 games and its division in 2002 using a then-radical approach to data. Baseball is unusually rich in granular stats, and Beane and his analysts mined them more systematically than any other club had previously done. The numbers sniffed out unproven or neglected players, undervalued by the vagaries and prejudices of the player market. The plan worked, and it wasn’t luck. Beane’s model, popularised as “Moneyball”, drew on the statistical approach known as sabermetrics, and two years later the Boston Red Sox (owned by John W Henry, who also owns Liverpool) borrowed Beane’s recruitment strategy to win the 2004 World Series and break an 86-year trophy drought – the infamous curse of the Bambino.
Several leading football coaches, not least Arsenal’s visionary Arsene Wenger, had grasped the promise of data analysis as long ago as the late nineties. But back then, the available data was too crude and too limited to really change the game. Because football is so fluid and unstructured compared to basketball or baseball, its variables are much harder to untangle. For example, any given completed 10-metre pass has a different statistical value to every other 10-metre pass, even to other passes played between precisely the same two points on the field, if the positions of teammates and opponents are factored in. But the data guys have attacked this maze of permutations with a vengeance – using rapidly advancing AI technology and lavish staffing budgets. They measure and collate an ocean of data points in each game.
Arguably their most transformative concept is xG, or expected goals: a numerical value for the quality of any given goalscoring chance, expressed as the probability that a shot from that situation ends in a goal. By measuring xG, you can assess whether an attacking player consistently scores more goals than their chances are worth, and so outperforms the average player. You can also analyse how many good chances a given tactical shift yields. The project is to filter out the confounding factors of luck, tradition, or the hunch that feels like intuition but may just be a wild guess.
One of the trends driven by analytics is that players are shooting less often from long range, in response to the data wonks’ advice that long shots are less statistically effective than shots from closer range. So attacking moves tend to be more patient, seeking to carve out a route into the box whenever possible.
Wide players are also crossing much less frequently since deep crosses have been shown to be much less goal-productive than their traditional frequency in England suggested. Nowadays, the typical wide attack involves infield runs by the inverted wingers, either to exchange shorter ground passes with central forwards or to shoot with their stronger foot from the corner of the box – an area from which curling shots toward the far post are reaping a rich yield of goals.
But could this shift lead players to forget the noble art of the deep, curling cross, as perfected by David Beckham and latterly by Trent Alexander-Arnold? If so, a feedback loop is on the cards: the instructions of the analysts could filter into the players’ skill sets, for better or for worse. A damaging version of this effect happened long ago, when the post-war English data pioneer Charles Reep noticed that the vast majority of goals came from moves of three passes or fewer, and concluded that the quickest route to goal was a long ball to the forwards. Therefore teams shouldn’t even bother to try to keep the ball, Reep argued. What he overlooked was that the vast majority of all passing moves lasted no longer than three passes.
But as Jonathan Wilson showed in his tactical history tome “Inverting the Pyramid”, Reep had mistaken a correlation for causation. His long-ball philosophy was taken to heart by several smaller English clubs in the 1980s, seriously damaging the skill sets of a generation of English players. The Norwegian national side under Egil Olsen got some great results using Reep’s tactics, but they were the exception that proved the rule. Olsen was making the most of limited resources: a mediocre midfield plus some useful big centre-forwards.
In today’s game, some coaches have become notorious for sacrificing artistry in the name of data-driven efficacy. The Danish club Midtjylland won their first league title in 2015 under coach Glen Riddersholm, who used big data to develop an almost maniacal focus on set pieces as a treasury of goals. His Midtjylland played a metronomically disciplined system, in which shooting accuracy was prized above power or improvisation to maximise the average yield of any given chance.
It’s no surprise that Midtjylland are owned by the English tycoon Matthew Benham, an Oxford physics graduate and former professional gambler who also owns the newly promoted Premier League side Brentford FC. The Bees are an attractive enough side, less functional in their tactics than Midtjylland, but their transfer strategy has been rigidly dictated by data. Every player bought must be unheralded but statistically promising, and the result has been a steady flow of hefty resale profits. So clinical is Benham’s philosophy that he closed the club’s academy on the grounds that the return on youth investment is too unreliable to justify the cost. But if everybody shared that bean-counting wisdom, there would be no good players to buy.
Of course, data-driven tactics need not be so rigid. The glory years of both Liverpool and Manchester City during the last five years represent a much more nuanced marriage between inspiration and information, between numbers and ideas. Both those clubs’ coaches, Jürgen Klopp and Pep Guardiola, are potently original thinkers about the game, but they are also avid consumers and respecters of data. They tailor their tactical visions to fit all the available facts, whereas many great coaches of the past would choose which facts they wanted to think about and ignore the rest.
The other good news is that sport’s mysterious psychological dimension is ultimately impervious to the manipulations of data geeks. Manchester City will always have a healthy chance of losing to an objectively weaker side in a huge game, as they did in the last Champions League final. And I fear that Arsenal will never abandon their passion for defensive disaster. The wonks can do their worst and do their best, but footballers will stay defiantly human.