(Research for this blog provided by Thilo Klein and Koffi Zougbede, PARIS21)
Some three weeks ago I saw an odd email arrive in my inbox. The subject line read: "[Euro 2016] Sweepstakes!" Yes, the month-long European Football Championship has arrived and is now in full swing. This championship, held every four years, is the second-biggest football event after the World Cup. As an avid sports fan, I was pleased to see a continent-wide obsession make its way into casual office talk. Where our North American counterparts discuss their NCAA basketball brackets around the water cooler, offices all across Europe are picking their favorite teams for Euro 2016 over cups of espresso.
Football statistics are clearly "en vogue" these days. The latest study I heard about compared penalties taken with the right foot and with the left foot (left-footed penalties do better, but only until the knockout phase). But how can an everyday citizen use this raw data to their advantage when selecting a winning team? And, most importantly, who will be crowned European football champion on July 10th at the Stade de France in Saint-Denis, just outside Paris?
Apart from simply picking your home country or following the advice of bookmakers, there seem to be two primary methods for predicting the winning team. On the one hand, "animal oracles" like Paul the octopus have become an increasingly popular way of selecting match winners. On the other hand, there are the more traditional algorithm-fed models, which crunch a myriad of data on the 24 competing teams, including the market value of the players, previous performances, etc.
So, let's have a closer look at both types of predictions -- the "animal oracles" and the "algorithm approach."
Since the success of Paul the octopus, who correctly predicted eight games (including the final) during the 2010 World Cup, there has been a surge in animal oracles around the world. Dogs, penguins, elephants, koalas, kangaroos and sea turtles have all been used to predict matches. Their predictive power, though, has been quite mixed: according to one source, only about five out of ten animal oracles got it right on average. And so far, none has been as successful as Paul.
For a more scientific approach, you can choose your winners from a plethora of statistical models based on diverse data sources, ranging from long time series of historical team performances to individual players' current market valuations. One of the most discussed models is produced by Goldman Sachs. But what is the science behind such a model? On its blog, the investment bank offers a glimpse of its recipe for success. The five ingredients are more basic than most would assume.
Take each team's international Elo rating, add the number of goals it scored and conceded in its latest competitive matches, and top it off with dummy variables for whether the match was played at home and whether it was played at a European Championship. Then stir these ingredients together in a Poisson model that explains the number of goals scored by each team in every competitive match since 1958. The fitted model is then used to predict every match in the EURO 2016 tournament schedule. Eliminating the losing team at each stage, all the way down to the final, leaves us with the predicted champion.
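To make the recipe a little less of a "black box", here is a minimal sketch of how such a Poisson prediction could work once a model has been fitted. It is not Goldman Sachs' code: the coefficients in `COEF`, the helper names (`expected_goals`, `outcome_probs`, `knockout`) and the four placeholder teams with their ratings are all hypothetical. It also assumes that the two teams' goal counts are independent Poisson variables and that a knockout draw is settled 50/50 by extra time or penalties.

```python
import math

# Hypothetical coefficients for illustration only; these are NOT the
# values Goldman Sachs estimated from its historical data.
COEF = {
    "intercept": 0.0,
    "elo_diff": 0.002,    # per point of Elo advantage over the opponent
    "scored": 0.2,        # team's average goals scored in recent matches
    "opp_conceded": 0.2,  # opponent's average goals conceded recently
    "home": 0.3,          # dummy: 1 if the team plays at home
    "euro": -0.1,         # dummy: 1 if the match is at a European Championship
}


def expected_goals(team, opp, home, euro, coef=COEF):
    """Poisson rate (log link): expected goals `team` scores against `opp`."""
    eta = (coef["intercept"]
           + coef["elo_diff"] * (team["elo"] - opp["elo"])
           + coef["scored"] * team["scored"]
           + coef["opp_conceded"] * opp["conceded"]
           + coef["home"] * home
           + coef["euro"] * euro)
    return math.exp(eta)


def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(lam_a, lam_b, max_goals=15):
    """Win/draw/loss probabilities for team A, treating both scores as
    independent Poisson variables (truncated at max_goals goals each)."""
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


def knockout(teams, bracket, host=None):
    """Advance the favourite of each pairing until one team is left."""
    remaining = list(bracket)
    while len(remaining) > 1:
        next_round = []
        for a, b in zip(remaining[::2], remaining[1::2]):
            lam_a = expected_goals(teams[a], teams[b], int(a == host), 1)
            lam_b = expected_goals(teams[b], teams[a], int(b == host), 1)
            win, _, loss = outcome_probs(lam_a, lam_b)
            # With draws split 50/50, A is the favourite iff P(win) >= P(loss).
            next_round.append(a if win >= loss else b)
        remaining = next_round
    return remaining[0]


# Hypothetical teams with made-up inputs (not real ratings or statistics).
teams = {
    "Team A": {"elo": 2050, "scored": 2.1, "conceded": 0.6},
    "Team B": {"elo": 1900, "scored": 1.6, "conceded": 0.9},
    "Team C": {"elo": 1850, "scored": 1.4, "conceded": 1.0},
    "Team D": {"elo": 1700, "scored": 1.1, "conceded": 1.4},
}
print(knockout(teams, ["Team A", "Team B", "Team C", "Team D"]))  # prints: Team A
```

A real model would estimate the coefficients from historical matches (for example with a Poisson regression) and would usually simulate the bracket many times rather than always advancing the favourite, which is exactly where upsets like Iceland's can slip through.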
From a scientific point of view, it seems we have a clear winner here. So why do animals still feature so prominently in the news?
For one thing, despite all the science applied to choosing a winner, football remains a highly unpredictable game, and statistical models are still something of a "black box" to most of us. Goldman Sachs' model, for instance, failed badly at the 2014 World Cup, when it did not foresee that heavyweights Spain and Italy would be eliminated in the group stage, with Spain finishing third out of the four teams in its group. At Euro 2016, hardly anybody predicted that Iceland would make it through to the round of 16.
As for animal predictions, which many find more fun to watch than numbers spat out by fancy models, we may fail to recognize some basic statistical pitfalls. One is the ridiculously small sample size of Paul's predictions before his death a few months after the 2010 World Cup. Another is survivorship and selection bias: we "pick the winners" from the many animals that have since been consulted as oracles (turtles, parrots, dolphins, ...).
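A bit of arithmetic shows why a perfect record like Paul's is less remarkable than it looks. The sketch below assumes, purely for illustration, that every pick is an independent coin flip with a 50% chance of being right (ignoring draws and real match odds); the function names and the oracle counts in the loop are hypothetical.

```python
def p_perfect(n_games, p_correct=0.5):
    """Chance that one guessing oracle calls all n_games correctly."""
    return p_correct ** n_games


def p_some_oracle_perfect(n_oracles, n_games, p_correct=0.5):
    """Chance that at least one of n_oracles independent guessers is perfect."""
    return 1 - (1 - p_perfect(n_games, p_correct)) ** n_oracles


print(p_perfect(8))  # 0.00390625, i.e. 1 in 256
# Hypothetical numbers of animal oracles consulted around the world.
for n in (1, 10, 100, 500):
    print(n, round(p_some_oracle_perfect(n, 8), 3))
```

A single lucky octopus is a 1-in-256 event, but with a hundred guessing animals the chance that at least one of them goes eight for eight rises to roughly one in three. If we then only talk about that one, we have selected our "oracle" after the fact.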
Why does this matter?
For a start, speaking as an advocate for the increased use of data and statistics in public life, I find the surge in interest from a broader part of the population great for private consumption and generally quite fun. If people start using data and analysis for trivial things like sports matches, perhaps they will also become more interested in data on the quality of public services such as sewage, health, education or safety. From there, it is only one more step to not only consuming but also producing data; citizen-generated, crowd-sourced data has become increasingly popular. This could then grow into an interest in the broader issue of data literacy and in forming an informed opinion on how to manage the trade-offs between cost-effective policy interventions and the protection of privacy and confidentiality. It is a bit of a long shot, but perhaps the EURO and other big sporting events should be used more to advocate for questions broader than who wins a specific event, as important as that is.
So, how should you pick the winner? If you are the risk-averse type, it is probably safest to follow the wisdom of the crowd and go with the bookmakers, although even this is not as sure a bet as it seems, as we just saw when they put the odds at 3:1 for the UK staying in the European Union. If, on the other hand, you like to gamble and try to beat the market, you can use a statistical model, keeping in mind that the predictions of most models are actually not far off the bookmakers' odds. Finally, for all the pet lovers out there, you might just be better off choosing the prediction of your favourite animal.