We all love a good underdog win. Be it the Orioles beating the Yankees*, the Lions beating whoever they’re playing that week, or Croatia beating Brazil, we love it when the team nobody thinks will win pulls it off.
*Or anybody beating the Yankees, let’s be honest.
But how often does this happen? To answer that, we’d need not only the outcome of each game, but also a way to measure just how unlikely the win was in the first place. Thankfully, 538 makes their soccer* match-level predictions available on GitHub (details about the data here), so we can see what odds their model gave each team before the match and use that as a proxy for how much we’d expect a win. Let’s look at an example game.
*I was going to do this with baseball, and was thinking through how to go about calcing who the favorite was when I stumbled on this. Sometimes you just go with what’s there.
I picked the game with the highest win probability in the data (which only goes back to 2016/17, but that’s enough for our purposes), which was Man City vs. Cardiff City back in 2019. The 538 model gave The Citizens a 93.89% chance of winning the game, which they did, 2-0. But c’mon, that’s the big boys winning - let’s find the biggest upset instead!
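Here is a minimal sketch of how the biggest upset could be pulled out of a table shaped like `soccer_tall` (one row per team per game). It is not necessarily how the lookup was done for this post: the data frame below is a tiny stand-in, where the Chelsea and West Brom rows use the numbers reported in this post and the other game is made up purely for illustration.

```r
library(dplyr)

# Tiny stand-in for soccer_tall: one row per team per game.
# The Chelsea / West Brom values are the ones reported in this post;
# the "hypothetical game" rows are invented for illustration only.
soccer_tall <- data.frame(
  game_id = c(
    "2021-04-03 Chelsea vs West Bromwich Albion",
    "2021-04-03 Chelsea vs West Bromwich Albion",
    "hypothetical game",
    "hypothetical game"
  ),
  team = c("Chelsea", "West Bromwich Albion", "Team A", "Team B"),
  win_prob = c(0.8659, 0.0198, 0.60, 0.15),
  W = c(0, 1, 1, 0)
)

# Biggest upset: among the teams that actually won,
# keep the one the model liked the least.
biggest_upset <- soccer_tall %>%
  filter(W == 1) %>%
  slice_min(win_prob, n = 1)

biggest_upset
```

Swapping `slice_min()` for `slice_max()` would give the opposite extreme: the heaviest favorite that went on to win.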
soccer_tall %>%
  filter(game_id == '2021-04-03 Chelsea vs West Bromwich Albion') %>%
  select(date, team, location, win_prob, lose_prob, tie_prob, goals_for, goals_against, W) %>%
  gt() %>%
  tab_style(
    locations = cells_column_labels(columns = everything()),
    style = list(cell_text(weight = "bold"))
  )
| date | team | location | win_prob | lose_prob | tie_prob | goals_for | goals_against | W |
|---|---|---|---|---|---|---|---|---|
| 2021-04-03 | Chelsea | home | 0.8659 | 0.0198 | 0.1143 | 2 | 5 | 0 |
| 2021-04-03 | West Bromwich Albion | away | 0.0198 | 0.8659 | 0.1143 | 5 | 2 | 1 |
Oh, Chelsea. You had an 87% chance of winning and an 11% chance of tying, whereas lonely West Bromwich Albion - who would go on to be relegated that season - was only given a 2% chance of winning*. Yet they beat mighty Chelsea 5-2 in the textbook definition of an upset.
*soyouresayingtheresachance.gif
OK, so we’ve got our bearings, and have a pretty good idea how this works. So let’s take this to the next step. What percent of games do teams with a 5% chance of winning actually win? Should we make like Han Solo and insist we’re never told the odds, or is there something there?
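One way to answer that kind of question is to bucket every team-game by its forecast win probability in 5-point steps and compare each bucket's average forecast with the share of games actually won. The sketch below shows the idea on simulated data, not the 538 data: `sim` is a made-up stand-in for `soccer_tall`, and its outcomes are drawn from the forecasts themselves, so this fake model is well calibrated by construction.

```r
library(dplyr)

set.seed(538)  # arbitrary seed so the simulated example is repeatable

# Simulated stand-in for soccer_tall (NOT real match data):
# each row gets a forecast, then a win/loss drawn from that forecast.
n_rows <- 20000
sim <- data.frame(win_prob = runif(n_rows))
sim$W <- rbinom(n_rows, size = 1, prob = sim$win_prob)

calibration <- sim %>%
  # 5-point-wide buckets: [0, 0.05], (0.05, 0.1], ..., (0.95, 1]
  mutate(prob_bucket = cut(win_prob,
                           breaks = seq(0, 1, by = 0.05),
                           include.lowest = TRUE)) %>%
  group_by(prob_bucket) %>%
  summarise(
    games = n(),                    # team-games in the bucket
    mean_forecast = mean(win_prob), # what the model expected
    actual_win_rate = mean(W),      # what actually happened
    .groups = "drop"
  )

calibration
```

If the forecasts are any good, `actual_win_rate` should track `mean_forecast` from bucket to bucket. The figures quoted in the next paragraph come from the real match data, not from this simulation.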
Well, that’s good news for the 538 model - if a team has a win probability between 0% and 5%, they win about 3% of their games, which seems reasonable. There are some odd bits (teams between 70% and 75% odds win only 65% of their matches), but those are likely just random noise.
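Whether a gap like that really is noise depends on how many games sit in the bucket. A rough way to check is an exact binomial confidence interval around the observed win rate, using base R's `binom.test()`. The helper name `bucket_is_consistent`, the game counts, and the use of 0.725 (the bucket midpoint) as the expected rate are all assumptions for illustration, since the post does not report bucket sizes.

```r
# Hypothetical helper: does the expected win rate fall inside the exact
# binomial confidence interval around the observed win rate?
bucket_is_consistent <- function(wins, games, expected_rate, conf_level = 0.95) {
  ci <- binom.test(wins, games, p = expected_rate,
                   conf.level = conf_level)$conf.int
  expected_rate >= ci[1] && expected_rate <= ci[2]
}

# Made-up counts, both with a 65% observed win rate:
small_bucket <- bucket_is_consistent(wins = 26, games = 40, expected_rate = 0.725)
large_bucket <- bucket_is_consistent(wins = 650, games = 1000, expected_rate = 0.725)
```

With only 40 games, a 65% win rate is compatible with a 72.5% forecast, so `small_bucket` is `TRUE`. With 1,000 games it no longer is, so `large_bucket` is `FALSE`. Plugging in the real counts for the 70% to 75% bucket would show which situation applies here.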
The overall story is that if a team is a favorite to win, they probably will. And that’s a good thing - the rarity of upsets makes them unique and memorable. As West Brom reminded us, even a mere 2% chance is still a chance.