Man City beat Nottingham Forest 2-0 on Saturday to go up to 6 wins in 6 games this season. The most a Premier League team’s ever started with is 9, so we already have a sense that winning every game this season is a stretch.
To measure this, we’ll use the binomial distribution. This gives us the chance of x successes in n trials given the odds of success on each trial. For instance, if we wanted to know the odds of getting heads on one coin flip, we’d get 50%. If we wanted to know the odds of getting two heads in two flips, we’d get 25% - the extra flip doubles the potential outcomes from heads, tails to heads/heads, heads/tails, tails/heads, tails/tails, and only one of those four is all heads.
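Here’s a quick sketch of that coin example in base R, just to check the arithmetic. The names `p_heads` and `binom_by_hand` are made up for illustration; `dbinom` and `choose` are the standard base R functions.

```r
# Chance of exactly x successes in n trials, each with success probability p
p_heads <- 0.5

one_head  <- dbinom(1, size = 1, prob = p_heads)  # heads on a single flip
two_heads <- dbinom(2, size = 2, prob = p_heads)  # two heads in two flips

# The same quantity written out by hand: choose(n, x) * p^x * (1 - p)^(n - x)
binom_by_hand <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

print(c(one_head = one_head, two_heads = two_heads))
print(binom_by_hand(2, 2, p_heads))
```

When x equals n, the formula collapses to p^n, which is all a perfect season really is: winning n times out of n.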
We can use the dbinom function in R to calculate each PL team’s odds of winning all 38 games, as of the start of the season. We’ll use their winning percentage from last season as the chance of winning any single game - there are better ways to estimate talent (projection, Bayesian shrinkage, etc.), but we’ll try the quick version first and make things more complex if warranted.
    library(formattable)
    library(hrbrthemes)
    library(tidyverse)
    library(worldfootballR)

    set.seed(2023)

    # Final 2022/23 Premier League table
    pl_23 <- fb_season_team_stats(
      country = "ENG",
      gender = "M",
      season_end_year = 2023,
      tier = "1st",
      stat_type = "league_table"
    )

    # Odds of 38 wins in 38 games, using last season's win rate as the per-game chance
    pl_23_perfect <- pl_23 %>%
      mutate(win_perc = W / MP) %>%
      mutate(perfect_win_odds = dbinom(38, 38, win_perc)) %>%
      filter(Rk <= 17) %>%  # keep the 17 teams that stayed up
      arrange(desc(win_perc)) %>%
      mutate(Squad = fct_reorder(Squad, win_perc))

    pl_23_perfect %>%
      ggplot(aes(x = perfect_win_odds, y = Squad)) +
      geom_bar(stat = 'identity', fill = '#38003c') +
      theme_ipsum() +
      scale_x_percent(accuracy = 2) +
      ylab('') +
      xlab('Odds of Perfect Season, Start of 2023/24')
Well, we can just stop right there. The scale doesn’t even reach one percent. City’s at roughly 0.0009% and Arsenal at roughly 0.00005%, and nobody else is even worth reporting on. City had a .737 win percentage last year and Arsenal .684, and even at this level they face basically impossible odds.*
*Obligatory “never tell me the odds” joke.
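To see how small those numbers are without the chart, here’s a minimal base R sketch using the two win percentages quoted above. The vector names are hypothetical, and the rates are the rounded .737 and .684 rather than values pulled from the league table.

```r
# Last season's win rates as quoted in the text (rounded)
win_perc <- c(city = 0.737, arsenal = 0.684)

# 38 wins in 38 games: dbinom(38, 38, p) is just p^38
perfect_odds <- dbinom(38, size = 38, prob = win_perc)
names(perfect_odds) <- names(win_perc)

# Print as percentages with enough decimals to see anything at all
print(sprintf("%s: %.5f%%", names(perfect_odds), perfect_odds * 100))
```

Both come out well under a thousandth of a percent, which is why the bars barely register.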
But this was at the start of the season. Where would we rate City now, with 32 games left?
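A rough version of that calculation, assuming `city_w_perc` is the .737 win rate quoted above (the post doesn’t show where it’s defined, so treat that value as an assumption):

```r
# Assumption: City's per-game win chance is last season's .737 win rate
city_w_perc <- 0.737

games_played <- 6
games_left <- 38 - games_played  # 32 games still to win

# Odds of winning all remaining games vs. the pre-season odds of winning all 38
odds_now <- dbinom(games_left, size = games_left, prob = city_w_perc)
odds_preseason <- dbinom(38, size = 38, prob = city_w_perc)

print(sprintf("Now: %.4f%%, pre-season: %.4f%%", odds_now * 100, odds_preseason * 100))

# Six wins in the bank multiply the odds by 1 / city_w_perc^6
print(odds_now / odds_preseason)
```

The six wins already banked improve the odds about sixfold, but sixfold of almost nothing is still almost nothing.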
Well, that’s better relatively, but you still have to squint to see the number. When do we think it’ll be worth looking at? Let’s look at the whole season. We’re calculating the odds before each game - so game 1 will be our pre-season view, and game 38 our “can we win this and cap it off” view.
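    # Assumption: city_w_perc is City's 2022/23 win rate (the .737 quoted above);
    # its definition isn't shown in the post.
    city_w_perc <- 0.737

    # Odds of winning every remaining game, measured before each matchweek
    games <- tibble(game_id = 1:38) %>%
      mutate(
        games_left = 38 - game_id + 1,
        perfect_win_odds = dbinom(games_left, games_left, city_w_perc)
      )

    games %>%
      ggplot(aes(x = game_id, y = perfect_win_odds)) +
      geom_point(color = '#6CABDD') +
      theme_ipsum() +
      xlab('Matchweek') +
      ylab('Odds of Winning All Games')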
Now this tells us something. We don’t cross the 10 percent barrier until week 32, but our odds get exponentially better* with each week. Remember our coin flip example from earlier? It went from 50% to 25% when we doubled the flips. In this case, we’re reducing the number of “flips” left, meaning the more they win the more likely we think it is they can win all matches.
*It follows a log pattern: if we put it on a log scale it’d be a perfectly straight line, but one that doesn’t capture the story nearly as well.
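Both claims can be checked in a few lines of base R. This is a sketch under the same assumption as before, that City’s per-game win chance is the .737 quoted earlier; the variable names are illustrative.

```r
# Assumption: City's per-game win chance is last season's .737 win rate
city_w_perc <- 0.737

game_id <- 1:38
games_left <- 38 - game_id + 1

# Winning every remaining game is just city_w_perc raised to the games left
perfect_win_odds <- city_w_perc^games_left

# First matchweek where the odds of winning out clear 10 percent
first_week_over_10 <- min(game_id[perfect_win_odds > 0.10])
print(first_week_over_10)

# On a log scale every week adds the same amount, i.e. a straight line
log_steps <- diff(log(perfect_win_odds))
print(range(log_steps))
```

Each win multiplies the odds by 1 / .737, so the log of the odds climbs by the same step every week.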
Still, it’s not surprising that City’s odds of winning all their matches this year are so low. Guess they’ll just have to settle for winning the Premier League with some draws and losses scattered in.