Improving the Premier League Simulator

statistics
r
soccer
Author

Mark Jurries II

Published

January 11, 2024

I recently built a quick from-scratch Premier League season simulator. For the game simulator, I simply took the Squad’s average goals in the season and fed that into a Poisson model generator (rpois in R) and ran against the opposing squad’s simulated distribution. This is OK, but it doesn’t take into account a few things : home/away splits, strength of defense, or noise in the data.

Home/away is easy enough to grab. We have goals allowed. Noise can be reduced by using Expected Goals (xG) instead of Goals. My biggest challenge was how to capture the interaction between Goals and opposing defense. I eventually found a really helpful article that used the following methodology:

For example, let’s look at Man City’s match against Sheffield United. City went into the game with a home xG of 1.78, which is 1.03 from league average of 1.73. Sheffield United had an away xGA of 1.29, which of 1.05 from league average of 1.23. So we get 1.03 * 1.05 * 1.73, or 1.87 xG for City. It’s a relatively small adjustment, but worth the trouble.

Note these are done using 12 week exponential moving averages (another concept I took from the article linked above). This is in Premier League / Champions league only, and uses the most recent data to forecast the rest of the season. The idea being that we can capture change in team performance without giving up too much historical data. At some point it’d be nice to update the model to adjust the moving average after each simulated match, but I’m saving that for another day.

OK, I made all this effort, how well does it work? We’ll calculate each match’s Brier scores., using matchweeks 18 and 19 for testing. The total is simply the average of the Brier for Win, Draw, and Loss.

Show the code
library(gt)
library(gtExtras)
library(hrbrthemes)
library(patchwork) 
library(pracma)
library(plotly)
library(rvest)
library(tidyverse)
library(worldfootballR)

#https://beatthebookie.blog/2021/06/07/the-predictive-power-of-xg/

#set seed for reproducibility
set.seed(20231228)

#use this to for the season end year
#makes it easier to update stuff later
season_end <- 2024
ema_duration <- 12

#how many seasons we want to simulate
number_of_sims <- 20000

#rather than use a case statement over and over, just make this a function
calculate_points <- function(a, b){
  case_when(a > b ~ 3,
            a == b ~ 1,
            a > b ~ 0,
            TRUE ~ 0)
}

#fetch current table using the worldfootballr package
prem_2024_table <- fb_season_team_stats(country = "ENG", 
                                        gender = "M", 
                                        season_end_year = "2024", 
                                        tier = "1st", 
                                        stat_type = "league_table")

pl_table <- prem_2024_table %>%
  select(Squad, MP, Pts, W, D, L, GF, GA, GD, xG, xGA, Rk) %>%
  rename(Points = Pts,
         Goals = GF,
         Goals_Allowed = GA,
         Rank = Rk,
         Goal_Dif = GD,
         xG_Allowed = xGA)

#get match results for current season and prior season
#also grab champions league so we have something for clubs that got promoted
match_pl_cs <- fb_match_results(country = "ENG", 
                 gender = "M",
                 season_end_year = season_end,
                 tier = "1st")

match_pl_ps <- fb_match_results(country = "ENG", 
                       gender = "M",
                       season_end_year = season_end - 1,
                       tier = "1st")

match_efl_ps <- fb_match_results(country = "ENG", 
                       gender = "M",
                       season_end_year = season_end - 1,
                       tier = "2nd")

#set aside unplayed matches for later
pl_future_matches <- match_pl_cs %>%
  filter(is.na(HomeGoals) | Notes == 'Match Suspended') %>%
  select(-MatchURL) %>%
  as_tibble()

#get max date with a game
Max_Date <- match_pl_cs  %>%
  filter(!is.na(HomeGoals)) %>%
  summarise(Max_Date = max(Date)) %>%
  pull()

#get home/away goals for the league using moving average
leaguge_exp <- match_pl_ps %>%
  rbind(match_pl_cs) %>%
  rbind(match_efl_ps) %>%
  mutate(Wk = as.numeric(Wk)) %>%
  group_by(Competition_Name, Season_End_Year, Wk) %>%
  summarise(mean_Home_xG = mean(Home_xG, na.rm = TRUE),
            mean_Away_xG = mean(Away_xG, na.rm = TRUE),
            d = min(Date)) %>%
  arrange(d) %>%
  group_by(Competition_Name) %>%
  mutate(lg_ema_Home_xG = movavg(mean_Home_xG, n = ema_duration, type = 'e'),
         lg_ema_Away_xG = movavg(mean_Away_xG, n = ema_duration, type = 'e')
         ) %>%
  select(-mean_Home_xG, -mean_Away_xG, -d)

#get club expected goals relative to league average for both home and away
#using a expoential moving average for each, i.e. compared to league over 
#last n matches, how good/bad are they at scoring and preventing goals
club_xg_ema <- match_pl_ps %>%
  rbind(match_pl_cs) %>%
  rbind(match_efl_ps) %>%
  mutate(Wk = as.numeric(Wk)) %>%
  select(Competition_Name, Season_End_Year, Wk, Home, Away, Home_xG, Away_xG) %>%
  gather(key = Location, value = Squad, Home, Away) %>%
  filter(!is.na(Home_xG)) %>%
  filter(!(Season_End_Year == 2024 & Wk == 20)) %>%
  mutate(xG =  case_when(Location == 'Home' ~ Home_xG, TRUE ~ Away_xG),
         xGA =  case_when(Location == 'Home' ~ Away_xG , TRUE ~ Home_xG)) %>%
  select(-Home_xG, -Away_xG) %>%
  arrange(Squad, Season_End_Year, Wk) %>%
  group_by(Squad, Location) %>%
  mutate(ema_xG = movavg(xG, n = ema_duration, type = 'e'),
         ema_xGA = movavg(xGA, n = ema_duration, type = 'e')) %>%
  inner_join(leaguge_exp) %>%
  mutate(rel_xG = case_when(Location == 'Home' ~ ema_xG / lg_ema_Home_xG,
                            TRUE ~ ema_xG / lg_ema_Away_xG),
         rel_xGa = case_when(Location == 'Home' ~ ema_xGA / lg_ema_Away_xG,
                            TRUE ~ ema_xGA / lg_ema_Home_xG)) %>%
  group_by(Squad, Location) %>%
  filter(Season_End_Year == season_end) %>%
  filter(Wk == max(Wk)) %>%
  select(Wk, Squad, lg_ema_Home_xG, lg_ema_Away_xG, rel_xG, rel_xGa)

sim_games3 <- function(home, away){
  n_sims <- number_of_sims
  
  home_table <- club_xg_ema %>%
    filter(Squad == home & Location == 'Home')
  
  away_table <- club_xg_ema %>%
    filter(Squad == away & Location == 'Away')
  
  home_exp <- (home_table$rel_xG  * away_table$rel_xGa) * home_table$lg_ema_Home_xG
  away_exp <- (away_table$rel_xG  * home_table$rel_xGa) * away_table$lg_ema_Home_xG
  
  home_goals <- rpois(n_sims, home_exp)
  away_goals<- rpois(n_sims, away_exp)
  
  results <- cbind(home_goals, away_goals) %>%
    as_tibble %>%
    mutate(goal_dif = home_goals - away_goals,
           home_points = calculate_points(home_goals, away_goals),
           away_points = calculate_points(away_goals, home_goals),
           sim_num = row_number()) %>%
    relocate(sim_num)
  
  results  
}

#old version of simmer just in case
sim_games <- function(home_goals, away_goals){
  n_sims <- number_of_sims
  home_goals <- rpois(n_sims, home_goals)
  away_goals<- rpois(n_sims, away_goals)
  
  results <- cbind(home_goals, away_goals) %>%
    as_tibble %>%
    mutate(goal_dif = home_goals - away_goals,
           home_points = calculate_points(home_goals, away_goals),
           away_points = calculate_points(away_goals, home_goals),
           sim_num = row_number()) %>%
    relocate(sim_num)
  
  results  
}

sim_new <- match_pl_cs %>%
  filter(Wk %in% c(18, 19) & !is.na(HomeGoals)) %>%
  select(Wk, Date, Home, Away, HomeGoals, AwayGoals) %>%
  inner_join(pl_table %>% select(Squad, Goals, xG, MP) %>% rename(Home_Goals = Goals, Home_xG = xG, Home_MP = MP), join_by(Home == Squad)) %>%
  inner_join(pl_table %>% select(Squad, Goals, xG, MP) %>% rename(Away_Goals = Goals, Away_xG = xG, Away_MP = MP), join_by(Away == Squad)) %>%
  mutate(Home_G_per_Match = Home_Goals / Home_MP,
         Away_G_per_Match = Away_Goals / Away_MP) %>%
  mutate(simmed_games = map2(Home, Away, ~sim_games3(.x, .y))) %>%
  unnest(simmed_games)

sim_agg_with_brier <- sim_new %>%
  mutate(W = case_when(HomeGoals > AwayGoals ~ 1, TRUE ~ 0),
         D = case_when(HomeGoals == AwayGoals ~ 1, TRUE ~ 0),
         L = case_when(HomeGoals < AwayGoals ~ 1, TRUE ~ 0)) %>%
  mutate(W_sim = case_when(home_goals > away_goals ~ 1, TRUE ~ 0),
         D_sim = case_when(home_goals == away_goals ~ 1, TRUE ~ 0),
         L_sim = case_when(home_goals < away_goals ~ 1, TRUE ~ 0)) %>%
  group_by(Wk, Home, Away) %>%
  summarise(W = mean(W),
            D = mean(D),
            L = mean(L),
            HomeGoals = mean(HomeGoals),
            AwayGoals = mean(AwayGoals),
            W_sim = mean(W_sim),
            D_sim = mean(D_sim),
            L_sim = mean(L_sim)) %>%
  mutate(Brier_W = case_when(W == 1 ~ (W_sim - 1)^2, TRUE ~ W_sim ^ 2),
         Brier_D = case_when(D == 1 ~ (D_sim - 1)^2, TRUE ~ D_sim ^ 2),
         Brier_L = case_when(L == 1 ~ (L_sim - 1)^2, TRUE ~ L_sim ^ 2),
         Brier_T = (Brier_W + Brier_D + Brier_L) / 3
  )

bw <- sim_agg_with_brier %>%
  ggplot(aes(x = Brier_W))+
  geom_density()+
  geom_rug()+
  theme_ipsum()+
  labs(title = 'Brier Win')+
  xlim(0, 0.85)

bd <- sim_agg_with_brier %>%
  ggplot(aes(x = Brier_D))+
  geom_density()+
  geom_rug()+
  theme_ipsum()+
  labs(title = 'Brier Draw')+
  xlim(0, 0.85)

bl <- sim_agg_with_brier %>%
  ggplot(aes(x = Brier_L))+
  geom_density()+
  geom_rug()+
  theme_ipsum()+
  labs(title = 'Brier Loss')+
  xlim(0, 0.85)

bt <- sim_agg_with_brier %>%
  ggplot(aes(x = Brier_T))+
  geom_density()+
  geom_rug()+
  theme_ipsum()+
  labs(title = 'Brier Total')+
  xlim(0, 0.85)

(bw | bd) / (bl | bt)

This isn’t bad - the average sits around 0.35, so we’re better than a coin toss. It’s doing suspiciously well on draws, while win and loss only look OK. Let’s compare our average prediction for each outcome with the average outcomes overall.

Show the code
sim_agg_with_brier %>%
  ungroup() %>%
  summarise_if(is.numeric, mean, na.rm = TRUE) %>%
  select(-HomeGoals, -AwayGoals,) %>%
  pivot_longer(everything(), names_to = 'type', values_to = 'value') %>%
  mutate(Class = case_when(grepl('_sim', type) ~ 'Sim', 
                           grepl('Brier_', type) ~ 'Brier',
                           TRUE ~ 'Actual'),
         type = str_replace(type, '_sim', ''),
         type = str_replace(type, 'Brier_', ''),
         type = ifelse(type == 'T', 'Total', type)) %>%
  pivot_wider(names_from = 'Class', values_from = 'value') %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_percent(columns = c(Actual, Sim), decimals = 1) %>%
  fmt_number(columns = c(Brier), decimals = 2) %>%
  sub_missing(columns = everything(),
              rows = everything(),
              missing_text = "-")
type Actual Sim Brier
W 42.1% 37.8% 0.31
D 15.8% 20.7% 0.13
L 42.1% 41.6% 0.29
Total - - 0.24

Not too shabby, with an overall Brier score of 0.24. We’d like it to be closer, it overestimated draws and underestimated wins in our test data. Where is it missing? Let’s check our top 10 Brier scores. Win/loss perspectives are from the home team’s perspective.

Show the code
sim_agg_with_brier %>%
  ungroup() %>%
  arrange(desc(Brier_T)) %>%
  top_n(10) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_percent(columns = c(W_sim, D_sim, L_sim), decimals = 1) %>%
  fmt_number(columns = c(Brier_W, Brier_D, Brier_L, Brier_T), decimals = 2) %>%
  cols_label(HomeGoals = 'Home Goals',
             AwayGoals = 'Away Goals')
Wk Home Away W D L Home Goals Away Goals W_sim D_sim L_sim Brier_W Brier_D Brier_L Brier_T
19 Arsenal West Ham 0 0 1 0 2 67.5% 17.1% 15.4% 0.46 0.03 0.72 0.40
18 Wolves Chelsea 1 0 0 2 1 16.2% 16.9% 66.9% 0.70 0.03 0.45 0.39
19 Newcastle Utd Nott'ham Forest 0 0 1 1 3 64.8% 17.0% 18.2% 0.42 0.03 0.67 0.37
18 Luton Town Newcastle Utd 1 0 0 1 0 18.3% 17.2% 64.5% 0.67 0.03 0.42 0.37
18 Aston Villa Sheffield Utd 0 1 0 1 1 63.1% 22.0% 14.8% 0.40 0.61 0.02 0.34
18 Crystal Palace Brighton 0 1 0 1 1 27.1% 23.2% 49.7% 0.07 0.59 0.25 0.30
18 Fulham Burnley 0 0 1 0 2 50.4% 24.9% 24.7% 0.25 0.06 0.57 0.29
18 Liverpool Arsenal 0 1 0 1 1 32.2% 25.1% 42.7% 0.10 0.56 0.18 0.28
18 Tottenham Everton 1 0 0 2 1 28.0% 19.2% 52.9% 0.52 0.04 0.28 0.28
19 Sheffield Utd Luton Town 0 0 1 2 3 48.5% 23.9% 27.6% 0.24 0.06 0.52 0.27

Our highest score comes from Arsenal vs. West Ham. The model pegged the Gunners with a 67.4% chance of winning and a 17.3% chance of losing, leaving West Ham with a mere 15.3% win probability. They won it 0-2, while the xG was 2.7 to 1.4 suggests it was a tough loss it was nonetheless a loss. (It was a great game to watch, too - both squads played very well.) I think we’d all have been suspicious of a model that heavily favored West Ham going in - if anything, 15.3% was too high a win probability for them - so we won’t worry about this match.

Going down the list, most of these are upsets. I like the Wolves and all, but wouldn’t pick them to beat Chelsea (even considering how lousy Chelsea’s been lately), yet here we go. Nottingham Forest* beating Newcastle was another great game to watch, and yet again an upset.

*Caveat: Nottingham is quickly becoming my favorite underdog squad. Just a lot of fun to watch.

What about the cases where the model did well?

Show the code
sim_agg_with_brier %>%
  ungroup() %>%
  arrange((Brier_T)) %>%
  top_n(10, -Brier_T) %>%
  gt() %>%
  gt_theme_espn() %>%
  fmt_percent(columns = c(W_sim, D_sim, L_sim), decimals = 1) %>%
  fmt_number(columns = c(Brier_W, Brier_D, Brier_L, Brier_T), decimals = 2) %>%
  cols_label(HomeGoals = 'Home Goals',
             AwayGoals = 'Away Goals')
Wk Home Away W D L Home Goals Away Goals W_sim D_sim L_sim Brier_W Brier_D Brier_L Brier_T
19 Everton Manchester City 0 0 1 1 3 11.6% 14.7% 73.8% 0.01 0.02 0.07 0.03
19 Burnley Liverpool 0 0 1 0 2 15.7% 19.7% 64.6% 0.02 0.04 0.13 0.06
18 Nott'ham Forest Bournemouth 0 0 1 2 3 17.5% 20.3% 62.2% 0.03 0.04 0.14 0.07
19 Bournemouth Fulham 1 0 0 3 0 54.2% 23.9% 21.8% 0.21 0.06 0.05 0.10
19 Chelsea Crystal Palace 1 0 0 2 1 47.1% 22.2% 30.6% 0.28 0.05 0.09 0.14
19 Brighton Tottenham 1 0 0 4 2 46.4% 19.0% 34.7% 0.29 0.04 0.12 0.15
19 Brentford Wolves 0 0 1 1 4 45.3% 21.0% 33.7% 0.21 0.04 0.44 0.23
18 West Ham Manchester Utd 1 0 0 2 0 32.8% 23.8% 43.4% 0.45 0.06 0.19 0.23
19 Manchester Utd Aston Villa 1 0 0 3 2 30.8% 21.4% 47.8% 0.48 0.05 0.23 0.25
19 Sheffield Utd Luton Town 0 0 1 2 3 48.5% 23.9% 27.6% 0.24 0.06 0.52 0.27

The model thought City had a 73.5% chance of beating Everton, and guess what City did? Most of these are really good clubs beating up weaker clubs, though there are also some mid-table clubs in there as well.

The chief thing here is that the model is being perhaps too conservative. City’s odds for winning against Everton were surely higher. (Yeah, the Brier for their win was 0.07, but if it’d given 80% odds the score would’ve been 0.04, a 43% improvement.) As a rule, the higher the odds, the lower the Brier score when you’re right. So that’s definitely something to look tweaking going forward. For now, it’ll do.

With this done, we can update the projected table for the rest of the season.

Show the code
pl_simmed <- pl_future_matches %>%
  select(Wk, Date, Home, Away) %>%
  inner_join(pl_table %>% select(Squad, Goals, xG, MP) %>% rename(Home_Goals = Goals, Home_xG = xG, Home_MP = MP), join_by(Home == Squad)) %>%
  inner_join(pl_table %>% select(Squad, Goals, xG, MP) %>% rename(Away_Goals = Goals, Away_xG = xG, Away_MP = MP), join_by(Away == Squad)) %>%
  mutate(Home_G_per_Match = Home_Goals / Home_MP,
         Away_G_per_Match = Away_Goals / Away_MP) %>%
  mutate(simmed_games = map2(Home, Away, ~sim_games3(.x, .y)))

#unnest our sims, split into home and away
pl_simmed_long <- pl_simmed %>%
  select(Wk, Date, Home, Away, simmed_games) %>%
  unnest(simmed_games)

pl_simmed_home <- pl_simmed_long %>%
  select(sim_num, Home, home_points, home_goals,  away_goals) %>%
  rename(Squad = Home,
         Goals = home_goals,
         Goals_Allowed = away_goals,
         Points = home_points
  ) %>%
  mutate(W = case_when(Goals > Goals_Allowed ~ 1, TRUE ~ 0),
         D = case_when(Goals == Goals_Allowed ~ 1, TRUE ~ 0),
         L = case_when(Goals < Goals_Allowed ~ 1, TRUE ~ 0))

pl_simmed_away <- pl_simmed_long %>%
  select(sim_num, Away, away_points, away_goals, home_goals) %>%
  rename(Squad = Away,
         Goals = away_goals,
         Goals_Allowed = home_goals,
         Points = away_points
  ) %>%
  mutate(W = case_when(Goals > Goals_Allowed ~ 1, TRUE ~ 0),
         D = case_when(Goals == Goals_Allowed ~ 1, TRUE ~ 0),
         L = case_when(Goals < Goals_Allowed ~ 1, TRUE ~ 0))

#we need actual in the data as well, so we repliate the table 
#to match our number of sims
table_rep <- seq(1:number_of_sims) %>%
  as_tibble() %>%
  rename(sim_num = value) %>%
  crossing(pl_table %>% select(Squad, Points, Goals, Goals_Allowed, W, D, L))

#take all of our sims and summarize. Note this includes the sim number if we want to play with that
pl_aggregated <- pl_simmed_home %>%
  rbind(pl_simmed_away) %>%
  rbind(table_rep) %>%
  group_by(sim_num, Squad) %>%
  summarise_if(is.numeric, sum, na.rm = TRUE) %>%
  mutate(Goal_Dif = Goals - Goals_Allowed) %>%
  group_by(sim_num) %>%
  arrange(desc(Points), desc(Goal_Dif)) %>%
  mutate(Rank = rank(desc(Points), ties.method="first"),
         win_PL = case_when(Rank == 1 ~ 1, TRUE ~ 0),
         qualify_UCL = case_when(Rank <= 4 ~ 1, TRUE ~ 0),
         relgated = case_when(Rank >= 18 ~ 1, TRUE ~ 0))

pl_team <- pl_aggregated %>%
  ungroup() %>%
  select(-sim_num, -Rank) %>%
  group_by(Squad) %>%
  summarise_if(is.numeric, mean, na.rm = TRUE) %>%
  arrange(desc(Points))

pl_sim_table <- pl_table %>%
  inner_join(pl_team, by = join_by(Squad)) %>%
  select(-xG, -xG_Allowed) %>%
  arrange(desc(win_PL), relgated) 

pl_sim_table %>%
  gt() %>%
  gt_theme_espn() %>%
  tab_spanner(
    label = "Current Table",
    columns = c("MP", "Points.x", "Goals.x", "Goals_Allowed.x", "Goal_Dif.x", "Rank", "W.x", "D.x", "L.x")
  ) %>%
  tab_spanner(
    label = "Projected",
    columns = c("Points.y", "Goals.y", "Goals_Allowed.y", "Goal_Dif.y", "win_PL", "qualify_UCL", "relgated", "W.y", "D.y", "L.y")
  ) %>%
  gt_add_divider(columns = "Rank", style = "solid", color = 'lightgrey') %>%
  data_color(
    columns = c("win_PL", "qualify_UCL", "relgated"),
    method = "numeric",
    palette = c('#00ff85', '#ffffff'),
    #omain = c(0.01, 1),
    na_color = "white",
    reverse = TRUE
  ) %>%
  fmt_percent(columns = c("win_PL", "qualify_UCL", "relgated"), decimals = 1) %>%
  fmt_number(columns = c("Points.y", "Goals.y", "Goals_Allowed.y", "Goal_Dif.y",
                         "W.y", "D.y", "L.y"), decimals = 1) %>%
  cols_width(win_PL ~ px(95),
             qualify_UCL ~ px(95),
             relgated ~ px(95)) %>%
  cols_label(
    Points.x = "Points",
    Goals.x = "G",
    Goals_Allowed.x = "GA",
    Goal_Dif.x = "GD",
    W.x = "W",
    D.x = "D",
    L.x = "L",
    Points.y = "Points",
    Goals.y = "G",
    Goals_Allowed.y = "GA",
    Goal_Dif.y = "GD",
    win_PL = 'Win PL',
    qualify_UCL = 'Qualify UCL',
    relgated = 'Relegated',
    W.y = "W",
    D.y = "D",
    L.y = "L",
  ) %>%
  tab_header(title = 'Premier League Projected Standings',
             subtitle = paste('Using data from matches played through ', Max_Date)) %>%
  tab_footnote('Table data courtesy of fbref.com') %>%
  tab_style(
    style = cell_text(align = "center"),
    locations = cells_column_labels()) %>%
  cols_hide(columns = c(W.x, D.x, L.x, W.y, D.y, L.y))
Premier League Projected Standings
Using data from matches played through 2024-01-02
Squad Current Table Projected
MP Points G GA GD Rank Points G GA GD Win PL Qualify UCL Relegated
Liverpool 20 45 43 18 25 1 79.4 81.7 41.0 40.6 43.7% 99.0% 0.0%
Manchester City 19 40 45 21 24 3 77.2 87.0 43.9 43.2 28.1% 97.3% 0.0%
Arsenal 20 40 37 20 17 4 77.0 75.4 38.7 36.7 24.1% 97.4% 0.0%
Aston Villa 20 42 43 27 16 2 70.9 75.1 52.7 22.3 4.0% 78.9% 0.0%
Tottenham 20 39 42 29 13 5 61.4 75.3 68.1 7.2 0.0% 10.1% 0.0%
Newcastle Utd 20 29 39 29 10 9 59.7 79.9 60.2 19.7 0.0% 7.2% 0.0%
Chelsea 20 28 34 31 3 10 57.9 73.1 61.9 11.3 0.0% 4.0% 0.0%
Brighton 20 31 38 33 5 7 57.3 72.9 66.1 6.8 0.0% 2.9% 0.0%
West Ham 20 34 33 30 3 6 56.3 61.4 63.3 −1.9 0.0% 1.8% 0.0%
Manchester Utd 20 31 22 27 -5 8 55.1 53.2 60.0 −6.8 0.0% 1.1% 0.0%
Bournemouth 19 25 28 35 -7 12 51.3 58.3 65.6 −7.3 0.0% 0.2% 0.2%
Brentford 19 19 26 31 -5 16 48.6 61.0 61.3 −0.3 0.0% 0.1% 0.5%
Wolves 20 28 30 31 -1 11 47.1 55.4 66.9 −11.5 0.0% 0.0% 0.9%
Crystal Palace 20 21 22 29 -7 14 44.1 48.6 58.8 −10.3 0.0% 0.0% 3.7%
Fulham 20 24 28 35 -7 13 43.7 52.5 68.3 −15.8 0.0% 0.0% 4.5%
Everton 20 16 24 28 -4 17 39.9 53.6 59.6 −6.0 0.0% 0.0% 13.5%
Nott'ham Forest 20 20 24 35 -11 15 40.1 51.3 71.3 −20.0 0.0% 0.0% 14.2%
Burnley 20 11 20 41 -21 19 30.5 40.8 69.8 −29.1 0.0% 0.0% 82.9%
Luton Town 19 15 23 37 -14 18 30.7 43.3 75.8 −32.5 0.0% 0.0% 83.8%
Sheffield Utd 20 9 15 49 -34 20 26.5 36.4 82.8 −46.4 0.0% 0.0% 95.7%
Table data courtesy of fbref.com

Liverpool is sitting pretty nice right now, with City and Arsenal in contention. Liverpool also just beat Newcastle 4-2, with an xG line of 7-0.6 (!), so their numbers are a bit inflated right now.

The takeaway here is that bootstrapping is relatively trivial, but making sure that what you’re bootstrapping is solid is a lot more difficult. Soccer has a fair amount on unpredictability which should adjust our expectations, but that doesn’t mean there’s no room for improvement.