One of the interesting things about the Premier League compared to something like MLB is that the bottom three teams get relegated every year, which means that each season there are three new teams that have been promoted into the League. With so much change, I couldn’t help but wonder how this impacts the distribution of points, goals, etc., especially with the big six dominating.
Thankfully, the worldfootballr package makes it fairly easy to get the data we need.
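The data pull itself isn’t shown, so here is only a rough sketch of how a season table like `pl_summarized` could be put together. It assumes match-level results with the columns `Season_End_Year`, `Home`, `Away`, `HomeGoals` and `AwayGoals`, which is the shape I’d expect from worldfootballr’s `fb_match_results()`, though that is an assumption rather than something confirmed here. The matches below are made up, `summarise_season()` is a hypothetical helper, goal differential is assumed to be goals scored minus goals allowed, and everything is base R so it runs without a network call.

```r
# Toy match results standing in for real data (made-up teams and scores).
# Assumption: the real match data uses these same column names.
matches <- data.frame(
  Season_End_Year = c(2022, 2022, 2022, 2022, 2022, 2022),
  Home      = c("A", "B", "C", "B", "C", "A"),
  Away      = c("B", "C", "A", "A", "B", "C"),
  HomeGoals = c(2, 1, 0, 1, 3, 2),
  AwayGoals = c(0, 1, 4, 1, 0, 2)
)

# Hypothetical helper: one row per team per season with points and goals.
summarise_season <- function(matches) {
  # Stack every match twice: once from the home side, once from the away side
  home <- data.frame(
    Season_End_Year = matches$Season_End_Year,
    Team  = matches$Home,
    Goals = matches$HomeGoals,
    GA    = matches$AwayGoals
  )
  away <- data.frame(
    Season_End_Year = matches$Season_End_Year,
    Team  = matches$Away,
    Goals = matches$AwayGoals,
    GA    = matches$HomeGoals
  )
  long <- rbind(home, away)

  # 3 points for a win, 1 for a draw, 0 for a loss
  long$Points <- ifelse(long$Goals > long$GA, 3,
                        ifelse(long$Goals == long$GA, 1, 0))

  # Sum points and goals per team per season
  out <- aggregate(cbind(Points, Goals, GA) ~ Season_End_Year + Team,
                   data = long, FUN = sum)
  out$g_dif <- out$Goals - out$GA   # assumed definition of goal differential
  out[order(out$Season_End_Year, -out$Points), ]
}

pl_summarized_toy <- summarise_season(matches)
print(pl_summarized_toy)
```

With real match results in place of the toy `matches`, the same function would give one row per team per season, which is all the plots below need.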
Now that we’ve got this, we can check the spread of points over time. We’ll use the standard deviation, which may not be the perfect tool for this but should give a pretty good idea of the spread. The larger the standard deviation, the more spread out the teams are; a smaller value means they’re more closely bunched. As a sanity check, we’ll also plot each team alongside this.
Since there are only 20 teams, we can plot them all without risking information overload. Since we’re more interested in the distribution than in seeing how individual teams performed, we don’t have to worry about too many colors, points, etc.
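To make the standard deviation reading concrete, here’s a tiny illustration with two made-up four-team leagues that share the same average points total. The numbers are invented purely for the example.

```r
# Two made-up leagues with the same average points (50)
bunched <- c(48, 50, 50, 52)   # everyone finishes close together
spread  <- c(20, 40, 60, 80)   # big gap between top and bottom

sd_bunched <- sd(bunched)
sd_spread  <- sd(spread)

# The spread-out league has the larger standard deviation
print(c(bunched = sd_bunched, spread = sd_spread))
```

So when the season-level line in the plots climbs, the table is getting more stretched out, not tighter.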
# Packages used by the plotting code in this post
library(dplyr)
library(ggplot2)
library(plotly)
library(hrbrthemes)  # provides theme_ipsum()

# Standard deviation of points within each season
points_sd_plot <- pl_summarized %>%
  group_by(Season_End_Year) %>%
  summarise(points_sd = sd(Points)) %>%
  ggplot(aes(x = Season_End_Year, y = points_sd, group = 1,
             text = paste('Season:', Season_End_Year,
                          "<br>Points SD:", round(points_sd, digits = 2)))) +
  geom_line(color = '#20CAFF') +
  ylim(5, 22) +
  ggtitle('Points SD') +
  theme_ipsum()

points_sd_plotly <- ggplotly(points_sd_plot, tooltip = 'text')

# One point per team per season
points_team_plot <- pl_summarized %>%
  ggplot(aes(x = Season_End_Year, y = Points, group = 1,
             text = paste(Team, '<br>Season:', Season_End_Year,
                          "<br>Points:", Points))) +
  geom_point(color = '#20CAFF') +
  ylim(20, 105) +
  ggtitle('Points by Team') +
  theme_ipsum()

points_team_plotly <- ggplotly(points_team_plot, tooltip = 'text')

# Show the two interactive plots side by side
subplot(points_sd_plotly, points_team_plotly)
All things considered, the Premier League is slightly less competitive now than it was in the early 2000s, though the recent dominance of Liverpool and Manchester City is certainly impacting the numbers. But the bottom of the league is suffering as well - three teams finished with fewer than 30 points in 2020-2021. What about goals scored and allowed?
# Standard deviation of goals scored within each season
gscored_sd_plot <- pl_summarized %>%
  group_by(Season_End_Year) %>%
  summarise(Goals_sd = sd(Goals),
            GA_sd = sd(GA),
            g_dif_sd = sd(g_dif)) %>%
  ggplot(aes(x = Season_End_Year, y = Goals_sd, group = 1,
             text = paste('Season:', Season_End_Year,
                          "<br>Goals Scored SD:", round(Goals_sd, digits = 2)))) +
  geom_line(color = '#37003C') +
  ylim(5, 22) +
  ggtitle('Goals Scored SD') +
  theme_ipsum()

gscored_sd_plotly <- ggplotly(gscored_sd_plot, tooltip = 'text')

# Goals scored, one point per team per season
gscored_team_plot <- pl_summarized %>%
  ggplot(aes(x = Season_End_Year, y = Goals, group = 1,
             text = paste(Team, '<br>Season:', Season_End_Year,
                          "<br>Goals Scored:", Goals))) +
  geom_point(color = '#37003C') +
  ylim(20, 105) +
  ggtitle('Goals Scored by Team') +
  theme_ipsum()

gscored_team_plotly <- ggplotly(gscored_team_plot, tooltip = 'text')

# Pass the plotly version of both plots so the tooltips match
subplot(gscored_sd_plotly, gscored_team_plotly)
# Standard deviation of goals allowed within each season
gallowed_sd_plot <- pl_summarized %>%
  group_by(Season_End_Year) %>%
  summarise(Goals_sd = sd(Goals),
            GA_sd = sd(GA),
            g_dif_sd = sd(g_dif)) %>%
  ggplot(aes(x = Season_End_Year, y = GA_sd, group = 1,
             text = paste('Season:', Season_End_Year,
                          "<br>Goals Allowed SD:", round(GA_sd, digits = 2)))) +
  geom_line(color = '#20CAFF') +
  ylim(5, 22) +
  ggtitle('Goals Allowed SD') +
  theme_ipsum()

gallowed_sd_plotly <- ggplotly(gallowed_sd_plot, tooltip = 'text')

# Goals allowed, one point per team per season
gallowed_team_plot <- pl_summarized %>%
  ggplot(aes(x = Season_End_Year, y = GA, group = 1,
             text = paste(Team, '<br>Season:', Season_End_Year,
                          "<br>Goals Allowed:", GA))) +
  geom_point(color = '#20CAFF') +
  ylim(20, 105) +
  ggtitle('Goals Allowed by Team') +
  theme_ipsum()

gallowed_team_plotly <- ggplotly(gallowed_team_plot, tooltip = 'text')

subplot(gallowed_sd_plotly, gallowed_team_plotly)
# Standard deviation of goal differential within each season
gdif_sd_plot <- pl_summarized %>%
  group_by(Season_End_Year) %>%
  summarise(Goals_sd = sd(Goals),
            GA_sd = sd(GA),
            g_dif_sd = sd(g_dif)) %>%
  ggplot(aes(x = Season_End_Year, y = g_dif_sd, group = 1,
             text = paste('Season:', Season_End_Year,
                          "<br>Goal Dif SD:", round(g_dif_sd, digits = 2)))) +
  geom_line() +
  ggtitle('Goal Dif SD') +
  theme_ipsum()

gdif_sd_plotly <- ggplotly(gdif_sd_plot, tooltip = 'text')

# Goal differential by team, colored on a diverging scale centered at zero
gdif_team_plot <- pl_summarized %>%
  ggplot(aes(x = Season_End_Year, y = g_dif, group = 1,
             text = paste(Team, "<br>Season:", Season_End_Year,
                          "<br>Goal Dif:", round(g_dif, digits = 2)))) +
  geom_point(aes(color = g_dif)) +
  scale_color_gradient2(low = '#20CAFF', mid = '#00FD82', high = '#37003C',
                        midpoint = 0.0, guide = 'none') +
  ggtitle('Goal Dif by Team') +
  theme_ipsum()

gdif_team_plotly <- ggplotly(gdif_team_plot, tooltip = 'text')

subplot(gdif_sd_plotly, gdif_team_plotly)
Goal differential tells an interesting story. We can see that spread has been increasing fairly regularly since 2010-2011, with a few bumps along the way. Liverpool and Man City again dominate the top, and poor Norwich City lags at the bottom in both 2019-2020 and 2021-2022. Goals allowed have stayed relatively the same over time, while goals scored have gone up, driving the variance in goal differential.
Finally, since the Premier League has a balanced schedule with each team facing every other team two times, we can see how the different tiers have performed against each other over time. With 20 teams, we’ll split into five groups based on point rank - 1 to 4, 5 to 8, etc. The teams in the cohort will change year to year, so Arsenal may be in top 4 one year and in 5-8 the next. The first row shows how teams ranked 1-4 did on average against the other cohorts, the second row shows how 5-8 did, etc. Since the lower tiers have lower numbers, we’ll give each tier its own axis scale so we can better see changes over time.
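The tier breakdown described above can be sketched roughly like this. It’s not the code behind the chart, just one way to do it under some assumptions: a single made-up season of four teams where the stronger side always wins 1-0, tiers of two teams instead of four, and base R throughout. The names `tier_size` and `tier_matrix` are mine.

```r
# Made-up double round robin: four teams, the stronger side always wins 1-0.
teams    <- c("A", "B", "C", "D")
strength <- c(A = 4, B = 3, C = 2, D = 1)
fixtures <- expand.grid(Home = teams, Away = teams, stringsAsFactors = FALSE)
fixtures <- fixtures[fixtures$Home != fixtures$Away, ]
fixtures$HomeGoals <- as.integer(strength[fixtures$Home] > strength[fixtures$Away])
fixtures$AwayGoals <- as.integer(strength[fixtures$Away] > strength[fixtures$Home])

# One row per team per match, with the points that team earned
long <- rbind(
  data.frame(Team = fixtures$Home, Opponent = fixtures$Away,
             GF = fixtures$HomeGoals, GA = fixtures$AwayGoals),
  data.frame(Team = fixtures$Away, Opponent = fixtures$Home,
             GF = fixtures$AwayGoals, GA = fixtures$HomeGoals)
)
long$Points <- ifelse(long$GF > long$GA, 3, ifelse(long$GF == long$GA, 1, 0))

# Rank teams by total points, then cut the ranking into tiers.
# tier_size is 2 for this toy league; it would be 4 for a 20-team league.
tier_size <- 2
totals <- sapply(split(long$Points, long$Team), sum)
ranks  <- rank(-totals, ties.method = "first")
tiers  <- ceiling(ranks / tier_size)

long$Tier          <- tiers[long$Team]
long$Opponent_Tier <- tiers[long$Opponent]

# Average points per match: rows are a team's own tier, columns the opponent's
tier_matrix <- tapply(long$Points, list(long$Tier, long$Opponent_Tier), mean)
print(tier_matrix)
```

With real data, the ranking and tier assignment would be redone within each season, since the teams in each cohort change from year to year.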
Well there we go. The top tier is getting more points on average against the rest of the league, while the bottom four are trending in the opposite direction. Bad news if your team just got promoted*, but pretty good if you’re a supporter of one of the top clubs.
* Looking at you, Nottingham Forest.
Whether all this is good or bad comes down to taste. Nobody really believes every team has an equal shot at the title - or even at getting into the Champions League - at the start of each season, so perhaps it’s more entertaining to watch the upper tiers dominate. But it’s also fun to watch Leeds beat Liverpool, and as long as we continue to get upsets like that it’ll always be worth watching.