Players vs Teams

probability
soccer
baseball
r
Author

Mark Jurries II

Published

August 25, 2022

Sports can provide some interesting numbers. With the advent of both high-speed camera tracking and big data availability, information that provides insight into what really impacts performance, and how to predict it, is easier to get now than ever.

But we’re not here to talk about that. We’re going to be looking instead at a junk stat, one that’s interesting but doesn’t have any predictive power. Namely, players who outperform teams. For instance, as of this writing, Aaron Judge has 48 home runs, while the Detroit Tigers have 72. The next lowest team total is 96, and if Judge’s numbers were added to Detroit’s they’d go from 30th to 18th in the rankings. Something about a single player doing something more than an entire group of players who theoretically have the same skillset is fascinating.

With that sort of mindset, I was struck the other week when I saw somebody suggest - tongue in cheek, I think - that Erling Haaland could score 50 goals* this season. Naturally, that’s a lot of goals, and would probably be more than several squads score. Which makes one wonder - how often has that happened in Premier League history? Would it be novel, or a rather ho-hum part of life?

*Still 50 more than Man U scored against Brentford. Hey-o.

Using the worldfootballR package, I downloaded all player goal stats from fbref.com. With this in hand, we can summarize the top goal-scoring player seasons in the PL.

From here, all we need to do is summarize goals by squad and season, then, for each player-season, see if they had more goals than the squad, which we can then summarize by player. Note we’re excluding own-goals from this. It turns out that this has happened 29 times among the 7,883 player-seasons where the player scored at least 1 goal, or about 0.37%. So yes, it is rare. Players to do it are:

library(tidyverse)
library(hrbrthemes)
library(ggridges)
library(gt)
library(ggtext)

pl_player_stats <- read.csv('pl_player_stats.csv') %>%
  select(-X)

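# Total goals per squad per season (summed from player goals, so own goals are excluded)
pl_squad_goals <- pl_player_stats %>%
  group_by(Season, Squad) %>%
  summarise(Squad_Goals = sum(player_goals), .groups = 'drop')

# Pair every scoring player-season with every squad total from the same season,
# then keep only the pairs where the player out-scored the squad
player_better_than_squads <- pl_player_stats %>%
  filter(player_goals > 0) %>%
  rename(Goals = player_goals) %>%
  inner_join(pl_squad_goals, by = c('Season')) %>%
  filter(Goals > Squad_Goals) %>%
  rename(Player_Squad = Squad.x, Squad_Beat = Squad.y) %>%
  mutate(dif = Goals - Squad_Goals) %>%
  arrange(Season)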

player_better_than_squads %>%
  group_by(Player) %>%
  summarise(seasons = n_distinct(Season),
            teams_beat = n(),
            goal_dif = sum(dif),
            min_season = min(Season),
            max_season = max(Season)) %>%
  arrange(desc(seasons), desc(goal_dif)) %>%
  gt() %>%
  tab_style(
    locations = cells_column_labels(columns = everything()),
    style = list(
      cell_text(weight = "bold")
    )
  )

| Player | seasons | teams_beat | goal_dif | min_season | max_season |
|---|---|---|---|---|---|
| Mohamed Salah | 4 | 6 | 20 | 2017-2018 | 2021-2022 |
| Harry Kane | 4 | 6 | 17 | 2015-2016 | 2020-2021 |
| Thierry Henry | 2 | 3 | 7 | 2002-2003 | 2005-2006 |
| Cristiano Ronaldo | 1 | 1 | 12 | 2007-2008 | 2007-2008 |
| Alan Shearer | 1 | 2 | 5 | 1994-1995 | 1994-1995 |
| Emmanuel Adebayor | 1 | 1 | 5 | 2007-2008 | 2007-2008 |
| Fernando Torres | 1 | 1 | 5 | 2007-2008 | 2007-2008 |
| Ruud van Nistelrooy | 1 | 1 | 4 | 2002-2003 | 2002-2003 |
| Luis Suárez | 1 | 1 | 3 | 2013-2014 | 2013-2014 |
| James Beattie | 1 | 1 | 2 | 2002-2003 | 2002-2003 |
| Pierre-Emerick Aubameyang | 1 | 1 | 2 | 2018-2019 | 2018-2019 |
| Sadio Mané | 1 | 1 | 2 | 2018-2019 | 2018-2019 |
| Son Heung-min | 1 | 1 | 2 | 2021-2022 | 2021-2022 |
| Andy Cole | 1 | 1 | 1 | 1993-1994 | 1993-1994 |
| Didier Drogba | 1 | 1 | 1 | 2009-2010 | 2009-2010 |
| Sergio Agüero | 1 | 1 | 1 | 2018-2019 | 2018-2019 |

Mo Salah and Harry Kane* don’t need this to let people know they’re good, but nonetheless, here we are. Ronaldo’s only done it once, scoring 31 goals in 2007-2008 while Derby County only managed 19. They were also beaten by Adebayor and Torres that season, though they are not the team beaten by the most players in a single season - that honor goes to 2018-2019 Huddersfield Town, whose 20 goals were beaten by Salah (22), Aubameyang (22), Mané (22), and Agüero (21).

*“Salah and Kane are good and you should get them.” #advancedanalytics
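
The join-and-filter step is easier to see on a handful of rows. Below is a minimal sketch using made-up data: the squads, players, and goal totals are invented for illustration, and the `toy_` names and the `_player`/`_beat` suffixes are my own choices rather than anything from the fbref.com download. It pairs each player with every squad total from the same season, keeps the pairs where the player scored more, then counts how many players finished ahead of each squad, which is the sort of tally behind the Huddersfield note.

```r
library(dplyr)

# Toy data, invented purely for illustration (not real Premier League figures)
toy_players <- tibble::tribble(
  ~Season,     ~Squad,  ~Player,     ~player_goals,
  "2000-2001", "Alpha", "Striker A", 25,
  "2000-2001", "Alpha", "Winger B",   9,
  "2000-2001", "Beta",  "Forward C", 12,
  "2000-2001", "Beta",  "Mid D",      8,
  "2000-2001", "Gamma", "Forward E", 30
)

# Squad totals per season: Alpha 34, Beta 20, Gamma 30
toy_squad_goals <- toy_players %>%
  group_by(Season, Squad) %>%
  summarise(Squad_Goals = sum(player_goals), .groups = "drop")

# Join each player to every squad total in the same season.
# Recent dplyr versions may warn about a many-to-many match here; that
# pairing of every player with every squad is what we want.
toy_beats <- toy_players %>%
  inner_join(toy_squad_goals, by = "Season",
             suffix = c("_player", "_beat")) %>%
  filter(player_goals > Squad_Goals) %>%
  mutate(dif = player_goals - Squad_Goals)

# How many players out-scored each squad in a given season
toy_beaten_count <- toy_beats %>%
  count(Season, Squad_beat, name = "players_ahead")

toy_beats
toy_beaten_count
```

A player can never appear against their own squad in the result, since a squad total always includes that player's goals, so no explicit self-exclusion is needed.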

It’s outside the scope of this*, but it may be that the reason we’re seeing more players beating squads lately correlates to the top-heaviness of the Premier League, with poorer-quality clubs getting promoted, beaten soundly, then relegated for more cannon fodder the next season.

*Read: “I’m too tired to look at this right now”

So we solved soccer, but can we do this for baseball, you ask? Yes, I answer my own leading question, though we’ll focus on home runs instead of runs - a run scored can be the product of having good hitting behind you, an RBI can be because of good hitting in front of you (leadoff men always have lower totals), while a HR is all in the hitter’s control.

library(Lahman)

mlb_teams <- Lahman::Teams %>%
  as_tibble() %>%
  select(yearID, teamID, franchID, HR)

mlb_master <- Lahman::People %>% 
  as_tibble() %>%
  select(playerID, nameFirst, nameLast)

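# Season HR totals per player (summing across stints), with names attached
mlb_players <- Lahman::Batting %>%
  as_tibble() %>%
  group_by(playerID, yearID) %>%
  summarise(player_HR = sum(HR), .groups = "drop") %>%
  filter(player_HR > 0) %>%
  inner_join(mlb_master, by = "playerID")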

mlb_players_v_teams <- mlb_players %>%
  inner_join(mlb_teams, by = c("yearID")) %>%
  filter(player_HR > HR) %>%
  arrange(desc(player_HR))

mlb_players_v_teams %>%
  filter(player_HR >= 11) %>%
  mutate(name = paste0(nameFirst, ' ', nameLast),
         dif = player_HR - HR) %>%
  group_by(name) %>%
  summarise(seasons = n_distinct(yearID),
            teams_beat = n(),
            hr_dif = sum(dif),
            min_year = min(yearID),
            max_year = max(yearID)) %>%
  arrange(desc(seasons)) %>%
  head(10) %>%
  gt() %>%
  tab_style(
    locations = cells_column_labels(columns = everything()),
    style = list(
      cell_text(weight = "bold")
    )
  )
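
| name | seasons | teams_beat | hr_dif | min_year | max_year |
|---|---|---|---|---|---|
| Babe Ruth | 14 | 90 | 1287 | 1918 | 1932 |
| Jimmie Foxx | 7 | 13 | 115 | 1929 | 1938 |
| Cy Williams | 6 | 11 | 45 | 1915 | 1927 |
| Hack Wilson | 5 | 11 | 53 | 1926 | 1930 |
| Harry Stovey | 5 | 15 | 84 | 1883 | 1891 |
| Lou Gehrig | 5 | 18 | 207 | 1927 | 1936 |
| Mel Ott | 5 | 9 | 56 | 1929 | 1944 |
| Rogers Hornsby | 5 | 10 | 53 | 1921 | 1929 |
| Gavvy Cravath | 4 | 27 | 138 | 1913 | 1917 |
| Hank Greenberg | 4 | 6 | 40 | 1935 | 1946 |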

I cut this list back a bit - 521 players have done it, but much of that came in the early era of baseball, when 1 home run was enough to exceed another team’s total. This is limited to seasons in which the player hit at least 11 home runs. Note Babe Ruth on the top with 14 seasons doing this, double Jimmie Foxx’s second-place tally of 7, and all within the 15 years from 1918 to 1932. His teammate Lou Gehrig also cracks the list, so those Yankees were truly frightening to face.

Let’s also note that Hank Greenberg was a Tiger, and a really good one at that.

Of course, baseball’s changed - the observant reader will note that most of these seasons came over 70 years ago. The last time this was done was in 1949, when Ralph Kiner (54) out-homered the White Sox (43), a total Ted Williams (43) could only match. Home runs have kept going up, so even Barry Bonds’s record-setting 73 in 2001 still lagged league-worst Tampa Bay’s 121.
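
To check a "last time it happened" claim like this one, it is enough to compare each year's top individual total with that year's lowest team total. Here is a rough sketch of that idea. The helper `years_player_beat_team()` is hypothetical (it is not part of the code above or of the Lahman package), and the toy rows are invented just to exercise it; it assumes inputs shaped like `mlb_players` (columns `yearID`, `player_HR`) and `mlb_teams` (columns `yearID`, `HR`).

```r
library(dplyr)

# Hypothetical helper, not from the original analysis: for each year, compare
# the best individual HR total with the lowest team HR total and keep the
# years where the player total is strictly higher.
years_player_beat_team <- function(player_hr, team_hr) {
  top_player <- player_hr %>%
    group_by(yearID) %>%
    summarise(max_player_HR = max(player_HR), .groups = "drop")

  worst_team <- team_hr %>%
    group_by(yearID) %>%
    summarise(min_team_HR = min(HR), .groups = "drop")

  top_player %>%
    inner_join(worst_team, by = "yearID") %>%
    filter(max_player_HR > min_team_HR) %>%
    arrange(yearID)
}

# Toy input, invented for illustration only
toy_player_hr <- tibble::tribble(
  ~playerID, ~yearID, ~player_HR,
  "a01",     1920,    54,
  "b01",     1920,    19,
  "a01",     2001,    73,
  "c01",     2001,    64
)
toy_team_hr <- tibble::tribble(
  ~teamID, ~yearID, ~HR,
  "AAA",   1920,    22,
  "BBB",   1920,    115,
  "AAA",   2001,    121,
  "BBB",   2001,    246
)

toy_result <- years_player_beat_team(toy_player_hr, toy_team_hr)
toy_result
```

With the real data, something like `years_player_beat_team(mlb_players, mlb_teams)` should work under those column assumptions, and the maximum `yearID` in the result would be the most recent season in which it happened. Because the comparison is strict, a player who merely ties a team's total is not counted.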

Whether baseball ever changes to a point where a single player can threaten to out-homer an entire team, or the Premier League changes to make out-scoring another squad unlikely, is an interesting question, but at the end of the day, the achievements are certainly fun to watch.