MLB Stint Duration Over Time

statistics
r
baseball
Author

Mark Jurries II

Published

December 7, 2022

It’s currently the offseason for MLB, which means a lot of players are changing teams, either through free agency or trade. It sometimes feels like players spend less time with their teams than they used to, and thankfully the data to check this is readily available.

We’ll be measuring player “stints” - that is, consecutive years with one team. As an example, we’ll look at Curtis Granderson:

*More precisely, I’m not interested. Sorry!
library(baseballr)
library(hrbrthemes)
library(gt)
library(Lahman)
library(plotly)
library(tidymodels)
library(tidyverse)
library(tidyquant)
library(zoo)

teams <- Lahman::Teams %>%
  as_tibble()

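batting <- Lahman::Batting %>% 
  as_tibble() %>%
  # attach the franchise ID so relocations and renames don't break a stint
  left_join(teams %>% select(yearID, teamID, franchID), by = c("yearID", "teamID")) %>%
  # plate appearances, treating missing components as zero (vectorised, no rowwise())
  mutate(PA = rowSums(across(c(AB, BB, HBP, SH, SF)), na.rm = TRUE)) %>%
  select(playerID, yearID, stint, franchID, G, PA) %>%
  rename(bg = G)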

pitching <- Lahman::Pitching %>% 
  as_tibble() %>%
  left_join(teams %>% select(yearID, teamID, franchID), by = c("yearID", "teamID")) %>%
  select(playerID, yearID, stint, franchID, G, IPouts) %>%
  rename(pg = G)

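players_all <- batting %>%
  # stack batting and pitching rows; bind_rows() fills the columns that exist
  # on only one side (bg/PA vs. pg/IPouts) with NA, whereas recent dplyr
  # versions require identical columns for union_all()
  bind_rows(pitching) %>%
  group_by(playerID, yearID, stint, franchID) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = "drop") %>%
  # label each row by whichever workload is larger
  mutate(type = case_when(PA > IPouts ~ 'Batter', TRUE ~ 'Pitcher')) %>%
  mutate(G = case_when(type == 'Batter' ~ bg, TRUE ~ pg),
         wgt = case_when(type == 'Batter' ~ PA, TRUE ~ IPouts))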

player_info <- Lahman::People %>%
  as_tibble() %>%
  select(playerID, nameFirst, nameLast) %>%
  unite(name, nameFirst, nameLast, sep = " ")

player_stint_data <- players_all %>%
  #filter(playerID == 'grandcu01') %>%
  # rows must be in career order for lag() to pick up the previous team-season
  arrange(playerID, yearID, stint) %>%
  group_by(playerID) %>%
  mutate(prior_franchID = lag(franchID),
         prior_yearID = lag(yearID),
         # TRUE when this row continues the previous row's stint
         is_same_team = (franchID == prior_franchID) & (yearID == prior_yearID + 1),
         # despite the name, 1 marks a continued stint and 0 the first row of a new one
         stint_start = case_when(is.na(is_same_team) ~ 0, 
                                   is_same_team == FALSE ~ 0,
                                   TRUE ~ 1),
         stint_start2 = replace(stint_start, stint_start == 0, NA)
         ) %>%
  # each NA opens a new stint, so the running count of NAs is the stint id
  mutate(player_stint_id = cumsum(is.na(stint_start2))) %>% 
  group_by(playerID, player_stint_id) %>%
  mutate(stint_seasons_played = cumsum(stint_start) + 1) %>% 
  select(-prior_franchID, -prior_yearID, -is_same_team, -stint_start, -stint_start2) %>%
  mutate(stint_seaons_left = max(stint_seasons_played) - (stint_seasons_played - 1)) %>%
  left_join(player_info, by = "playerID")

#to be used later for player info
player_stint_summarised <- player_stint_data %>%
  group_by(playerID, name, franchID, player_stint_id, type) %>%
  summarise(start_year = min(yearID),
         end_year = max(yearID),
         length = max(stint_seasons_played)) %>%
  arrange(desc(length))

#to be used later for team info
active_frachises <- teams %>%
  filter(yearID == 2021) %>%
  mutate(lgdivID = paste(lgID, divID, teamID)) %>%
  select(franchID, lgID, teamID, divID, lgdivID) %>%
  arrange(lgdivID)
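
The stint logic above boils down to one rule: a row continues the previous stint only if the franchise is unchanged and the season is exactly one later than the previous row. Here is a minimal sketch of that rule on a hand-made toy career. The `toy` data frame and the `continues`, `stint_id` and `seasons_played` columns are illustrative names, not part of the pipeline above.

```r
library(dplyr)

# Hypothetical toy career, already sorted by season: three years with one
# franchise, one year elsewhere, then a return to the first franchise.
toy <- tibble(
  yearID   = c(2001, 2002, 2003, 2004, 2005),
  franchID = c("DET", "DET", "DET", "NYY", "DET")
)

toy_stints <- toy %>%
  mutate(
    # TRUE when this row extends the previous row's stint
    continues = !is.na(lag(franchID)) &
      franchID == lag(franchID) &
      yearID == lag(yearID) + 1,
    # every non-continuing row opens a new stint
    stint_id = cumsum(!continues)
  ) %>%
  group_by(stint_id) %>%
  # position of the season within its stint
  mutate(seasons_played = row_number()) %>%
  ungroup()

print(toy_stints)
```

Under this rule the 2005 return to DET opens a third stint rather than extending the first one, since the previous row belongs to a different franchise.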
player_stint_data %>%
  filter(playerID == 'grandcu01') %>%
  ungroup() %>%
  select(player_stint_id, name, yearID, franchID, stint_seasons_played, PA) %>%
  gt() %>%
  tab_style(
    locations = cells_column_labels(columns = everything()),
    style = list(
      cell_text(weight = "bold")
      )
    )
| player_stint_id | name | yearID | franchID | stint_seasons_played | PA |
|---|---|---|---|---|---|
| 1 | Curtis Granderson | 2004 | DET | 1 | 28 |
| 1 | Curtis Granderson | 2005 | DET | 2 | 174 |
| 1 | Curtis Granderson | 2006 | DET | 3 | 679 |
| 1 | Curtis Granderson | 2007 | DET | 4 | 676 |
| 1 | Curtis Granderson | 2008 | DET | 5 | 629 |
| 1 | Curtis Granderson | 2009 | DET | 6 | 710 |
| 2 | Curtis Granderson | 2010 | NYY | 1 | 528 |
| 2 | Curtis Granderson | 2011 | NYY | 2 | 691 |
| 2 | Curtis Granderson | 2012 | NYY | 3 | 684 |
| 2 | Curtis Granderson | 2013 | NYY | 4 | 245 |
| 3 | Curtis Granderson | 2014 | NYM | 1 | 654 |
| 3 | Curtis Granderson | 2015 | NYM | 2 | 682 |
| 3 | Curtis Granderson | 2016 | NYM | 3 | 633 |
| 3 | Curtis Granderson | 2017 | NYM | 4 | 395 |
| 4 | Curtis Granderson | 2017 | LAD | 1 | 132 |
| 5 | Curtis Granderson | 2018 | TOR | 1 | 349 |
| 6 | Curtis Granderson | 2018 | MIL | 1 | 54 |
| 7 | Curtis Granderson | 2019 | FLA | 1 | 363 |

Grandy had 7 stints in total - 6 years with the Tigers, 4 with the Yankees, 4 with the Mets, and 1 year each with the Dodgers, Blue Jays, Brewers, and Marlins. He split time between teams in two seasons, so for both league and team summaries we’ll weight the average stint duration by plate appearances for hitters and by innings pitched (recorded as outs, `IPouts`) for pitchers.
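
To see what the weighting does, here is a small worked example using Granderson's two 2017 rows from the table above (his 4th season with NYM on 395 PA, and his 1st with LAD on 132 PA). It only illustrates base R's `weighted.mean()`; the variable names are chosen for the example.

```r
# Granderson's 2017 rows from the table above
seasons_played <- c(NYM = 4, LAD = 1)
pa             <- c(NYM = 395, LAD = 132)

# Unweighted: both rows count equally
unweighted <- mean(seasons_played)              # 2.5

# Weighted by plate appearances: (4 * 395 + 1 * 132) / (395 + 132)
weighted <- weighted.mean(seasons_played, pa)   # about 3.25

c(unweighted = unweighted, weighted = weighted)
```

The weighted figure sits closer to 4 because most of his 2017 playing time came with the Mets.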

league_plot <- player_stint_data %>%
  group_by(yearID, type) %>%
  summarise(mean_stint_seaons_left = mean(stint_seaons_left),
            mean_stint_seasons_played = mean(stint_seasons_played),
            w_mean_stint_seaons_left = weighted.mean(stint_seaons_left, wgt),
            w_mean_stint_seaons_played = weighted.mean(stint_seasons_played, wgt)
  ) %>%
  # group by type so the text aesthetic doesn't put every point in its own group
  ggplot(aes(x = yearID, y = w_mean_stint_seaons_played, color = type, group = type,
             text = paste("Year:", yearID,
                          "<br>Player Type:", type,
                          "<br>Weighted Stint Duration:", round(w_mean_stint_seaons_played,3))))+
  geom_line()+
  theme_ipsum()+
  ylab('Weighted Average Stint Duration')+
  xlab('Season')+
  scale_color_manual(values = c('#002D72', '#D50032'))

ggplotly(league_plot, tooltip = 'text')

Some interesting stories emerge here. First, it’s reassuring to see the low numbers in the 1880s - the leagues were only a few years old, so if those had been higher my formulas would be broken. This passes the sniff test. Second, we see the effect of WWII in the 1940s, when many players went overseas to fight. Stint duration then climbs through the 60s and peaks in 1968, when Mickey Mantle’s 18-year run with the Yankees came to an end, along with Roy Face’s 14 years with the Pirates and Vada Pinson’s 11 with the Reds.

Since the 90s, batters have tended to stay with a team for about 3.3 years, while pitchers are around 2.8. However, the gap has grown somewhat as teams have turned more to the bullpen, where pitchers tend to have shorter tenures.

The story is even more interesting if we break it down by team. We’ll look only at the 30 active teams to keep things clean.

team_plot <- player_stint_data %>%
  inner_join(active_frachises) %>%
  group_by(yearID, type, lgdivID, franchID, lgID, divID) %>%
  summarise(mean_stint_seaons_left = mean(stint_seaons_left),
            mean_stint_seasons_played = mean(stint_seasons_played),
            w_mean_stint_seaons_left = weighted.mean(stint_seaons_left, wgt),
            w_mean_stint_seaons_played = weighted.mean(stint_seasons_played, wgt)
  ) %>%
  # group by type so the text aesthetic doesn't put every point in its own group
  ggplot(aes(x = yearID, y = w_mean_stint_seaons_played, color = type, group = type,
             text = paste("Team:", franchID,
                          "<br>Current League/Division:", lgID, divID,
                          "<br>Year:", yearID,
                          "<br>Player Type:", type,
                          "<br>Weighted Stint Duration:", round(w_mean_stint_seaons_played,3))))+
  geom_line()+
  theme_ipsum()+
  ylab('Weighted Average Stint Duration')+
  xlab('Season')+
  scale_color_manual(values = c('#002D72', '#D50032'))+
  facet_wrap(lgdivID ~ ., ncol = 5)+
  theme(legend.position = "none")

ggplotly(team_plot, tooltip = 'text')

The 70s Tigers had some guys stick around - Al Kaline ended a 22-year career, while Norm Cash, Mickey Stanley and Willie Horton each finished 15-year runs. We see several Yankee dynasties, while Cincinnati’s Big Red Machine jumps out in the 70s.

For kicks and giggles, I took a look at a team’s winning percentage against average stint duration.

against_win_perc <- player_stint_data %>%
  group_by(yearID, franchID) %>%
  summarise(mean_stint_seaons_left = mean(stint_seaons_left),
            mean_stint_seasons_played = mean(stint_seasons_played),
            w_mean_stint_seaons_left = weighted.mean(stint_seaons_left, wgt),
            w_mean_stint_seaons_played = weighted.mean(stint_seasons_played, wgt)) %>%
  filter(yearID >= 1920) %>%
  inner_join(teams %>%
               mutate(win_perc = W / (W + L)) %>%
               select(yearID, franchID, win_perc, LgWin, WSWin))

team_corr <- against_win_perc %>%
  ggplot(aes(x = win_perc, y = w_mean_stint_seaons_played))+
  geom_point(alpha = 0.3, 
             color = '#40bf40',
             aes(text = paste("Team:", franchID,
                              "<br>Year:", yearID,
                              "<br>Win Percentage:", round(win_perc, 3),
                              "<br>Weighted Stint Duration:", round(w_mean_stint_seaons_played,3))))+
  geom_smooth(method = 'lm', se = FALSE, color = 'grey', linetype = 'dashed')+
  theme_ipsum()+
  ylab('Weighted Average Stint Duration')+
  xlab('Win Percentage')

ggplotly(team_corr, tooltip = 'text', height = 650, width = 650)

This looks interesting - and the r-squared is 0.18, so it’s certainly not nothing. But the conclusion should not be that “teams that have played together longer perform better”; rather, it’s that “good players tend to stick around, and teams that have good players win more games”.
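
The post doesn't show how the r-squared was computed, but a figure like it can be read off a simple linear fit. The sketch below uses a small made-up data frame (`toy`, with invented numbers) purely to show the calls. Fitting `w_mean_stint_seaons_played ~ win_perc` on `against_win_perc` would be the assumed equivalent for the real data.

```r
# Made-up numbers, for illustration only
toy <- data.frame(
  win_perc = c(0.40, 0.45, 0.50, 0.55, 0.60, 0.65),
  stint    = c(2.1, 3.0, 2.6, 3.4, 2.9, 3.8)
)

# Same model shape as the dashed lm line in the scatter plot
fit <- lm(stint ~ win_perc, data = toy)
r_squared <- summary(fit)$r.squared

# With a single predictor, r-squared equals the squared Pearson correlation
same_as_cor <- isTRUE(all.equal(r_squared, cor(toy$stint, toy$win_perc)^2))

r_squared
same_as_cor
```

Either route gives the share of variance in stint duration that a straight line in win percentage accounts for, which says nothing about the direction of cause.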

Finally, just for fun, here are the longest stints for each currently active team.

player_stint_summarised %>%
  inner_join(active_frachises) %>%
  group_by(franchID) %>%
  mutate(longest_stint = max(length)) %>%
  filter(longest_stint == length) %>%
  ungroup() %>%
  select(franchID, name, start_year, end_year, length) %>%
  arrange(franchID) %>%
  gt() %>%
  tab_style(
    locations = cells_column_labels(columns = everything()),
    style = list(
      cell_text(weight = "bold")
      )
    )
| franchID | name | start_year | end_year | length |
|---|---|---|---|---|
| ANA | Garret Anderson | 1994 | 2008 | 15 |
| ARI | Miguel Montero | 2006 | 2014 | 9 |
| ATL | Hank Aaron | 1954 | 1974 | 21 |
| BAL | Brooks Robinson | 1955 | 1977 | 23 |
| BOS | Carl Yastrzemski | 1961 | 1983 | 23 |
| CHC | Cap Anson | 1876 | 1897 | 22 |
| CHW | Red Faber | 1914 | 1933 | 20 |
| CHW | Ted Lyons | 1923 | 1942 | 20 |
| CIN | Dave Concepcion | 1970 | 1988 | 19 |
| CIN | Barry Larkin | 1986 | 2004 | 19 |
| CLE | Mel Harder | 1928 | 1947 | 20 |
| COL | Todd Helton | 1997 | 2013 | 17 |
| DET | Ty Cobb | 1905 | 1926 | 22 |
| DET | Al Kaline | 1953 | 1974 | 22 |
| FLA | Luis Castillo | 1996 | 2005 | 10 |
| HOU | Craig Biggio | 1988 | 2007 | 20 |
| KCR | George Brett | 1973 | 1993 | 21 |
| LAD | Bill Russell | 1969 | 1986 | 18 |
| LAD | Zack Wheat | 1909 | 1926 | 18 |
| MIL | Robin Yount | 1974 | 1993 | 20 |
| MIN | Walter Johnson | 1907 | 1927 | 21 |
| MIN | Harmon Killebrew | 1954 | 1974 | 21 |
| NYM | Ed Kranepool | 1962 | 1979 | 18 |
| NYY | Derek Jeter | 1995 | 2014 | 20 |
| OAK | Jimmy Dykes | 1918 | 1932 | 15 |
| PHI | Mike Schmidt | 1972 | 1989 | 18 |
| PIT | Willie Stargell | 1962 | 1982 | 21 |
| SDP | Tony Gwynn | 1982 | 2001 | 20 |
| SEA | Edgar Martinez | 1987 | 2004 | 18 |
| SFG | Mel Ott | 1926 | 1947 | 22 |
| STL | Jesse Haines | 1920 | 1937 | 18 |
| STL | Yadier Molina | 2004 | 2021 | 18 |
| STL | Stan Musial | 1946 | 1963 | 18 |
| TBD | Evan Longoria | 2008 | 2017 | 10 |
| TEX | Michael Young | 2000 | 2012 | 13 |
| TOR | Dave Stieb | 1979 | 1992 | 14 |
| WSN | Ryan Zimmerman | 2005 | 2019 | 15 |
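
The `mutate()` plus `filter()` pattern used for this table keeps every player tied for a franchise's longest stint, which is why CHW, CIN, DET and a few others show more than one row. A possible alternative, sketched here on a made-up data frame (`toy_stints` and its values are invented for the example), is `dplyr::slice_max()` with `with_ties = TRUE`.

```r
library(dplyr)

# Invented stint lengths for two hypothetical franchises
toy_stints <- tibble(
  franchID = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  name     = c("Player 1", "Player 2", "Player 3", "Player 4", "Player 5"),
  length   = c(20, 20, 12, 9, 15)
)

longest <- toy_stints %>%
  group_by(franchID) %>%
  # keep the longest stint per franchise, including ties
  slice_max(length, n = 1, with_ties = TRUE) %>%
  ungroup() %>%
  arrange(franchID, name)

print(longest)
```

Setting `with_ties = FALSE` instead would return exactly one row per franchise, at the cost of dropping one of the tied players arbitrarily.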