Hot Starts and Expectations

Categories: statistics, r, baseball

Author: Mark Jurries II
Published: April 1, 2024

The MLB season is underway, and so far my Tigers are undefeated, having swept the White Sox 3-0. One can reasonably ask what this does to expectations for the season. To answer that, we first have to work out what our initial expectations are. We'll skip a lot of explanation and say that we think they're about a .500, or 81-81, team. We also think that while pacing out early results can be a fun exercise, it's ludicrous to expect them to go undefeated all season, so 162-0 isn't likely. But how much should these 3 games shift our expectations?

We can turn to Bayesian shrinkage for help. The idea is to add a certain number of pseudo-wins to the actual win total, add a corresponding number of pseudo-games to the games-played total, and use the adjusted rate as our estimate. If we added 50 wins and 50 losses, we'd get (3 + 50) / (3 + 50 + 50) = 53 / 103 = 0.515. (Read David Robinson's post on empirical Bayes estimation for a much more in-depth treatment.)
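To make that concrete, here's the same arithmetic as a quick R sketch (the 50/50 pseudo-record is just the made-up example above, not the number we'll actually use):

# Shrink a 3-0 start toward .500 by adding made-up pseudo-counts
pseudo_wins <- 50
pseudo_losses <- 50
actual_wins <- 3
actual_games <- 3

(actual_wins + pseudo_wins) / (actual_games + pseudo_wins + pseudo_losses)
#> [1] 0.5145631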

Rather than guess how many games we need to add, we'll use team totals from 2001-2023 to derive our numbers.

library(gt)
library(gtExtras)
library(hrbrthemes)
library(Lahman)
library(MASS)
library(tidyverse)

team_data <- Lahman::Teams

# Win percentage for every team-season since 2001
team_data_filtered <- team_data %>%
  filter(yearID >= 2001) %>%
  dplyr::select(yearID, teamID, W, L) %>% # explicit dplyr:: since MASS also exports select()
  mutate(wperc = W / (W + L))

# Fit a beta distribution to the win percentages by maximum likelihood
m <- MASS::fitdistr(team_data_filtered$wperc, dbeta,
                    start = list(shape1 = 1, shape2 = 10))

alpha0 <- m$estimate[1] # pseudo-wins
beta0 <- m$estimate[2]  # pseudo-losses

m[1]
$estimate
  shape1   shape2 
21.41202 21.41961 
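As a quick sanity check on what those parameters imply, a little arithmetic on the fitted values:

alpha0 / (alpha0 + beta0) # prior mean win %: ~0.500
alpha0 + beta0            # pseudo-games added to every record: ~42.8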

This is good - we get basically 21 wins (shape1) and 21 losses (shape2)*, so the model uses .500 as a baseline. That's the average win % of all teams over the period; the fact that we're banking on the Tigers being a roughly .500 team this year is just a happy coincidence. So let's take a look at how we'd change our mind after 0, 1, 2, or 3 wins, respectively.

*And it comes to about 42 games, making 42 the answer to life, the universe, and Bayesian shrinkage.
# Shrunk estimate and 95% interval after 0, 1, 2, or 3 wins in 3 games
priors <- tibble(wins = c(0, 1, 2, 3)) %>%
  mutate(games_played = 3,
         win_perc = wins / games_played,
         games_played_shrunk = alpha0 + beta0 + games_played,
         wins_shrunk = wins + alpha0,
         win_perc_shrunk = wins_shrunk / games_played_shrunk,
         ci_low = qbeta(.025, wins_shrunk, games_played_shrunk - wins_shrunk),
         ci_high = qbeta(.975, wins_shrunk, games_played_shrunk - wins_shrunk))


priors %>%
  select(wins, win_perc, win_perc_shrunk, ci_low, ci_high) %>%
  gt() %>%
  fmt_number(columns = c(win_perc, win_perc_shrunk, ci_low, ci_high),
             decimals = 3) %>%
  cols_label(wins = 'Wins',
             win_perc = 'Win %',
             win_perc_shrunk = 'Win % Shrunk',
             ci_low = 'CI Low',
             ci_high = 'CI High') %>%
  gt_theme_espn()
Wins   Win %   Win % Shrunk   CI Low   CI High
0      0.000   0.467          0.326    0.611
1      0.333   0.489          0.347    0.632
2      0.667   0.511          0.368    0.653
3      1.000   0.533          0.389    0.673
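For scale, we can convert the 3-0 row into 162-game win totals, a quick back-of-the-envelope using the rounded table values:

round(0.533 * 162)            # point estimate: 86 wins
round(c(0.389, 0.673) * 162)  # interval: 63 to 109 wins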

So with 3 wins, we'd move our projection up to 0.533, or about 86 wins. But notice that the credible interval is still pretty wide - we'd put them somewhere between 0.389 and 0.673, which overlaps heavily with the interval we'd get if they'd won no games. We can see this by charting the expected range after each outcome.

# Posterior density of true win % after an 0-3 start vs. a 3-0 start
priors %>%
  filter(wins %in% c(0, 3)) %>%
  mutate(loss_shrunk = games_played_shrunk - wins_shrunk) %>%
  crossing(x = seq(.1, .9, .00025)) %>%
  mutate(density = dbeta(x, wins_shrunk, loss_shrunk)) %>%
  ggplot(aes(x, density, color = as.factor(wins))) +
  geom_line() +
  labs(x = "Win %", color = "") +
  theme_ipsum() +
  scale_color_manual(values = c('grey', '#FA4616'))

So yes, 3 wins is enough to make us slightly more optimistic, but not enough to change our initial view all that much. After all, they could lose the next three and be right back at .500 by the end of the next series. If they'd lost all three, we'd be a bit more pessimistic, but we wouldn't drift that far from .500, either.
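That claim is easy to verify with the same machinery, plugging a hypothetical 3-3 record into the shrinkage formula with alpha0 and beta0 from above:

(3 + alpha0) / (6 + alpha0 + beta0) # shrunk estimate after 3-3: essentially .500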

As a quick sidebar, it's important to note that a model like this is very simplistic. It assumes that the team playing now will be the same one playing all season. Not to get all Ship of Theseus about it, but injuries, call-ups, player slumps and hot streaks, and so on will all impact the team, so who we're measuring now won't be exactly who they are down the road. This is the best any model can do, but we should acknowledge our limits.

Still, this is where a Bayesian approach can be helpful. We can start with our expectations (.500 was convenient for this quick analysis, but we could set it to .400, .600, or whatever we think is appropriate) and adjust them as new information arrives. As the season goes on, real games will carry more and more of the weight, moving us further and further out of small-sample-size land.
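As a rough illustration of that last point, here's the same update with a hypothetical half-season record (49-32, purely made up, not a projection):

# At 81 games, the ~43 pseudo-games move the estimate far less
(49 + alpha0) / (81 + alpha0 + beta0) # shrunk: ~0.569
49 / 81                               # raw: ~0.605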