Forecasting is a fairly common request when working with data. Typically, a single macro-level forecast isn't enough; we also want to see the expected contributions from the entities that make up that forecast. These might be regions, stores, product categories, or even all of the above.

Thankfully, the fable R package makes creating grouped forecasts easy: put your time-series data into a tsibble, create your models, then forecast away. We'll use IMDb movie data from the ggplot2movies package for this example. The data comes at the level of individual movies, so we'll first aggregate by year and genre. We'll then forecast mean budget for each genre, excluding documentaries and short films since we don't care about them here.

library(fable)
library(ggplot2movies)
library(hrbrthemes)
library(tidyverse)
set.seed(2023)
movies_prepped <- movies %>%
  as_tibble() %>%
  # collapse the genre flag columns into a single genre variable
  mutate(genre = case_when(Action == 1 ~ 'Action',
                           Animation == 1 ~ 'Animation',
                           Comedy == 1 ~ 'Comedy',
                           Drama == 1 ~ 'Drama',
                           Documentary == 1 ~ 'Documentary',
                           Romance == 1 ~ 'Romance',
                           Short == 1 ~ 'Short',
                           TRUE ~ 'Other')) %>%
  filter(!genre %in% c('Documentary', 'Short')) %>%
  # aggregate from one row per movie to one row per year and genre
  group_by(year, genre) %>%
  summarise(n = n(),
            mean_length = mean(length, na.rm = TRUE),
            mean_budget = mean(budget, na.rm = TRUE),
            mean_rating = mean(rating, na.rm = TRUE),
            mean_votes = mean(votes, na.rm = TRUE)) %>%
  filter(year >= 1960)

# a tsibble is a time-series tibble; year is the index, genre the key
movies_tsibble <- movies_prepped %>%
  as_tsibble(index = year,
             key = genre)

# fit an ARIMA and an ETS model to mean budget for each genre
movie_fcasts_models <- movies_tsibble %>%
  model(arima = ARIMA(mean_budget),
        ets = ETS(mean_budget))

fcast_accuracy <- accuracy(movie_fcasts_models)

# forecast five years ahead and plot each genre in its own panel
movie_fcasts_models %>%
  forecast(h = "5 years") %>%
  autoplot(movies_tsibble) +
  facet_wrap(genre ~ ., scales = 'free') +
  theme_ipsum()
OK, we've got some forecasts. Are they good? Not really. Annual forecasting isn't great, but since accuracy isn't the point of this exercise, we'll be happy with having anything. Now, if we wanted to forecast not just budget but also the number of movies, average length, average rating, and average votes, we could create a separate set of forecast objects for each. But then we'd have to edit every one of them whenever we wanted to make a change, such as adding a square root transformation.
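(As an aside, fable lets you write a transformation directly into the model formula and back-transforms the forecasts for you, so a square-root version of the budget models would look something like this; the model names are arbitrary:)

movies_tsibble %>%
  model(arima_sqrt = ARIMA(sqrt(mean_budget)),
        ets_sqrt = ETS(sqrt(mean_budget))) %>%
  forecast(h = "5 years")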
We could also write a function, but we'd still have to loop through it and create a pile of objects. Thankfully, tidyr's pivot_longer() function can help us out. Since fable creates a forecast for each key (above, each genre got its own independent forecast), we just have to move all of our numeric data into a single column, add the old column names as a new grouping variable (a second tsibble key), then run the models.
# move all numeric columns into one value column, keyed by metric name
movies_tsibble_long <- movies_prepped %>%
  pivot_longer(cols = n:mean_votes,
               names_to = 'metric',
               values_to = 'value') %>%
  as_tsibble(index = year,
             key = c(genre, metric))

# one ARIMA and one ETS model per genre/metric combination
movie_fcasts_models_long <- movies_tsibble_long %>%
  model(arima = ARIMA(value),
        ets = ETS(value))

fcast_accuracy_long <- accuracy(movie_fcasts_models_long)

movie_fcasts_models_long %>%
  forecast(h = "5 years") %>%
  autoplot(movies_tsibble_long) +
  facet_grid(metric ~ genre, scales = 'free') +
  theme_ipsum()
And hey presto, we've created a bunch of forecasts with relative ease. In a real-world environment we'd throw in a few more models (Prophet, a neural net, etc.) along with some transformations to improve the quality of our forecasts. We'd also have somebody look them over to check how reasonable they are, as a model can perform well in testing yet produce strange output.
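To give a flavour, adding models is just a matter of adding columns to model(): NNETAR() is fable's neural-network autoregression, and, assuming the separate fable.prophet package is installed, prophet() slots in the same way:

library(fable.prophet)

movie_fcasts_models_long <- movies_tsibble_long %>%
  model(arima = ARIMA(value),
        ets = ETS(value),
        nnet = NNETAR(value),
        prophet = prophet(value))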
The accuracy objects we've been producing can be used to find the model with the lowest MASE for each genre/metric combination, so we can pick the best one automatically. You could also average the models, which tends to give a sensible output. (Read Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos for much more detail on forecasting.) There's enough going on with forecasting as it is, so the simpler you can make the production side, the more time you'll have for sanity checks later.
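A rough sketch of both ideas: slice_min() pulls out the lowest-MASE model for each genre/metric, and fable supports combination models via arithmetic on the model columns, so averaging is a one-line mutate():

# best model per genre/metric by training-set MASE
best_models <- fcast_accuracy_long %>%
  group_by(genre, metric) %>%
  slice_min(MASE, n = 1) %>%
  ungroup()

# average the ARIMA and ETS models into a combination model
movie_fcasts_models_long %>%
  mutate(combo = (arima + ets) / 2) %>%
  forecast(h = "5 years")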