pacman::p_load(prophet, tidyverse, plotly)
In this post, we learn how to forecast using Prophet by following the code in the documentation. Prophet is a procedure for forecasting time series data, implemented in both R and Python.
I first heard about Prophet in the modules to review for Meta’s Marketing Science Professional, specifically in the chapters covering Marketing Mix Modeling (MMM). Meta developed and released an open-source R package for MMM called Robyn. This package uses Prophet to decompose time series data into components such as trend, seasonality, and holiday patterns.
We will cover the R implementation in this post, but the documentation on the website gives the same examples in Python. This post covers up to the section Seasonality, Holiday Effects, And Regressors in the documentation. The rest of the documentation will be covered in a separate post.
Loading R Packages
We load the following packages into our environment using p_load() of the pacman package. We use pacman so that any packages that are not installed will be automatically installed. If all packages are already installed, then the code block is equivalent to a call of library() to load the packages.
The loaded packages include:
prophet - package for forecasting time series data
tidyverse - collection of packages for performing data importation, wrangling and visualization
plotly - package for the creation of interactive data visualizations
Quick Start
This section covers the procedures and code from the same section in the Prophet documentation.
Loading the data
For this section, we will be using the log of the daily page views for the Wikipedia page of Peyton Manning. This was scraped by the team using the Wikipediatrend package in R.
df <- read.csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_peyton_manning.csv')
head(df)
          ds        y
1 2007-12-10 9.590761
2 2007-12-11 8.519590
3 2007-12-12 8.183677
4 2007-12-13 8.072467
5 2007-12-14 7.893572
6 2007-12-15 7.783641
df is already in the standard input format for Prophet, as it contains two columns: ds and y. ds corresponds to the datestamp and should be in the format YYYY-MM-DD for dates, or YYYY-MM-DD HH:MM:SS for timestamps. y contains the numeric value that we want to predict or analyze.
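As a minimal sketch of getting data into this shape, the snippet below converts a small made-up table into the ds and y columns. The column names date and views, and the values themselves, are hypothetical and not part of the Prophet example data.

```r
# Hypothetical raw data: the column names and values are made up for illustration.
raw <- data.frame(
  date  = c("10/12/2007", "11/12/2007", "12/12/2007"),
  views = c(1500, 1200, 1350)
)

# Parse the day/month/year strings, then store them as YYYY-MM-DD strings in ds.
# y holds the (log) value we want to forecast.
df_example <- data.frame(
  ds = format(as.Date(raw$date, format = "%d/%m/%Y"), "%Y-%m-%d"),
  y  = log(raw$views)
)
df_example
```

Any additional columns in the dataframe are not needed for a basic fit.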
Fitting the Model
The function prophet() is used to fit the model. The only required parameter is the historical data df, but other parameters are available to control how the package fits the data.
m <- prophet(df)
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Making Predictions
Making predictions requires a dataframe that contains a column ds with the dates that we want to make predictions on. The make_future_dataframe() function can generate such a dataframe, by taking in the model object and the number of periods in the future that we want to make predictions on. Note that the resulting object, by default, will include all values of ds from the model object.
future <- make_future_dataframe(m, periods = 365)
tail(future)
             ds
3265 2017-01-14
3266 2017-01-15
3267 2017-01-16
3268 2017-01-17
3269 2017-01-18
3270 2017-01-19
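To make the shape of this dataframe concrete, here is a rough base R sketch of the idea, using a toy five-day history. It is not how make_future_dataframe() is implemented internally; the object names are made up for illustration.

```r
# Toy history of five daily datestamps (made up for illustration).
history_ds <- seq(as.Date("2016-01-15"), by = "day", length.out = 5)

# Extend the history by `periods` days past the last observed date.
periods <- 3
future_ds <- seq(max(history_ds) + 1, by = "day", length.out = periods)

# Default behaviour described above: the history followed by the future dates.
future_example <- data.frame(ds = c(history_ds, future_ds))
nrow(future_example)
```

In the R package, the include_history argument of make_future_dataframe() should control whether the historical dates are kept; see ?make_future_dataframe to confirm.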
We then use the predict() function to produce the forecast. The result will be an object with multiple columns where the forecast is given by:
yhat - the central forecast
yhat_lower - the lower bound of the forecast
yhat_upper - the upper bound of the forecast
forecast <- predict(m, future)
tail(forecast)
             ds    trend additive_terms additive_terms_lower
3265 2017-01-14 7.191859 0.6354309 0.6354309
3266 2017-01-15 7.190836 1.0182769 1.0182769
3267 2017-01-16 7.189813 1.3442933 1.3442933
3268 2017-01-17 7.188789 1.1327369 1.1327369
3269 2017-01-18 7.187766 0.9664145 0.9664145
3270 2017-01-19 7.186742 0.9793367 0.9793367
additive_terms_upper weekly weekly_lower weekly_upper yearly
3265 0.6354309 -0.31172085 -0.31172085 -0.31172085 0.9471518
3266 1.0182769 0.04830820 0.04830820 0.04830820 0.9699688
3267 1.3442933 0.35228589 0.35228589 0.35228589 0.9920074
3268 1.1327369 0.11963148 0.11963148 0.11963148 1.0131054
3269 0.9664145 -0.06664470 -0.06664470 -0.06664470 1.0330592
3270 0.9793367 -0.07228677 -0.07228677 -0.07228677 1.0516235
yearly_lower yearly_upper multiplicative_terms multiplicative_terms_lower
3265 0.9471518 0.9471518 0 0
3266 0.9699688 0.9699688 0 0
3267 0.9920074 0.9920074 0 0
3268 1.0131054 1.0131054 0 0
3269 1.0330592 1.0330592 0 0
3270 1.0516235 1.0516235 0 0
multiplicative_terms_upper yhat_lower yhat_upper trend_lower trend_upper
3265 0 7.059228 8.542823 6.853371 7.559537
3266 0 7.511589 8.904177 6.850234 7.560244
3267 0 7.828382 9.315243 6.847097 7.560952
3268 0 7.606361 9.124440 6.844180 7.561659
3269 0 7.373763 8.924233 6.841665 7.562367
3270 0 7.438956 8.936140 6.839150 7.563074
yhat
3265 7.827290
3266 8.209113
3267 8.534106
3268 8.321526
3269 8.154180
3270 8.166079
Visualizing the Forecast
We can call the standard plot() function to plot the actual y values and the yhat values by passing the model and the forecast objects.
plot(m, forecast)
The function prophet_plot_components() visualizes a breakdown of the trends and the different seasonality components.
prophet_plot_components(m, forecast)
An interactive version of the plot can be produced with dygraphs using dyplot.prophet().
dyplot.prophet(m, forecast)
Warning: `select_()` was deprecated in dplyr 0.7.0.
ℹ Please use `select()` instead.
ℹ The deprecated feature was likely used in the prophet package.
Please report the issue at <https://github.com/facebook/prophet/issues>.
Saturating Forecasts
When forecasting, there is usually a maximum value (e.g., of demand or sales) that can be achieved. This limit is called the carrying capacity and may be dictated by factors like market size or population. Prophet allows us to make forecasts using a logistic growth trend (instead of the default linear trend) with a specified carrying capacity.
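To build intuition for what a saturating trend looks like, here is a simplified sketch of a plain logistic curve in base R. The function logistic_trend and its parameter values are illustrative assumptions; Prophet's actual trend is more elaborate (it allows the growth rate to change at changepoints), so treat this only as the basic shape.

```r
# Simplified logistic curve: `cap` is the carrying capacity, `k` the growth
# rate and `midpoint` the time at which the curve reaches half of `cap`.
logistic_trend <- function(t, cap, k, midpoint) {
  cap / (1 + exp(-k * (t - midpoint)))
}

t <- 0:100
trend <- logistic_trend(t, cap = 8.5, k = 0.1, midpoint = 50)

# The curve keeps rising but never reaches the cap.
range(trend)
```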
Specifying the Carrying Capacity or Maximum Limit
The carrying capacity is specified by including a column cap in the input dataframe. In the following, we set it to 8.5 (i.e., the maximum log number of views).
df$cap <- 8.5
Note that this means that there is a cap defined for every row in the dataframe. The value also does not need to be constant, so it can be an increasing or decreasing sequence depending on the case.
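As a small sketch of a non-constant cap, the snippet below builds a toy daily dataframe whose capacity grows linearly over time. The dataframe toy and the range 8.5 to 9.0 are made up for illustration.

```r
# Toy daily frame (dates are made up for illustration).
toy <- data.frame(ds = seq(as.Date("2020-01-01"), by = "day", length.out = 10))

# Hypothetical capacity that grows linearly from 8.5 to 9.0 over the period,
# giving one cap value per row.
toy$cap <- seq(8.5, 9.0, length.out = nrow(toy))
head(toy, 3)
```

The same idea applies to the prediction dataframe, which needs a cap for every future date as well.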
Fitting the Model
We can still use the function prophet() to fit the model, but we specify the use of a logistic growth model using the growth argument.
m <- prophet(df, growth = 'logistic')
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Making Predictions
We again create the dataframe for the predictions using make_future_dataframe(), but we now also need to add a column cap to specify the carrying capacity for all periods.
future <- make_future_dataframe(m, periods = 1826)
future$cap <- 8.5
We again pass the model object and the future periods dataframe as inputs to predict() to generate the predictions.
fcst <- predict(m, future)
plot(m, fcst)
Specifying a Minimum Value
While we have set a maximum value using cap, the logistic growth model also has an implicit minimum of zero. A different minimum can be set by adding a floor column to the dataframe, in the same way as cap. Note that cap must still be specified whenever floor is used.
df$floor <- 5.0
m <- prophet(df, growth = 'logistic')
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
future$cap <- 8.5
future$floor <- 5.0
fcst <- predict(m, future)
plot(m, fcst)
Trend Changepoints
By default, Prophet will automatically detect trend changepoints and will allow the trend to adapt as needed. If we wish to have finer control over this, then there are a number of arguments we can use.
Automatic Changepoint Detection
Prophet detects changepoints by first specifying a large number of potential points where the trend can change, and then using as few of them as possible in the fitted model. By default, 25 potential changepoints are placed uniformly in the first 80% of the time series, but there are arguments that can be used to modify this behavior.
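The default placement can be sketched in base R as follows. This is only an illustration of "25 points spread uniformly over the first 80% of the history", not necessarily the exact computation inside the package; the series length of 1000 is made up, and the variable names loosely mirror what I understand to be the corresponding prophet() arguments, n.changepoints and changepoint.range.

```r
n_obs <- 1000              # hypothetical number of observations
n_changepoints <- 25       # number of potential changepoints
changepoint_range <- 0.8   # share of the history where they may be placed

# Last row index that still falls inside the first 80% of the history.
hist_size <- floor(n_obs * changepoint_range)

# Evenly spaced row indices; the first position (start of the series) is dropped.
cp_index <- round(seq(1, hist_size, length.out = n_changepoints + 1))[-1]

length(cp_index)
max(cp_index)
```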
We can visualize the changepoints used by Prophet in the fitted model by adding add_changepoints_to_plot() to the output of plot().
plot(m, forecast) + add_changepoints_to_plot(m)
Adjusting Trend Flexibility
Prophet places a sparse prior on the magnitudes of the potential trend changes, and the strength of this prior is controlled by the parameter changepoint.prior.scale. The default value is 0.05. Increasing it makes the trend more flexible: more of the potential changepoints may be used, at a higher risk of overfitting. Decreasing it has the opposite effect: fewer changepoints are used, at a higher risk of underfitting.
The code below shows the impact of increasing the parameter value from the default 0.05 to 0.5.
m <- prophet(df, changepoint.prior.scale = 0.5)
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
forecast <- predict(m, future)
plot(m, forecast) + add_changepoints_to_plot(m)
The number of changepoints used has increased significantly from 8 to more than 20.
Specifying the Location of Changepoints
If we wish to override the automatic changepoint detection used by Prophet, we can manually specify the location of potential changepoints using the changepoints argument.
The following code just specifies a single location for the potential changepoint, which is then reflected as the single inflection point in the resulting plot.
m <- prophet(df, changepoints = c('2014-01-01'))
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
forecast <- predict(m, future)
plot(m, forecast) + add_changepoints_to_plot(m)
Seasonality, Holiday Effects, And Regressors
Modeling Holidays and Special Events
Holidays and other recurring events can be added to the model by creating a dataframe for them. It requires two columns: holiday for the name or tag of the holiday, and ds for the datestamp. Two optional columns, lower_window and upper_window, specify how many days before and after the date should be treated as part of the holiday. For example, to include Christmas Eve in the Christmas effect rather than modeling it separately, set lower_window to -1 and upper_window to 0.
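The effect of the window columns can be sketched in base R as follows. The one-row table xmas is a made-up example, and the expansion below only illustrates which dates a window covers, not how Prophet processes the table internally.

```r
# One-row holiday table for Christmas with a one-day lead-in (Christmas Eve).
xmas <- data.frame(
  holiday = "christmas",
  ds = as.Date("2015-12-25"),
  lower_window = -1,
  upper_window = 0
)

# Days covered by the holiday: ds + lower_window through ds + upper_window.
affected <- xmas$ds + seq(xmas$lower_window, xmas$upper_window)
affected
```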
The following code creates a dataframe for the dates of Peyton Manning’s playoff appearances. Note that there are two types of holidays defined: playoff and superbowl. Super Bowls are a subset of playoffs, so the superbowl effect is added on top of the playoff effect on those dates.
playoffs <- tibble(
  holiday = 'playoff',
  ds = as.Date(c('2008-01-13', '2009-01-03', '2010-01-16',
                 '2010-01-24', '2010-02-07', '2011-01-08',
                 '2013-01-12', '2014-01-12', '2014-01-19',
                 '2014-02-02', '2015-01-11', '2016-01-17',
                 '2016-01-24', '2016-02-07')),
  lower_window = 0,
  upper_window = 1
)
superbowls <- tibble(
  holiday = 'superbowl',
  ds = as.Date(c('2010-02-07', '2014-02-02', '2016-02-07')),
  lower_window = 0,
  upper_window = 1
)
holidays <- bind_rows(playoffs, superbowls)
We then rerun the model and pass this dataframe into the holidays parameter.
m <- prophet(df, holidays = holidays)
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
forecast <- predict(m, future)
The holiday components now show up in columns of the forecast object.
forecast %>%
  select(ds, playoff, superbowl) %>%
  filter(abs(playoff + superbowl) > 0) %>%
  tail(10)
           ds  playoff superbowl
17 2014-02-02 1.224282 1.200207
18 2014-02-03 1.900822 1.453924
19 2015-01-11 1.224282 0.000000
20 2015-01-12 1.900822 0.000000
21 2016-01-17 1.224282 0.000000
22 2016-01-18 1.900822 0.000000
23 2016-01-24 1.224282 0.000000
24 2016-01-25 1.900822 0.000000
25 2016-02-07 1.224282 1.200207
26 2016-02-08 1.900822 1.453924
The total holiday effect also shows up in the component plot.
prophet_plot_components(m, forecast)
Built-in Country Holidays
Prophet can also use a built-in collection of holidays through the add_country_holidays() function. The following code adds the built-in US holidays in addition to the ones we defined earlier:
m <- prophet(holidays = holidays)
m <- add_country_holidays(m, country_name = 'US')
m <- fit.prophet(m, df)
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
As of writing, holiday dates are computed in R for 1995 through 2044.
We can check which holidays were included by checking the train.holiday.names attribute of the model:
m$train.holiday.names
 [1] "playoff" "superbowl"
[3] "New Year's Day" "Martin Luther King Jr. Day"
[5] "Washington's Birthday" "Memorial Day"
[7] "Independence Day" "Labor Day"
[9] "Columbus Day" "Veterans Day"
[11] "Veterans Day (Observed)" "Thanksgiving"
[13] "Christmas Day" "Independence Day (Observed)"
[15] "Christmas Day (Observed)" "New Year's Day (Observed)"
The holiday effects will be reflected in the forecast object and the components plot.
forecast <- predict(m, future)
prophet_plot_components(m, forecast)
Fourier Order for Seasonalities
Seasonality is estimated using a partial Fourier sum. The number of terms in the partial sum, or the Fourier order, determines how quickly the seasonality can change; for yearly seasonality, the default order in Prophet is 10. The following code plots the yearly seasonality using the default parameters.
m <- prophet(df)
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
prophet:::plot_yearly(m)
The order can be adjusted using the parameter yearly.seasonality. A higher value allows the seasonality to change more quickly and will generally result in a less smooth curve. The code below increases the order to 20:
m <- prophet(df, yearly.seasonality = 20)
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
prophet:::plot_yearly(m)
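To see what the Fourier order means in practice, the sketch below builds the sine and cosine terms of a partial Fourier sum in base R. The helper fourier_features is a hypothetical name written for this post, not a function exported by the package; the point is simply that an order of N yields 2N columns, whose weighted sum forms the seasonal curve.

```r
# Build the sine/cosine terms of a partial Fourier sum.
# t: time in days, period: length of the cycle in days, order: number of harmonics.
fourier_features <- function(t, period, order) {
  # One sine and one cosine column per harmonic n = 1, ..., order.
  cols <- lapply(seq_len(order), function(n) {
    angle <- 2 * pi * n * t / period
    cbind(sin(angle), cos(angle))
  })
  out <- do.call(cbind, cols)
  colnames(out) <- paste0(rep(c("sin", "cos"), times = order),
                          rep(seq_len(order), each = 2))
  out
}

t <- 0:729  # two years of daily time steps
X10 <- fourier_features(t, period = 365.25, order = 10)
X20 <- fourier_features(t, period = 365.25, order = 20)

dim(X10)  # 730 rows, 20 columns
dim(X20)  # 730 rows, 40 columns
```

The extra columns at order 20 oscillate faster, which is why the fitted yearly curve can wiggle more.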
Specifying Custom Seasonalities
By default, Prophet will fit weekly and yearly seasonalities. Other seasonalities can be added by using add_seasonality(). The inputs to this function are the name of the seasonality, the period in days, and the fourier.order to use. Note that the yearly and weekly seasonalities use default Fourier orders of 10 and 3, respectively.
The code below replaces the weekly seasonality (by toggling weekly.seasonality) with a monthly seasonality:
m <- prophet(weekly.seasonality=FALSE)
m <- add_seasonality(m, name='monthly', period=30.5, fourier.order=5)
m <- fit.prophet(m, df)
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
forecast <- predict(m, future)
prophet_plot_components(m, forecast)
Seasonalities that Depend on Other Factors
In certain cases, the seasonality might differ at different times: e.g., during summer months vs the rest of the year, during weekends vs weekdays. These can be modeled in Prophet using conditional seasonalities.
In our current dataset, the default seasonality assumes that the pattern is the same throughout the year. If we want to differentiate the seasonality on-season and off-season, we can use conditional seasonalities.
The first step is to include boolean columns in df to indicate whether the record is on-season or off-season:
is_nfl_season <- function(ds) {
  dates <- as.Date(ds)
  month <- as.numeric(format(dates, '%m'))
  return(month > 8 | month < 2)
}
df$on_season <- is_nfl_season(df$ds)
df$off_season <- !is_nfl_season(df$ds)
To override the default seasonality, we again disable the default weekly seasonality.
m <- prophet(weekly.seasonality=FALSE)
We then replace it by adding the two custom seasonalities. These are made conditional through the use of the condition.name parameter.
m <- add_seasonality(m, name='weekly_on_season', period=7, fourier.order=3, condition.name='on_season')
m <- add_seasonality(m, name='weekly_off_season', period=7, fourier.order=3, condition.name='off_season')
m <- fit.prophet(m, df)
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
The future dates should also include the same boolean columns before generating the predictions:
future$on_season <- is_nfl_season(future$ds)
future$off_season <- !is_nfl_season(future$ds)
forecast <- predict(m, future)
prophet_plot_components(m, forecast)
Prior Scale for Holidays and Seasonality
If holidays or seasonality appear to be overfitting, their effects can be dampened using the holidays.prior.scale or seasonality.prior.scale parameters of prophet(), or the prior.scale parameter of add_seasonality().
The code below reduces the holiday prior scale from the default of 10 to 0.05. A lower value dampens the effect of holidays.
m <- prophet(df, holidays = holidays, holidays.prior.scale = 0.05)
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
forecast <- predict(m, future)
forecast %>%
  select(ds, playoff, superbowl) %>%
  filter(abs(playoff + superbowl) > 0) %>%
  tail(10)
           ds  playoff superbowl
17 2014-02-02 1.204873 0.9695985
18 2014-02-03 1.851387 1.0025073
19 2015-01-11 1.204873 0.0000000
20 2015-01-12 1.851387 0.0000000
21 2016-01-17 1.204873 0.0000000
22 2016-01-18 1.851387 0.0000000
23 2016-01-24 1.204873 0.0000000
24 2016-01-25 1.851387 0.0000000
25 2016-02-07 1.204873 0.9695985
26 2016-02-08 1.851387 1.0025073
Seasonality effects can be adjusted in total using the seasonality.prior.scale parameter, or individually using the prior.scale parameter of add_seasonality().
Additional Regressors
Additional regressors or independent variables can be added to the model using add_regressor(). A column with the new variable’s value should be added in both the input dataframe df and the prediction dataframe future.
The code below adds an additional column nfl_sunday to identify the effect of Sundays during the NFL season. The effect of this (and any other regressor) will show up in the extra_regressors plot. Note that the same regressor column should be present in the prediction dataframe.
nfl_sunday <- function(ds) {
  dates <- as.Date(ds)
  month <- as.numeric(format(dates, '%m'))
  as.numeric((weekdays(dates) == "Sunday") & (month > 8 | month < 2))
}
df$nfl_sunday <- nfl_sunday(df$ds)
m <- prophet()
m <- add_regressor(m, 'nfl_sunday')
m <- fit.prophet(m, df)
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
future$nfl_sunday <- nfl_sunday(future$ds)
forecast <- predict(m, future)
prophet_plot_components(m, forecast)
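One caveat with the helper above is that weekdays() returns day names in the language of the current R session, so the comparison with "Sunday" may silently fail on a non-English locale. A possible locale-independent variant is sketched below; nfl_sunday_portable is a hypothetical name introduced here, not part of the documentation's code.

```r
nfl_sunday_portable <- function(ds) {
  dates <- as.Date(ds)
  month <- as.numeric(format(dates, "%m"))
  # as.POSIXlt()$wday is 0 for Sunday regardless of the session locale.
  is_sunday <- as.POSIXlt(dates)$wday == 0
  as.numeric(is_sunday & (month > 8 | month < 2))
}

# A Sunday in season, a Monday in season, and a Sunday out of season.
nfl_sunday_portable(c("2016-01-17", "2016-01-18", "2016-06-05"))
```

Only the first date should be flagged with a 1, since it is the only Sunday that falls within the season months.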
While the new column was binary, regressors do not have to be. The add_regressor() function also accepts optional arguments, such as a prior scale, similar to the other effects. The full list of parameters is available in the help documentation (?add_regressor).
The coefficients of the regressors can be extracted using prophet::regressor_coefficients.
End Notes
As the post is already long, I will stop here and cover the rest of the documentation in another post. So far, Prophet appears to be a powerful forecasting package that offers a good amount of flexibility, both in adjusting the parameters for trend and seasonality and in accounting for other regressors.