We are happy to present JASP’s first procedure for time series analysis! Version 0.15 includes the Prophet module, which contains the analysis of the same name developed by Facebook’s Taylor and Letham (2018). Its core feature is a model that allows flexible time series forecasting on different scales. Do you want time series visualization? Changepoint estimation? Does your time series data have strong seasonalities? The new Prophet module can account for all of this! In this blog post, we will give a quick introduction to the Prophet model and illustrate the corresponding JASP module with a simple example. Readers who want to perform more complex analyses using the Prophet module in JASP are referred to Facebook’s GitHub page.
The Prophet Model
Let’s start with the Prophet model itself: It is based on a generalized additive model, that is, it consists of nonlinear terms that are added together. Prophet has three different nonlinear terms: A trend, seasonalities, and holidays. In the JASP module, only the trend and seasonalities are currently available. Additionally, there is a normally distributed error term. All terms depend on time.
Two different trends can be modeled with Prophet. First, the trend in the time series can be modeled as a linear function of time. Similar to a linear regression, the time series variable is predicted by the time variable. In Prophet, this is called linear growth. Second, there is logistic growth: Here, it is assumed that the dependent variable has an upper limit called carrying capacity that it approaches asymptotically but cannot outgrow. For instance, the spreading of a disease is limited by the size of the population, so there cannot be more cases of a disease than individuals in the population. As in regression models, the trend is described by two parameters: A growth rate k and an offset (or intercept) m.
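To make the two trend types concrete, here is a minimal Python sketch of both growth functions. The function names and all numeric values are our own illustrative assumptions, not JASP output; the logistic form follows the basic version given by Taylor and Letham (2018), where m acts as a time offset.

```python
import math

def linear_trend(t, k, m):
    """Linear growth: the trend changes by k per unit of time, starting at m."""
    return k * t + m

def logistic_trend(t, k, m, capacity):
    """Logistic growth: the trend approaches `capacity` asymptotically."""
    return capacity / (1.0 + math.exp(-k * (t - m)))

# Hypothetical values, chosen only for illustration.
days = [0, 50, 100, 200]
print([linear_trend(t, k=2.0, m=100.0) for t in days])
print([round(logistic_trend(t, k=0.05, m=100.0, capacity=1000.0), 1) for t in days])
```

In the logistic case the curve is steepest around t = m, where it reaches half of the carrying capacity, and it flattens as it approaches the capacity.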
Major context events often abruptly change the trends observed in time series data. For example, a heavy storm can substantially change the number of planes that depart from an airport for a few days. Prophet can account for these sudden trend changes by estimating or setting changepoints. Instead of assuming the same trend for the entire time series, it estimates trends for pieces of the data separated by changepoints. The changes in the growth rate at these changepoints are denoted by the parameter vector δ and are added to the growth rate k.
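The piecewise trend can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions (hypothetical function name, made-up changepoints and rate changes): at each changepoint s the growth rate changes by δ, and the offset is adjusted by -s·δ so that the trend stays continuous, as described by Taylor and Letham (2018).

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Linear trend whose growth rate changes by deltas[j] at changepoints[j]."""
    rate = k
    offset = m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            rate += delta          # the rate change is added to k
            offset -= s * delta    # keeps the trend continuous at s
    return rate * t + offset

# Hypothetical example: rising trend, then a decline, then a steeper rise.
for t in (0, 5, 10, 15, 20, 30):
    print(t, piecewise_linear_trend(t, k=1.0, m=0.0,
                                    changepoints=[10, 20], deltas=[-2.0, 3.0]))
```

With these values the rate is 1 before the first changepoint, -1 between the two changepoints, and 2 afterwards.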
Prophet is very flexible in handling seasonal effects in time series data. Seasonalities in Prophet are modeled by Fourier series. These have a period and an order that determines the flexibility of the seasonality. For example, a weekly seasonality with period seven and order three can have up to three peaks and three troughs within seven days. Multiple seasonality terms with differing periods and orders can be included in Prophet. Seasonalities can either be additive or multiplicative.
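As a rough illustration of how a Fourier series turns a period and an order into a seasonal curve, consider the following sketch. The function names and the coefficient values are hypothetical; in an actual analysis the coefficients are estimated from the data. A series of order N uses one sine and one cosine term per harmonic, that is, 2N coefficients in total.

```python
import math

def fourier_features(t, period, order):
    """Sine and cosine terms for harmonics 1..order, evaluated at time t."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features

def seasonal_effect(t, period, coefficients):
    """Weighted sum of the Fourier terms; needs 2 * order coefficients."""
    order = len(coefficients) // 2
    terms = fourier_features(t, period, order)
    return sum(b * x for b, x in zip(coefficients, terms))

# Hypothetical coefficients for a weekly seasonality of order three.
betas = [0.20, -0.10, 0.05, 0.08, -0.02, 0.03]
print([round(seasonal_effect(day, 7, betas), 3) for day in range(8)])
```

Because every term repeats after one period, the effect on day 7 equals the effect on day 0.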
Parameters in Prophet are estimated using either maximum a posteriori (MAP) estimation or Markov chain Monte Carlo (MCMC) sampling. Because both methods are Bayesian, priors can be specified for all parameters. In our experience, the default priors are sufficient for many applications, but more information on how to adjust them can be found in the online reference.
The JASP Page Visits Example Data Set
Now, to the Prophet module in JASP. We will illustrate the basic functionalities of the module on the JASP Page Visits data set, which is also included in the JASP Data Library. The data set consists of four variables, of which we use two here. The first is a ‘date’ variable that indicates the day of the recording, starting on January 6 and ending on October 8. This will be our time variable. The second is the number of visits to the JASP web page, which we will try to predict, that is, our dependent variable.
Activating the Prophet Module
The Prophet module is not activated by default, so we first need to activate it by selecting the ‘+’ in the top right corner of JASP and ticking the box for Prophet. A new icon with a crystal ball should now appear in the module section. We click on the icon and the Prophet analysis pops up. Next, we insert the variable ‘visits’ into the box ‘Dependent Variable’ and ‘date’ into ‘Time’. This should let JASP estimate the model and output its parameters.
Visualizing Time Series Data
Before we consider the model parameters, it is good practice to visualize the data we are trying to model. We can do this by ticking ‘History plot’ and selecting ‘Both’ (meaning points and a connecting line). In the output, a new plot appears that shows the visits to the JASP web page from January to October connected with a line.
By visually inspecting the plot, we can already see that the visits are overall increasing over time, signaling a positive trend. We can also observe two changes in the trend, approximately in May and August. Finally, the regular fluctuation of the line seems to hint at a seasonal pattern in the number of visits.
Specifying the Model
Setting up a Prophet model in JASP correctly can be challenging. Fortunately, we only need a few components in our basic model for this data set. Because we don’t have any information on the carrying capacity of the visits (potentially the number of people with computer access), we leave the default ‘Linear trend’ under the variable window as it is.
The first important component we need to specify is the set of changepoints. Under the ‘Model’ section, we find the group ‘Automatic Changepoints’. Here, we can set the maximum number of changepoints that the model will estimate (‘Max. changepoints’). Based on our visual data inspection, we enter two as the maximum. The ‘Changepoint range’ determines in which part of the data the changepoints can be estimated. We leave this at the default of 0.8, meaning that changepoints can be estimated in the first 80% of the data. The ‘Laplace prior τ’ defines the scale parameter of the Laplace prior distribution that is used to estimate the change in the trend at each changepoint.
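To our understanding, the open-source Prophet package does not search freely for changepoint locations. Instead, it places candidate changepoints at evenly spaced observations within the changepoint range and then estimates the rate change at each of them. The sketch below mimics that placement. Whether the JASP module follows exactly the same rule is an assumption on our part, as is the count of 276 daily observations (January 6 to October 8 in a non-leap year without gaps); the function name is hypothetical.

```python
def candidate_changepoint_indices(n_obs, max_changepoints, changepoint_range=0.8):
    """Evenly spaced observation indices within the first part of the data."""
    hist_size = int(n_obs * changepoint_range)  # observations inside the range
    if max_changepoints <= 0 or hist_size < 2:
        return []
    step = (hist_size - 1) / max_changepoints
    # Skip index 0: a changepoint at the very first observation is meaningless.
    return [round(i * step) for i in range(1, max_changepoints + 1)]

# Assumed: 276 daily observations, two changepoints, default range of 0.8.
print(candidate_changepoint_indices(276, 2))
```

The returned values are positions in the time series (0 is the first day), so they still need to be mapped back to dates.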
In the next group ‘Estimation’, we use the default ‘Markov chain Monte Carlo’ for MCMC estimation. This method can be much slower than MAP but enables uncertainty estimates for parameters and predictions (e.g., credible intervals). We set ‘Samples’ to 2000 to ensure that all parameters are estimated as reliably and precisely as possible.
Next, we add a seasonality term to our model. Under the group ‘Seasonalities’, we can add one by clicking on the green ‘+’ button. We want to specify a weekly seasonality, so we enter the name ‘Weekly’ and choose a ‘Period’ of seven with the ‘unit’ days. The ‘Normal prior σ²’ indicates the variance of the normal prior distribution that is used to estimate the vector of coefficients that is multiplied with the Fourier series (there are two coefficients, one for the sine and one for the cosine term, for each order of the series). We keep the default prior and set the ‘Fourier order’ to three as recommended for weekly seasonalities by Taylor and Letham (2018). This allows for up to three local maxima and three local minima per week in our seasonality. Lastly, we set the ‘Mode’ to multiplicative to let the seasonality interact with our trend. This means that the seasonality can be interpreted as seasonal changes in the trend. With that, our model is complete.
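The difference between the two seasonality modes can be sketched as follows. This is a simplified illustration: the function name and the trend values are hypothetical, and the seasonal effects of +25% and -40% are only example magnitudes. In additive mode the seasonal effect is in the units of the data; in multiplicative mode it is a proportion of the trend.

```python
def combine(trend, seasonal, mode):
    """Combine a trend value with a seasonal effect."""
    if mode == "additive":
        return trend + seasonal           # seasonal effect in units of the data
    if mode == "multiplicative":
        return trend * (1.0 + seasonal)   # seasonal effect as a proportion of the trend
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical trend levels of 1000 and 2000 visits per day.
for trend in (1000.0, 2000.0):
    high = combine(trend, 0.25, "multiplicative")
    low = combine(trend, -0.40, "multiplicative")
    print(trend, high, low, high - low)
```

With a multiplicative seasonality, the weekly swing doubles when the trend doubles, whereas an additive seasonality would produce the same swing at every trend level.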
What to Predict and not to Predict?
Now that our model is specified, we need to decide what we want to predict. Since our time series is rather short and we only have time as a predictor, we should not attempt to make long-term predictions. Moreover, the data we have is measured on a daily level. Thus, the model will make very poor predictions on a more fine-grained level (e.g., hours) because it does not have any information about this level. The Prophet module offers two different kinds of predictions: They can be periodical, that is, for a number of periods of a certain unit after the time series ends (including the time series itself). The other option is nonperiodical prediction for a custom time interval. Taking these factors into account, we simply choose ‘Periodical’ under the ‘Prediction’ section and enter 14 as the ‘Number of periods’. The model will now make a forecast for the visits on the 14 days after the data set ends.
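As a small illustration of what a periodical prediction covers, the following sketch lists the dates for which predictions would be made: the observed period plus 14 further days. The function name is hypothetical, and the year 2021 is an assumption, since only the day and month of the first and last recording are given above.

```python
from datetime import date, timedelta

def periodical_forecast_dates(start, end, n_periods, step=timedelta(days=1)):
    """All time points from `start` up to `n_periods` steps after `end`."""
    last = end + n_periods * step
    dates = []
    current = start
    while current <= last:
        dates.append(current)
        current += step
    return dates

# Assumed year; the data set runs from January 6 to October 8.
dates = periodical_forecast_dates(date(2021, 1, 6), date(2021, 10, 8), 14)
print(len(dates), dates[0], dates[-1])
```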
How Good are Prophet’s Forecasts?
As noted earlier, Prophet is a very flexible model, and is as such also prone to overfitting. By setting many changepoints and including flexible seasonalities, we might perfectly “predict” the data that the model was trained on. However, following the bias-variance trade-off, such a model will make very poor predictions for future data points not included in the training. To avoid this pitfall, we need to evaluate the out-of-sample prediction of our model.
For this purpose, Taylor and Letham (2018) created the Simulated Historical Forecasts procedure that uses cross-validation. The method cuts the time series into a number of overlapping pieces, working backward from the end and stopping at an initial interval that is always used for training the model. It then trains the Prophet model on the initial training sample and predicts the following data points that lie within a time interval called horizon (i.e., the test sample). The end of the training sample then moves forward by a given period of time. The model is trained on the new training sample and makes predictions for the next horizon. The training sample keeps moving forward in time and the model is retrained until out-of-sample predictions for every data point have been made. Finally, an accuracy measure for the predictions is calculated and averaged over the horizons.
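The way the time series is cut can be sketched in Python. This is a simplified version under our own assumptions: time is counted in whole days from 0, the function name is hypothetical, and we assume 276 daily observations together with a horizon of 14 days, a period of 7 days between cutoffs, and an initial training time of 42 days.

```python
def simulated_forecast_cutoffs(n_days, horizon, period, initial):
    """Last training day of each fold, working backward from the end."""
    last_day = n_days - 1
    cutoffs = []
    cutoff = last_day - horizon        # the final fold tests the last `horizon` days
    while cutoff >= initial:           # always keep at least `initial` days for training
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)

cutoffs = simulated_forecast_cutoffs(n_days=276, horizon=14, period=7, initial=42)
for c in cutoffs[:3]:
    # Train on days 0..c, then predict days c+1..c+14.
    print(f"train: 0-{c}, test: {c + 1}-{c + 14}")
print(len(cutoffs), "folds")
```

Because the period between cutoffs is shorter than the horizon, consecutive test samples overlap, so most days are predicted more than once at different distances from the end of the training sample.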
To evaluate our model, we tick ‘Simulated historical forecasts’ under the ‘Evaluation’ section. Here, it is also important to select a unit that matches the unit the data was measured on or that is coarser. We select days as ‘Unit’ and enter a ‘Horizon’ of 14, which automatically sets the ‘Period between cutoffs’ to half of that number and ‘Initial training time’ to three times that number. We also tick ‘Performance metrics’ to see the evaluation output, and ‘Mean absolute percentage error (MAPE)’ to include this accuracy measure, which in its standard form is calculated as:
$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - y_t^*}{y_t} \right|$$
where y_t is the observed and y_t* is the predicted number of visits at date t, and n is the number of predictions. Lastly, we tick ‘Changepoint table’ to see the changepoint estimates in the output.
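For readers who want to see the accuracy measure in action, here is a minimal sketch of the standard MAPE computation. The function name and the numbers are made up for illustration and are not taken from the JASP output.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a proportion."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Hypothetical observed and predicted visits on three days.
observed = [100.0, 200.0, 400.0]
forecast = [80.0, 220.0, 400.0]
print(f"{mape(observed, forecast):.0%}")
```

Note that the measure is undefined when an observed value is zero, which is not a concern for a data set with visits on every day.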
Interpreting the Prophet Output
We start interpreting the output of the JASP Prophet module by looking at the Posterior Summary Table. As the title suggests, it summarizes the posterior distributions of three parameters. The main parameter of interest is the growth rate, which is positive and has a narrow credible interval. This indicates that the estimated trend is most likely positive, meaning that the visits are increasing in the first piece of the linear trend model. The positive offset suggests that the number of visits is larger than zero at the start of the time series. In the three rightmost columns, we can see diagnostics of the MCMC estimation. The R-hat values are close to 1, so the chains have likely converged and seem to be well-mixed. The effective sample sizes (ESS) in the main part (bulk) and tails of the posterior distribution are also large, meaning that our summary statistics are reliable and precise.
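To give some intuition for the R-hat diagnostic, the sketch below computes the classic Gelman-Rubin statistic, which compares the variance between chains to the variance within chains. This is a simplified variant and not necessarily the exact version reported by JASP (modern software typically uses a refined, rank-normalized split version). The chains are simulated stand-ins, not draws from our model.

```python
import random
import statistics

def r_hat(chains):
    """Classic (non-split) Gelman-Rubin statistic for equally long chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between_over_n = statistics.variance(chain_means)  # between-chain variance / n
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5

rng = random.Random(1)
# Four simulated chains that sample the same distribution (well-mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# The same chains shifted apart, as if each were stuck in a different region.
stuck = [[x + 5.0 * j for x in chain] for j, chain in enumerate(mixed)]
print(round(r_hat(mixed), 3), round(r_hat(stuck), 3))
```

Values close to 1 indicate that the chains agree with each other, whereas values clearly above 1 signal that they have not converged to the same distribution.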
The next table contains the summary of the posterior distributions for the two changepoints we included in our model. The first changepoint occurs at the end of April and is negative, leading to a decreasing trend for the second piece of the linear trend model.
Conversely, the second changepoint is strongly positive, reversing the trend to be increasing for the last piece. Notably, the credible intervals are narrow for both changepoints, that is, Prophet is quite certain about the estimated changes in the trend. Interestingly, the dates of the changepoints roughly match the start and end of the academic summer break. Intuitively, we can assume that JASP is downloaded less frequently during that time because no statistics classes are taught.
To evaluate the out-of-sample predictive accuracy of our model, we inspect the Simulated Historical Forecasts Table. Here, we see the MAPE for each day in our specified horizon. The accuracy seems to decrease the further the predictions lie in the future. Predictions for 14 days beyond the training sample deviate from the true values by approximately 20%. Thus, the forecasts by our model are somewhat accurate but can definitely be improved.
The JASP Prophet module offers many plots to display the output of our model. To visualize the model predictions, we tick ‘Overall’ under the ‘Forecasts plots’ group in the ‘Plots’ section. To compare the predictions with the data and see the changepoints, we also tick ‘Show data points’ and ‘Show changepoints’.
In the resulting plot, we can see that the predictions overall seem to fit the data quite well, with two exceptions around April and September. The blue-shaded area is the prediction interval. For predictions beyond the training data set (after the dashed vertical line), the prediction interval remains narrow, suggesting that the visits will increase in the near future. Moreover, the weekly seasonality appropriately captures the fluctuations in the time series. Note that the seasonal prediction fluctuates more when the number of visits increases. This occurs because we specified a multiplicative seasonality term.
We can plot the seasonal effect by dragging our ‘Weekly’ seasonality term from ‘Seasonalities’ into the ‘Seasonality Plots’ window. The resulting plot shows that the number of visits to the JASP page rises at the beginning of the week to about 25% above the trend and falls towards the weekend to about 40% below it. This can be easily explained because students and university employees usually do not work as much on weekends. Again, the prediction interval in the plot is narrow, meaning that Prophet is quite certain about the seasonal predictions.
Using the JASP Prophet module, we constructed a simple model to predict the number of visits to the JASP web page, an example time series data set. We first visualized the data to make sensible specifications for a Prophet model, including changepoints and a weekly seasonality effect. We interpreted the model parameters and assessed the out-of-sample prediction of the model to avoid overfitting the data. Finally, we plotted the model predictions and seasonal effect, concluding that our model makes reasonable and interpretable predictions for the number of visits to the JASP page.
The JASP Prophet module offers more features than we explained in the previous example. A few important ones are:
- Plotting posterior distributions: Under the ‘Plots’ section and the ‘Parameter Plots’ group, the posterior marginal distributions for the model parameters can be plotted.
- Logistic growth using a carrying capacity and saturating minimum: Under the variable window set ‘Growth’ to ‘Logistic’. This trend model requires a carrying capacity either supplied as a variable or as a constant value. These can also be shown in the forecast plots.
- Covariates: Prophet can also include covariates to make more accurate forecasts. However, values of the covariates need to be available for the time points that are predicted. Under the ‘Model’ section, covariates can be specified. They can also be plotted (see ‘Plots’ and ‘Covariate Plots’).
- Saving forecasts: The model predictions can be saved as a .csv-file to be used in other spreadsheet programs or to be shared. Under ‘Prediction’ a saving directory can be chosen with ‘Save predictions’.
The new JASP Prophet module offers a flexible analysis tool for predicting time series data. It can handle trend changes and seasonal effects in the data and allows researchers to easily assess the model’s predictions. It also provides various visualizations of time series data and forecasts. The most beneficial aspect of the module is that its output can be easily interpreted and adjusted based on the analyst’s knowledge, as intended by the original creators of Prophet. For more time series analyses in JASP, stay tuned!
Taylor, S. J. & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45. https://doi.org/10.1080/00031305.2017.1380080