lmForc

Nelson Rayl

library(lmForc)

Sections

Forecast Class

is_forc

is_forc_general

oos_realized_forc

oos_realized_forc_general

oos_vintage_forc

oos_vintage_forc_general

conditional_forc

conditional_forc_general

oos_lag_forc

historical_average_forc

random_walk_forc

autoreg_forc

performance_weighted_forc

states_weighted_forc


Subsetting Functions Overview

subset_forcs

subset_bytime

subset_identical


Transformation Functions Overview

convert_bytime

transform_bytime

convert_byh

transform_byh


Forecast Class

At the heart of the lmForc package is the Forecast class. Base R does not provide a good format for working with forecasts, so lmForc addresses this by introducing a new class for storing forecasts that is simple and rigorous. The Forecast class is paramount to the lmForc philosophy of simple syntax and realistic tests. Forecast is an S4 class that contains equal length vectors with the following data:

The Forecast class also includes an additional length-one slot h_ahead for representing how many periods ahead are being forecasted. This slot is optional, but becomes useful for documentation and performing out-of-sample forecast tests.

We demonstrate the Forecast class by constructing a simple Forecast object. This forecast contains four observations at the quarterly frequency.

my_forecast <- Forecast(
   origin   = as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31")),
   future   = as.Date(c("2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31")),
   forecast = c(4.21, 4.27, 5.32, 5.11),
   realized = c(4.40, 4.45, 4.87, 4.77),
   h_ahead  = 4L
)

Note that we have chosen Date objects at the quarterly frequency for our origin and future slots. This forecast is for four quarters ahead, so we fill in the h_ahead slot with the integer four.

The origin, future, and h_ahead slots can be filled with any values. This is where the general nature of the Forecast class shines. In the origin and future slots we could use dates at a daily or yearly frequency, the POSIXct class to include minutes and seconds, or integers to represent discrete periods. We can also store different types of forecasts. In the example above, we have forecasts made at different origin times for a constant four quarter ahead forecast horizon. We could also store a forecast made at a single origin time for a horizon of future times by setting all of the origin values to one time and using future to represent the horizon of times that we are forecasting. In this case the h_ahead slot becomes irrelevant and it is left as NULL. The flexibility of these slots allows us to represent any type of numeric forecast.

The forecast and realized slots take numeric vectors. In the forecast slot we see the forecast that was made at each origin time and in the realized slot we see the true value that was realized at each future time. The realized values may not exist yet, so this slot may be partially populated or not populated at all. If the realized slot is set to NULL then it will be populated with a vector of NA values.

The Forecast class strikes a balance between simplicity and rigor. It is simple enough to store any numeric forecast, but it is rigorous enough to create a useful data structure. For example, we can quickly calculate a number of forecast accuracy metrics for the Forecast object we created above using only one argument.

mse(my_forecast)
#> [1] 0.09665
rmse(my_forecast)
#> [1] 0.3108858
mae(my_forecast)
#> [1] 0.29
mape(my_forecast)
#> [1] 0.06182814
R2(my_forecast)
#> [1] 0.9973145

Because the forecast and realized slots must be numeric vectors, and all slots must be of the same length, we can calculate forecast accuracy metrics without having to worry about input validation or coercing multiple vectors to the correct format. The forecast accuracy metrics available in the lmForc package are calculated as follows where: \[ n = \text{forecast vector length}\\ \quad Y_i = \text{realized values}\\ \hat{Y_i} = \text{forecast values} \]

MSE is calculated as: \[ MSE = \frac{1}{n} \sum_{i=1}^{n}{(Y_i - \hat{Y_i})^2} \]

RMSE is calculated as: \[ RMSE = \sqrt{MSE} \] MAE is calculated as: \[ MAE = \frac{1}{n} \sum_{i=1}^{n}{|\hat{Y_i} - Y_i|} \] MAPE is calculated as: \[ MAPE = \frac{1}{n} \sum_{i=1}^{n}\frac{|Y_i - \hat{Y_i}|}{{Y_i}} \] R2 is calculated as: \[ R^2 = cor(\hat{Y_i}, \ Y_i)^2 \]

Note that these equations require two inputs, but because both inputs are already stored in the Forecast object we only need to pass one argument to the mse() and rmse() functions. Calculating forecast accuracy is a simple use case of the Forecast class. More complex use cases exist in the lmForc package, where many of the functions require inputs to be of the Forecast class. When weighting multiple forecasts or testing a linear model that is conditional on another forecast, the consistent structure of the class results in simple functions that execute correctly, no matter the type of forecast passed to the function. Furthermore, all lmForc functions return objects of the Forecast class which creates synergy between functions. One can take two linear models, test their performance out-of-sample with the oos_realized_forc() function which returns Forecast objects, and then pass these two objects to the performance_weighted_forc() function to find the weighted out-of-sample performance of both models.

One fear may be that the novel Forecast class will not play well with functions and packages that already exist in the R language. The lmForc package provides methods for accessing all of the vectors stored in a Forecast object as well as the forc2df() function which returns one or multiple Forecast objects as a data frame. With these methods, one can easily pass the data in a Forecast object to other functions.

forc2df(my_forecast)
#>       origin     future forecast realized
#> 1 2010-03-31 2011-03-31     4.21     4.40
#> 2 2010-06-30 2011-06-30     4.27     4.45
#> 3 2010-09-30 2011-09-30     5.32     4.87
#> 4 2010-12-31 2011-12-31     5.11     4.77

origin(my_forecast)
#> [1] "2010-03-31" "2010-06-30" "2010-09-30" "2010-12-31"

future(my_forecast)
#> [1] "2011-03-31" "2011-06-30" "2011-09-30" "2011-12-31"

forc(my_forecast)
#> [1] 4.21 4.27 5.32 5.11

realized(my_forecast)
#> [1] 4.40 4.45 4.87 4.77

Example Dataset

Examples throughout the rest of the vignette will use a stylized dataset with a date column of ten quarterly dates, a dependent variable y, and two independent variables x1 and x2. Equations are also written in terms of the variables y, x1, and x2.

date <- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
                  "2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31", 
                  "2012-03-31", "2012-06-30"))

y  <- c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99)

x1 <- c(4.22, 3.86, 4.27, 5.60, 5.11, 4.31, 4.92, 5.80, 6.30, 4.17)

x2  <- c(10.03, 10.49, 10.85, 10.47, 9.09, 10.91, 8.68, 9.91, 7.87, 6.63)

data <- data.frame(date, y, x1, x2)

head(data)
#>         date    y   x1    x2
#> 1 2010-03-31 1.09 4.22 10.03
#> 2 2010-06-30 1.71 3.86 10.49
#> 3 2010-09-30 1.09 4.27 10.85
#> 4 2010-12-31 2.46 5.60 10.47
#> 5 2011-03-31 1.78 5.11  9.09
#> 6 2011-06-30 1.35 4.31 10.91

To demonstrate the _general set of functions which extend standard lmForc functions to accommodate non-linear models, we will use a logit regression and a modified version of the above dataset in which the dependent variable y is binary.

date <- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
                  "2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31", 
                  "2012-03-31", "2012-06-30", "2012-09-30", "2012-12-31",
                  "2013-03-31", "2013-06-30", "2013-09-30", "2013-12-31"))
y  <- c(1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0)
x1 <- c(8.22, 3.86, 4.27, 3.37, 5.88, 3.34, 2.92, 1.80, 3.30, 7.17, 3.22, 3.86, 4.27, 3.37, 5.88, 3.34)
x2 <- c(4.03, 2.46, 2.04, 2.44, 6.09, 2.91, 1.68, 2.91, 3.87, 1.63, 4.03, 2.46, 2.04, 2.44, 6.09, 2.91)
dataLogit <- data.frame(date, y, x1, x2)

head(dataLogit)
#>         date y   x1   x2
#> 1 2010-03-31 1 8.22 4.03
#> 2 2010-06-30 0 3.86 2.46
#> 3 2010-09-30 0 4.27 2.04
#> 4 2010-12-31 0 3.37 2.44
#> 5 2011-03-31 1 5.88 6.09
#> 6 2011-06-30 1 3.34 2.91

is_forc

The is_forc() function produces an in-sample forecast based on a linear model. The function takes a linear model call and an optional vector of time data associated with the linear model as inputs. The linear model is estimated once over the entire sample period and the coefficients are multiplied by the realized values in each period of the sample. This is functionally identical to the predict() function in the stats package, but it returns a Forecast object instead of a numeric vector.

For all observations i in the sample, coefficients are estimated as:

\[ Y_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i + \epsilon_i \qquad \text{for all } i \] And forecasts are estimated as:

\[ forecast_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i \qquad \]

is_forc(
  lm_call  = lm(y ~ x1 + x2, data),
  time_vec = data$date
)
#> h_ahead = 0 
#> 
#>        origin     future forecast realized
#> 1  2010-03-31 2010-03-31 1.394370     1.09
#> 2  2010-06-30 2010-06-30 1.138708     1.71
#> 3  2010-09-30 2010-09-30 1.423339     1.09
#> 4  2010-12-31 2010-12-31 2.358107     2.46
#> 5  2011-03-31 2011-03-31 2.024964     1.78
#> 6  2011-06-30 2011-06-30 1.450924     1.35
#> 7  2011-09-30 2011-09-30 1.894861     2.89
#> 8  2011-12-31 2011-12-31 2.502394     2.11
#> 9  2012-03-31 2012-03-31 2.867846     2.97
#> 10 2012-06-30 2012-06-30 1.384488     0.99

Note that because we are creating an in-sample forecast, h_ahead is set to 0 and the origin time equals the future time. This test evaluates how well a linear forecast model fits the historical data.

is_forc_general

The is_forc_general() function produces an in-sample forecast based on a model function and prediction function specified by the user. The is_forc_general function takes a model function, a prediction function, input data for estimating the model, and an optional vector of time data associated with the model. The model is estimated once over the entire sample period using the input data and model function. Model parameters are then combined with the input data using the prediction function to generate in-sample forecasts. This is functionally similar to the is_forc() function, but allows for usage of non-linear models such as GLMs, logit regressions, decision trees, or any custom model.

For all observations i in the sample, coefficients are estimated as:

\[ Y_i = \text{model_function}(X_i) \qquad \text{for all } i \] And forecasts are estimated as:

\[ forecast_i = \text{prediction_function}(\text{model_function}, X_i) \qquad \]

is_forc_general(
  model_function = function(data) {glm(y ~ x1 + x2, data = data, family = binomial)},
  prediction_function = function(model_function, data) {as.vector(predict(model_function, data, type = "response"))}, 
  data = dataLogit,
  realized = dataLogit$y,
  time_vec = dataLogit$date
)
#> h_ahead = 0 
#> 
#>        origin     future   forecast realized
#> 1  2010-03-31 2010-03-31 0.99229531        1
#> 2  2010-06-30 2010-06-30 0.22353367        0
#> 3  2010-09-30 2010-09-30 0.26932349        0
#> 4  2010-12-31 2010-12-31 0.13692703        0
#> 5  2011-03-31 2011-03-31 0.96280428        1
#> 6  2011-06-30 2011-06-30 0.16711236        1
#> 7  2011-09-30 2011-09-30 0.05650550        0
#> 8  2011-12-31 2011-12-31 0.03098593        0
#> 9  2012-03-31 2012-03-31 0.24950963        0
#> 10 2012-06-30 2012-06-30 0.90240710        1
#> 11 2012-09-30 2012-09-30 0.24889484        0
#> 12 2012-12-31 2012-12-31 0.22353367        0
#> 13 2013-03-31 2013-03-31 0.26932349        0
#> 14 2013-06-30 2013-06-30 0.13692703        1
#> 15 2013-09-30 2013-09-30 0.96280428        1
#> 16 2013-12-31 2013-12-31 0.16711236        0

Note that because we are creating an in-sample forecast, h_ahead is set to 0 and the origin time equals the future time. This test evaluates how well the model fits the historical data.

oos_realized_forc

The oos_realized_forc() function produces an h period ahead out-of-sample forecast that is conditioned on realized values. The function takes a linear model call, an integer number of periods ahead to forecast, a period to end the initial coefficient estimation and begin forecasting, an optional vector of time data associated with the linear model, and an optional integer number of past periods to estimate the linear model over. The linear model is originally estimated with data up to estimation_end minus the number of periods specified in the estimation_window argument. For instance, if the linear model is being estimated on quarterly data and the estimation_window is set to 20L, coefficients will be estimated using five years of data up to estimation_end. If estimation_window is set to NULL then the linear model is estimated with all available data up to estimation_end. Coefficients are multiplied by realized values of the covariates h_ahead periods ahead to create an h_ahead period ahead forecast. This process is iteratively repeated for each period after estimation_end with coefficients updating in each period as more information would have become available to the forecaster. In each period, coefficients are updated based on all available information if estimation_window is set to NULL, or a rolling window of past periods if estimation_window is set to an integer value.

In the sample i, for each period p greater than or equal to estimation_end, coefficients are updated as:

\[ Y_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i + \epsilon_i \qquad \text{for all} \quad p-w \leq i \leq \text{p} \] Where w is the estimation_window. h_ahead h forecasts are estimated as:

\[ forecast_{p+h} = \beta_0 + \beta_1 X1_{p+h} + \beta_2 X2_{p+h} \qquad \]

oos_realized_forc(
  lm_call = lm(y ~ x1 + x2, data),
  h_ahead = 2L,
  estimation_end = as.Date("2011-03-31"),
  time_vec = data$date,
  estimation_window = NULL,
  return_betas = FALSE
)
#> h_ahead = 2 
#> 
#>       origin     future forecast realized
#> 1 2011-03-31 2011-09-30 1.623750     2.89
#> 2 2011-06-30 2011-12-31 2.341664     2.11
#> 3 2011-09-30 2012-03-31 3.415198     2.97
#> 4 2011-12-31 2012-06-30 2.708308     0.99

Note that the oos_realized_forc function returns an out-of-sample forecast that conditions on realized values that would not have been available to the forecaster at the forecast origin. This test evaluates the performance of a linear forecast model had it been conditioned on perfect information.

oos_realized_forc_general

The oos_realized_forc_general() function produces an h period ahead out-of-sample forecast that is conditioned on realized values. The function takes a model function, a prediction function, input data for estimating the model, realized values of the dependent variable, an integer number of periods ahead to forecast, a period to end the initial coefficient estimation and begin forecasting, a vector of time data associated with the model, and an optional integer number of past periods to estimate the model over. The model is originally estimated using the input data and model function with data up to estimation_end minus the the number of periods specified in estimation_window. If estimation_window is left NULL then the model is estimated with all available data up to estimation_end. Model parameters are then combined with realized values of the input data h_ahead periods ahead to generate an h_ahead period ahead forecast. This process is iteratively repeated for each period after estimation_end with model parameters updating in each period.

In the sample i, for each period p greater than or equal to estimation_end, coefficients are updated as:

\[ Y_i = \text{model_function}(X_i) \qquad \text{for all} \quad p-w \leq i \leq \text{p} \] Where X is the input data and w is the estimation_window. h_ahead h forecasts are estimated as:

\[ forecast_{p+h} = \text{prediction_function}(\text{model_function}, X_{p+h}) \qquad \]

forc <- oos_realized_forc_general(
    model_function = function(data) {glm(y ~ x1 + x2, data = data, family = binomial)},
    prediction_function = function(model_function, data) {
      as.vector(predict(model_function, data, type = "response"))
    }, 
    data = dataLogit,
    realized = dataLogit$y,
    h_ahead = 2L,
    estimation_end = as.Date("2012-06-30"),
    time_vec = dataLogit$date,
    estimation_window = NULL
)

Note that the oos_realized_forc_general function returns an out-of-sample forecast that conditions on realized values that would not have been available to the forecaster at the forecast origin. This test evaluates the performance of a forecasting model had it been conditioned on perfect information.

oos_vintage_forc

The oos_vintage_forc() function produces an out-of-sample forecast conditioned on h period ahead forecasts of the linear model covariates. The function takes a linear model call, a vector of time data associated with the linear model, a vintage forecast for each covariate in the linear model, and an optional integer number of past periods to estimate the linear model over. For each period in the vintage forecasts, coefficients are updated based on information that would have been available to the forecaster at the forecast origin. Coefficients are estimated over information from the last estimation_window number of periods. If estimation_window is left NULL then coefficients are estimated over all of the information that would have been available to the forecaster. Coefficients are then multiplied by vintage forecast values to produce a replication of real time forecasts.

In the sample i, for each period p in the vintage forecasts VF1 and VF2, coefficients are updated as:

\[ Y_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i + \epsilon_i \qquad \text{for all} \quad p-w \leq i \leq \text{p} \]

And h_ahead h forecasts are estimated as:

\[ forecast_{p+h} = \beta_0 + \beta_1 VF1_p + \beta_2 VF2_p \qquad \]

We introduce stylized vintage forecasts of X1 and X2 to demonstrate the oos_vintage_forc() function. Using four quarter ahead forecasts of the covariates X1 and X2, we create an out-of-sample forecast based on the coefficients and covariate forecasts that the forecaster would have used in each period.

x1_forecast_vintage <- Forecast(
   origin   = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
   future   = as.Date(c("2011-09-30", "2011-12-31", "2012-03-31", "2012-06-30")),
   forecast = c(6.30, 4.17, 5.30, 4.84),
   realized = c(4.92, 5.80, 6.30, 4.17),
   h_ahead  = 4L
)

x2_forecast_vintage <- Forecast(
   origin   = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
   future   = as.Date(c("2011-09-30", "2011-12-31", "2012-03-31", "2012-06-30")),
   forecast = c(7.32, 6.88, 6.82, 6.95),
   realized = c(8.68, 9.91, 7.87, 6.63),
   h_ahead  = 4L
)

oos_vintage_forc(
  lm_call = lm(y ~ x1 + x2, data),
  time_vec = data$date,
  x1_forecast_vintage, x2_forecast_vintage,
  estimation_window = NULL,
  return_betas = FALSE
)
#> h_ahead = 4 
#> 
#>       origin     future  forecast realized
#> 1 2010-09-30 2011-09-30 -2.497310     2.89
#> 2 2010-12-31 2011-12-31  1.194088     2.11
#> 3 2011-03-31 2012-03-31  1.620716     2.97
#> 4 2011-06-30 2012-06-30  1.470027     0.99

This test replicates the forecasts that a linear model conditional on forecasts of covariates would have produced in real-time. Here we see the strength of the Forecast class. Because the vintage forecasts of X1 and X2 share the same data structure, we can calculate a forecast that is conditional on these objects without fearing inconsistency across inputs.

oos_vintage_forc_general

The oos_vintage_forc_general() function produces an out-of-sample forecast conditioned on h period ahead forecasts of the model parameters. The function takes a model function, prediction function, input data for estimating the model, realized values of the dependent variable, a vector of time data associated with the model, a forecast for each parameter in the model, and an optional integer number of past periods to estimate the model over. For each period in the vintage forecasts, model parameters are estimated with data up to the current period minus the number of periods specified in estimation_window. If estimation_window is left NULL then the model is estimated with all available data up to the current period. Model parameters are then combined with vintage forecast values to generate a forecast. Note that data input to the prediction_function takes the form of a data.frame with a number of columns equal to the number of input vintage forecasts and the prediction_function needs to be able to take this input format and generate a forecast based on it. Returns an out-of-sample forecast conditional on vintage forecasts that would have been available at the forecast origin. Replicates the forecasts that a conditional forecasting model would have produced in real time.

In the sample i, for each period p in the vintage forecasts VF1 and VF2, coefficients are updated as:

\[ Y_i = \text{model_function}(X_i) \qquad \text{for all} \quad p-w \leq i \leq \text{p} \]

Where X is the input data and w is the estimation_window. h_ahead h forecasts are estimated as:

\[ forecast_{p+h} = \beta_0 + \beta_1 VF1_p + \beta_2 VF2_p \qquad forecast_{p+h} = \text{prediction_function}(\text{model_function}, [VF1_p, VF2_p]) \qquad \]

We introduce stylized vintage forecasts of X1 and X2 to demonstrate the oos_vintage_forc() function. Using four quarter ahead forecasts of the covariates X1 and X2, we create an out-of-sample forecast based on the coefficients and covariate forecasts that the forecaster would have used in each period.

x1_forecast_vintageLogit <- Forecast(
   origin   = as.Date(c("2012-09-30", "2012-12-31", "2013-03-31", "2013-06-30")),
   future   = as.Date(c("2013-09-30", "2013-12-31", "2014-03-31", "2014-06-30")),
   forecast = c(6.34, 4.17, 2.98, 1.84),
   realized = c(5.88, 3.34, 2.92, 1.80),
   h_ahead  = 4L
)

x2_forecast_vintageLogit <- Forecast(
   origin   = as.Date(c("2012-09-30", "2012-12-31", "2013-03-31", "2013-06-30")),
   future   = as.Date(c("2013-09-30", "2013-12-31", "2014-03-31", "2014-06-30")),
   forecast = c(7.32, 3.22, 2.21, 2.65),
   realized = c(6.09, 2.91, 1.68, 2.91),
   h_ahead  = 4L
)

oos_vintage_forc_general(
    model_function = function(data) {glm(y ~ x1 + x2, data = data, family = binomial)},
    prediction_function = function(model_function, data) {
        names(data) <- c("x1", "x2")
        as.vector(predict(model_function, data, type = "response"))
    }, 
    data = dataLogit,
    realized = dataLogit$y,
    time_vec = dataLogit$date,
    x1_forecast_vintageLogit, x2_forecast_vintageLogit,
    estimation_window = NULL
)
#> 
#>  Note* the data argument passed to prediction_function has the following form:
#>     X1   X2
#> 1 6.34 7.32
#> 
#> h_ahead = 4 
#> 
#>       origin     future   forecast realized
#> 1 2012-09-30 2013-09-30 0.99460268        1
#> 2 2012-12-31 2013-12-31 0.33526949        0
#> 3 2013-03-31 2014-03-31 0.02961128       NA
#> 4 2013-06-30 2014-06-30 0.03721942       NA

Note that in the example above the prediction_function is adapted to take a specific data.frame as input and generate a forecast based on it. The form of this specific data.frame is printed to the console for reference. This test replicates the forecasts that a model conditional on forecasts of the model parameters would have produced in real-time. Here we see the strength of the Forecast class. Because the vintage forecasts of X1 and X2 share the same data structure, we can calculate a forecast that is conditional on these objects without fearing inconsistency across inputs.

conditional_forc

The conditional_forc() function produces a forecast conditioned on forecasts of the linear model covariates. The function takes a linear model call, a vector of time data associated with the linear model, and a forecast for each covariate in the linear model. The linear model is estimated once over the entire sample period and the coefficients are multiplied by the forecasts of each covariate.

For all observations i in the sample, coefficients are estimated as:

\[ Y_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i + \epsilon_i \qquad \text{for all } i \]

And for all periods p in the covariate forecasts F1 and F2, forecasts are estimated as:

\[ forecast_{p+h} = \beta_0 + \beta_1 F1_p + \beta_2 F2_p \qquad \]

The difference between conditional_forc() and oos_vintage_forc() is that in the conditional_forc() function coefficients are only estimated once over all observations. Coefficients do not update based on what information would have been available to the forecaster at any given point in time. We introduce stylized forecasts of X1 and X2 to demonstrate the conditional_forc() function. Because in this example we are making a conditional forecast for the future instead of testing past forecasts, we can condition on a horizon of forecasts. This is in contrast to the oos_vintage_forc() example where we test the performance of four quarter ahead vintage forecasts.

x1_forecast <- Forecast(
   origin   = as.Date(c("2012-06-30", "2012-06-30", "2012-06-30", "2012-06-30")),
   future   = as.Date(c("2012-09-30", "2012-12-31", "2013-03-31", "2013-06-30")),
   forecast = c(4.14, 4.04, 4.97, 5.12),
   realized = NULL,
   h_ahead  = NULL
)

x2_forecast <- Forecast(
   origin   = as.Date(c("2012-06-30", "2012-06-30", "2012-06-30", "2012-06-30")),
   future   = as.Date(c("2012-09-30", "2012-12-31", "2013-03-31", "2013-06-30")),
   forecast = c(6.01, 6.05, 6.55, 7.45),
   realized = NULL,
   h_ahead  = NULL
)

conditional_forc(
  lm_call = lm(y ~ x1 + x2, data),
  time_vec = data$date,
  x1_forecast, x2_forecast
)
#> h_ahead = 
#> 
#>       origin     future forecast realized
#> 1 2012-06-30 2012-09-30 1.368054       NA
#> 2 2012-06-30 2012-12-31 1.297686       NA
#> 3 2012-06-30 2013-03-31 1.945655       NA
#> 4 2012-06-30 2013-06-30 2.044105       NA

This function is used to create a forecast for the present period or replicate a forecast made at a specific period in the past. Note that because we are forecasting into the future, realized is NULL. Also, because we are forecasting a horizon of dates, h_ahead is NULL.

conditional_forc_general

The conditional_forc_general() function produces a forecast conditioned on forecasts of the model parameters. The function takes a model function, prediction function, input data for estimating the model, a vector of time data associated with the model, and forecasts for each parameter in the model. The model is estimated once over the entire sample period and the model parameters are combined with the forecasts of each parameter to generate a forecast.

For all observations i in the sample, coefficients are estimated as:

\[ Y_i = \text{model_function}(X_i) \qquad \text{for all } i \]

And for all periods p in the covariate forecasts F1 and F2, forecasts are estimated as:

\[ forecast_{p+h} = \text{prediction_function}(\text{model_function}, [VF1_p, VF2_p]) \qquad \]

The difference between conditional_forc_general() and oos_vintage_forc_general() is that in the conditional_forc_general() function coefficients are only estimated once over all observations. Coefficients do not update based on what information would have been available to the forecaster at any given point in time. We introduce stylized forecasts of X1 and X2 to demonstrate the conditional_forc_general() function. Because in this example we are making a conditional forecast for the future instead of testing past forecasts, we can condition on a horizon of forecasts. This is in contrast to the oos_vintage_forc_general() example where we test the performance of four quarter ahead vintage forecasts.

# Parameter Forecasts.
x1_forecastLogit <- Forecast(
   origin   = as.Date(c("2013-12-31", "2013-12-31", "2013-12-31", "2013-12-31")),
   future   = as.Date(c("2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31")),
   forecast = c(2.11, 6.11, 6.75, 4.30),
   realized = NULL,
   h_ahead  = NULL
)

x2_forecastLogit <- Forecast(
   origin   = as.Date(c("2013-12-31", "2013-12-31", "2013-12-31", "2013-12-31")),
   future   = as.Date(c("2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31")),
   forecast = c(1.98, 7.44, 7.86, 5.98),
   realized = NULL,
   h_ahead  = NULL
)

conditional_forc_general(
    model_function = function(data) {glm(y ~ x1 + x2, data = data, family = binomial)},
    prediction_function = function(model_function, data) {
        names(data) <- c("x1", "x2")
        as.vector(predict(model_function, data, type = "response"))
    }, 
    data = dataLogit,
    time_vec = dataLogit$date,
    x1_forecastLogit, x2_forecastLogit
)
#> 
#>  Note* the data argument passed to prediction_function has the following form:
#>     X1   X2
#> 1 2.11 1.98
#> 2 6.11 7.44
#> 3 6.75 7.86
#> 4 4.30 5.98
#> 
#> h_ahead = 
#> 
#>       origin     future   forecast realized
#> 1 2013-12-31 2014-03-31 0.02637805       NA
#> 2 2013-12-31 2014-06-30 0.98668135       NA
#> 3 2013-12-31 2014-09-30 0.99508344       NA
#> 4 2013-12-31 2014-12-31 0.78686161       NA

This function is used to create a forecast for the present period or replicate a forecast made at a specific period in the past. Note that because we are forecasting into the future, realized is NULL. Also, because we are forecasting a horizon of dates, h_ahead is NULL.

oos_lag_forc

The oos_lag_forc() function produces an h period ahead out-of-sample forecast conditioned on present period values. The function takes a linear model call, an integer number of periods ahead to forecast, a period to end the initial coefficient estimation and begin forecasting, an optional vector of time data associated with the linear model, and an optional integer number of past periods to estimate the linear model over. The linear model data is lagged by h_ahead periods and the linear model is re-estimated with data up to estimation_end minus the number of periods specified in the estimation_window argument to create a lagged linear model. If estimation_window is left NULL then the linear model is estimated with all available lagged data up to estimation_end. Coefficients are multiplied by present period realized values of the covariates to create a forecast for h_ahead periods ahead. This process is iteratively repeated for each period after estimation_end with coefficients updating in each period as more information would have become available to the forecaster. In each period, coefficients are updated based on all available information if estimation_window is set to NULL, or a rolling window of past periods if estimation_window is set to an integer value.

In the sample i, for each period p greater than or equal to estimation_end, coefficients are updated as:

\[ Y_i = \beta_0 + \beta_1 X1_{i-h} + \beta_2 X2_{i-h} + \epsilon_i \qquad \text{for all} \quad p-w \leq i \leq \text{p} \] Where w is the estimation_window and h is h_ahead. h_ahead forecasts are estimated as:

\[ forecast_{p+h} = \beta_0 + \beta_1 X1_{p} + \beta_2 X2_{p} \qquad \]

oos_lag_forc(
  lm_call = lm(y ~ x1 + x2, data),
  h_ahead = 2L,
  estimation_end = as.Date("2011-03-31"),
  time_vec = data$date,
  estimation_window = NULL,
  return_betas = FALSE
)
#> h_ahead = 2 
#> 
#>       origin     future  forecast realized
#> 1 2011-03-31 2011-09-30 -2.100528     2.89
#> 2 2011-06-30 2011-12-31  2.174392     2.11
#> 3 2011-09-30 2012-03-31  2.813745     2.97
#> 4 2011-12-31 2012-06-30  1.807014     0.99

This test evaluates the performance of a lagged linear model had it been conditioned on present values that would have been available to the forecaster at the forecast origin. This is in contrast to conditioning on realized values or a forecast of the covariates.

historical_average_forc

The historical_average_forc() function produces an h period ahead forecast based on the historical average of the series that is being forecasted. The function takes an average function, a vector of realized values, an integer number of periods ahead to forecast, a period to end the initial average estimation and begin forecasting, an optional vector of time data associated with the realized values, and an optional integer number of past periods to estimate the average over. The historical average is originally calculated with realized values up to estimation_end minus the number of periods specified in estimation_window. If estimation_window is left NULL then the historical average is calculated with all available realized values up to estimation_end. In each period the historical average is set as the h_ahead period ahead forecast. This process is iteratively repeated for each period after estimation_end with the historical average updating in each period as more information would have become available to the forecaster.

If avg_function is set to mean, in the sample i, for each period p greater than or equal to estimation_end, h_ahead h forecasts are calculated as:

\[ forecast_{p+h} = \frac{1}{p-w} \sum_{i=p-w}^{p}{Y_i} \qquad \]

Where Y is the series being forecasted and w is the estimation_window.

historical_average_forc(
  avg_function = "mean",
  realized_vec = data$y,
  h_ahead = 2L,
  estimation_end = as.Date("2011-03-31"),
  time_vec = data$date,
  estimation_window = 4L
)
#> h_ahead = 2 
#> 
#>       origin     future forecast realized
#> 1 2011-03-31 2011-09-30    1.626     2.89
#> 2 2011-06-30 2011-12-31    1.678     2.11
#> 3 2011-09-30 2012-03-31    1.914     2.97
#> 4 2011-12-31 2012-06-30    2.118     0.99

historical_average_forc() returns a historical average forecast where the h_ahead period ahead forecast is simply the historical average or rolling window average of the series being forecasted. This replicates the historical average forecast that would have been produced in real-time and can serve as a benchmark for other forecasting models.

random_walk_forc

The random_walk_forc() function produces an h period ahead forecast based on the last realized value in the series that is being forecasted. The function takes a vector of realized values, an integer number of periods ahead to forecast, and an optional vector of time data associated with the realized values. In each period, the current period value of the realized_vec series is set as the h_ahead period ahead forecast.

In the sample i, for each period p, h_ahead h forecasts are calculated as:

\[ forecast_{p+h} = Y_p \]

Where Y is the series being forecasted.

random_walk_forc(
  realized_vec = data$y,
  h_ahead = 6L,
  time_vec = data$date 
)
#> h_ahead = 6 
#> 
#>       origin     future forecast realized
#> 1 2010-03-31 2011-09-30     1.09     2.89
#> 2 2010-06-30 2011-12-31     1.71     2.11
#> 3 2010-09-30 2012-03-31     1.09     2.97
#> 4 2010-12-31 2012-06-30     2.46     0.99

random_walk_forc() returns a random walk forecast where the h_ahead period ahead forecast is simply the present value of the series being forecasted. This replicates the random walk forecast that a forecaster would have produced in real-time and can serve as a benchmark for other forecasting models.

autoreg_forc

The autoreg_forc() function produces an h period ahead forecast based on an autoregressive (AR) model. The function takes a vector of realized values, an integer number of periods ahead to forecast, an integer number of lags to include in the autoregressive model, a period to end the initial model estimation and begin forecasting, an optional vector of time data associated with the realized values, and an optional integer number of past periods to estimate the autoregressive model over. An AR(ar_lags) autoregressive model is originally estimated with realized values up to estimation_end minus the number of periods specified in estimation_window. If estimation_window is left NULL then the autoregressive model is estimated with all realized values up to estimation_end. The AR(ar_lags) model is estimated by regressing the vector of realized values on vectors of the same realized values that have been lagged by one to ar_lags steps. The AR coefficients of this model are multiplied by lagged values and the present period realized value to create a one period ahead forecast. If h_ahead is greater than one, the one period ahead forecasting process is iteratively repeated so that the two period ahead forecast conditions on the one period ahead forecasted value. This process of rolling one period ahead forecasts forward continues until an h_ahead forecast is obtained. The h_ahead forecasting process is repeated for each period after estimation_end with AR model coefficients updating as more information would have become available to the forecaster. In each period, coefficients are updated based on all available realized values if estimation_window is set to NULL, or a rolling window of past periods if estimation_window is set to an integer value.

In the sample i with ar_lags set to two and h_ahead set to two. For each period p greater than or equal to estimation_end, coefficients are calculated as:

\[ Y_i = \beta_0 + \beta_1 Y_{i-1} + \beta_2 Y_{i-2} \qquad \text{for all} \quad p-w \leq i \leq \text{p} \] Where Y is the series being forecasted and w is the estimation_window. h_ahead two step ahead forecasts are estimated as:

\[ Y_{p+1} = \beta_0 + \beta_1 Y_p + \beta_2 Y_{p-1} \\ forecast_{p+h} = Y_{p+2} = \beta_0 + \beta_1 Y_{p+1} + \beta_2 Y_{p} \]

autoreg_forc(
  realized_vec = data$y,
  h_ahead = 2L,
  ar_lags = 2L,
  estimation_end = as.Date("2011-06-30"),
  time_vec = data$date,
  estimation_window = NULL,
  return_betas = FALSE
)
#> h_ahead = 2 
#> 
#>       origin     future forecast realized
#> 1 2011-06-30 2011-12-31 1.649380     2.11
#> 2 2011-09-30 2012-03-31 2.376138     2.97
#> 3 2011-12-31 2012-06-30 1.944882     0.99

autoreg_forc() returns an autoregressive forecast based on information that would have been available at the forecast origin. This function replicates the forecasts than an AR model would have produced in real-time and can serve as a benchmark for other forecasting models.

performance_weighted_forc

The performance_weighted_forc() function produces a weighted average of multiple forecasts based on the recent performance of each forecast. The function takes two or more forecasts of the Forecast class, an evaluation window, and an error function. For each forecast period, the error function is used to calculate forecast accuracy over the past eval_window number of periods. The forecast accuracy of each forecast is then used to weight forecasts based on a weighting function. In each period, weights are calculated and used to create a weighted average forecast.

We use a stylized example in which we create a weighted forecast of two forecasts: Y1 and Y2.

For all periods p in the k number of forecasts Y, weights W are calculated over the eval_window e as:

\[ W_k = \frac{1/MSE(Y_{ki})}{1/\sum_{k=1}^{k}MSE(Y_{ki})} \qquad \text{where} \quad i = p-e \leq i \leq p \]

Forecasts are estimated as:

\[ forecast_p = Y1_pW_1 + Y2_pW_2 \]

y1_forecast <- Forecast(
  origin = as.Date(c("2009-03-31", "2009-06-30", "2009-09-30", "2009-12-31",
                     "2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31", 
                     "2011-03-31", "2011-06-30")),
  future = as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
                     "2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31", 
                     "2012-03-31", "2012-06-30")),
  forecast = c(1.33, 1.36, 1.38, 1.68, 1.60, 1.55, 1.32, 1.22, 1.08, 0.88),
  realized = c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99),
  h_ahead = 4L
)

y2_forecast <- Forecast(
  origin = as.Date(c("2009-03-31", "2009-06-30", "2009-09-30", "2009-12-31",
                     "2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31", 
                     "2011-03-31", "2011-06-30")),
  future = as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
                     "2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31", 
                     "2012-03-31", "2012-06-30")),
  forecast = c(0.70, 0.88, 1.03, 1.05, 1.01, 0.82, 0.95, 1.09, 1.07, 1.06),
  realized = c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99),
  h_ahead = 4L
)

performance_weighted_forc(
  y1_forecast, y2_forecast,
  eval_window = 2L,
  errors = "mse",
  return_weights = FALSE
)
#> h_ahead = 4 
#> 
#>        origin     future forecast realized
#> 1  2009-03-31 2010-03-31       NA     1.09
#> 2  2009-06-30 2010-06-30       NA     1.71
#> 3  2009-09-30 2010-09-30       NA     1.09
#> 4  2009-12-31 2010-12-31       NA     2.46
#> 5  2010-03-31 2011-03-31       NA     1.78
#> 6  2010-06-30 2011-06-30 1.421244     1.35
#> 7  2010-09-30 2011-09-30 1.234979     2.89
#> 8  2010-12-31 2011-12-31 1.186461     2.11
#> 9  2011-03-31 2012-03-31 1.078011     2.97
#> 10 2011-06-30 2012-06-30 0.893773     0.99

The performance_weighted_forc() function returns a weighted forecast of the Y1 and Y2 forecasts based on performance in recent periods. The weights used in each period can be returned to the Global Environment by setting return_weights to TRUE. Note that although we were only weighting performance over the past two periods, we have five NA forecasts. This reflects the lmForc philosophy of replicating what it would be like to forecast in real-time. If a forecaster was making a forecast at 2010-06-30, they would only have access to realized values up to 2010-06-30, in this case the first two rows. This is why a weighted forecast with an eval_window of two can only be computed once the forecast origin becomes 2010-06-30 and the forecaster has access to two realized values. This function can be used to compute a weighted forecast for the present period or to test how a weighted forecast would have performed historically. The weighted forecasts are based on information that would have been available to the forecaster at the forecast origin.

states_weighted_forc

The states_weighted_forc() function produces a weighted average of multiple forecasts based on how each forecast performed during the past state of the world that is most similar to the current state of the world. The function takes two or more forecasts, a data frame, matrix, or array of matching variables, an optional vector of time data associated with the matching variables, a matching window size, a matching function, and an error function. The first step of the weighted forecast process is to match the current state of the world to a similar past state of the world. For each forecast period, the matching_vars are standardized and the past matching_window periods of the matching variables are considered as the current state of the world. This current state of the world is compared to all past matching_window size periods of the matching variables. The current state is matched to the past state that minimizes the user selected matching function. For example, if matching is set to euclidean then the matched past state is the past state which has the minimum euclidean distance to the current state of the world. The objective is to select a past period that is similar to the current state of the world as given by the matching variables. Once a past state has been matched, the accuracy of each forecast is calculated over the periods of the past state according to the user selected error function. Forecast weights are then computed based on forecast accuracy during the past state. The objective is to give more weight to the forecasts that perform better in conditions that reflect the current state of the world. The forecast weights are then used to create a weighted forecast for the current period.

We use a stylized example with one matching variable x as well as two forecasts Y1 and Y2.

The matching variable x is first standardized using the function:

\[ x_i = \frac{x_i - mean(x)}{sd(x)} \]

For all periods p, the current state of the world c and past states of the world p are calculated as:

\[ c_i = x_i \qquad \text{where} \quad p-w \leq i \leq p \\ \qquad \qquad \qquad \qquad p_{id} = x_i \qquad \text{where} \quad d-w \leq i \leq d \quad \text{for all} \quad d \lt p-w \]

Where w is the matching_window size and d are all periods that occur before the beginning of the current state.

All possible past states are passed through the matching function and the matched past state is selected as the past state that minimizes the matching function. If matching is set to euclidean, the matched past state p is the past state that satisfies the following:

\[ p = \min \sqrt{\sum_{i=1}^{n}(c_i - p_{id})^2} \qquad \text{for all past states } d \]

Forecast accuracy and forecast weights are computed over observations from the matched past state p. If errors is set to mse then the forecast weights W for each forecast k are calculated as.

\[ W_k = \frac{1/MSE(Y_{kp})}{1/\sum_{k=1}^{k}MSE(Y_{kp})} \]

The current period forecast is then calculated as:

\[ forecast_p = Y1_pW_1 + Y2_pW_2 \]

date <- as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
                  "2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
                  "2012-03-31", "2012-06-30"))

future <- as.Date(c("2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
                    "2012-03-31", "2012-06-30", "2012-09-30", "2012-12-31",
                    "2013-03-31", "2013-06-30"))

y  <- c(1.09, 1.71, 1.09, 2.46, 1.78, 1.35, 2.89, 2.11, 2.97, 0.99)
x1 <- c(4.22, 3.86, 4.27, 5.60, 5.11, 4.31, 4.92, 5.80, 6.30, 4.17)
x2 <- c(10.03, 10.49, 10.85, 10.47, 9.09, 10.91, 8.68, 9.91, 7.87, 6.63)

data <- data.frame(date, y, x1, x2)
matching_vars <- data[, c("x1", "x2")]

y1_forecast <- Forecast(
  origin = date,
  future = future,
  forecast = c(1.33, 1.36, 1.38, 1.68, 1.60, 1.55, 1.32, 1.22, 1.08, 0.88),
  realized = c(1.78, 1.35, 2.89, 2.11, 2.97, 0.99, 1.31, 1.41, 1.02, 1.05),
  h_ahead = 4L
)

y2_forecast <- Forecast(
  origin = date,
  future = future,
  forecast = c(0.70, 0.88, 1.03, 1.05, 1.01, 0.82, 0.95, 1.09, 1.07, 1.06),
  realized = c(1.78, 1.35, 2.89, 2.11, 2.97, 0.99, 1.31, 1.41, 1.02, 1.05),
  h_ahead = 4L
)

states_weighted_forc(
  y1_forecast, y2_forecast,
  matching_vars = matching_vars,
  time_vec = data$date,
  matching_window = 2L,
  matching = "euclidean",
  errors = "mse",
  return_weights = FALSE
)
#> h_ahead = 4 
#> 
#>        origin     future forecast realized
#> 1  2010-03-31 2011-03-31       NA     1.78
#> 2  2010-06-30 2011-06-30       NA     1.35
#> 3  2010-09-30 2011-09-30       NA     2.89
#> 4  2010-12-31 2011-12-31       NA     2.11
#> 5  2011-03-31 2012-03-31       NA     2.97
#> 6  2011-06-30 2012-06-30 1.456977     0.99
#> 7  2011-09-30 2012-09-30 1.211438     1.31
#> 8  2011-12-31 2012-12-31 1.174534     1.41
#> 9  2012-03-31 2013-03-31 1.077066     1.02
#> 10 2012-06-30 2013-06-30 0.932814     1.05

The states_weighted_forc() function returns a weighted forecast of the Y1 and Y2 forecasts based on how these forecasts performed in past states of the world that resemble the current state of the world. The weights used in each period and a list of the matched states can be returned to the Global Environment by setting return_weights to TRUE. This function can be used to compute a states weighted forecast for the present period or to test how a states weighted forecast would have performed historically. The states weighted forecasts are based on information that would have been available to the forecaster at the forecast origin.



Subsetting Functions Overview

The Forecast class comes with a built in method for subsetting a single Forecast object. This subsetting method takes numeric or logical values and follows subsetting conventions that are present throughout the R language.

forc[2:4]
forc[origin(forc1_1h) >= as.Date("2010-12-31")]

However, one often ends up working with multiple Forecast objects. Examples include working with different model forecasts for the same forecast horizon, one model forecast for varying forecast horizons, or both. The lmForc convention for working with multiple Forecast objects is to put them into a list. The following functions provide a way to subset lists of Forecast objects by various conditions.

Subsetting Example Dataset

Examples of lmForc subsetting functions utilize the following stylized dataset. This example dataset contains one-quarter ahead forecasts produced by two different models, forc1 and forc2. Note that both forecasts have identical future, realized, and h_ahead values, but that the origin dates of the last two forecasts differ. This becomes relevant when both forecast models are subset to identical origin values.

forc1_1h <- Forecast(
  origin = as.Date(c("2010-02-17", "2010-05-14", "2010-07-22", "2010-12-05", "2011-03-10")),
  future = as.Date(c("2010-06-30", "2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
  forecast = c(4.27, 3.36, 4.78, 5.45, 5.12),
  realized = c(4.96, 4.17, 4.26, 4.99, 5.38),
  h_ahead = 1
)

forc2_1h <- Forecast(
  origin = as.Date(c("2010-02-17", "2010-05-14", "2010-07-22", "2010-12-22", "2011-03-27")),
  future = as.Date(c("2010-06-30", "2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
  forecast = c(4.01, 3.89, 3.31, 4.33, 4.61),
  realized = c(4.96, 4.17, 4.26, 4.99, 5.38),
  h_ahead = 1
)

subset_forcs

The simplest way to subset multiple Forecast objects is via the subset_forcs() function. This function takes a list of Forecast objects and a numeric or logical index. All forecasts in the list are subset by the numerical or logical values that are passed to the index argument.

For example, a list of Forecast objects can be subset by a condition:
subset_forcs(forcs, origin(forc1_1h) >= as.Date("2010-12-31"))

forcs <- list(forc1_1h, forc2_1h)

subset_forcs(forcs, 2:3)
#> [[1]]
#> h_ahead = 1 
#> 
#>       origin     future forecast realized
#> 1 2010-05-14 2010-09-30     3.36     4.17
#> 2 2010-07-22 2010-12-31     4.78     4.26
#> 
#> [[2]]
#> h_ahead = 1 
#> 
#>       origin     future forecast realized
#> 1 2010-05-14 2010-09-30     3.89     4.17
#> 2 2010-07-22 2010-12-31     3.31     4.26

subset_bytime

One may want to compare forecasts over a specific time horizon. The subset_bytime() function allows the user to subset multiple forecasts based on origin or future values. After using the slot argument to choose whether to subset by origin or future values, the user passes a single time object or vector of time objects to the values argument. All forecasts in the list of Forecast objects are subset by values.

For example, to see all of the forecasts that were made on a specific date:
subset_bytime(forcs, values = as.Date("2010-05-14"), slot = "origin")

forcs <- list(forc1_1h, forc2_1h)
 
subset_bytime(
  forcs, 
  values = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31")), 
  slot = "future"
)
#> [[1]]
#> h_ahead = 1 
#> 
#>       origin     future forecast realized
#> 1 2010-05-14 2010-09-30     3.36     4.17
#> 2 2010-07-22 2010-12-31     4.78     4.26
#> 3 2010-12-05 2011-03-31     5.45     4.99
#> 
#> [[2]]
#> h_ahead = 1 
#> 
#>       origin     future forecast realized
#> 1 2010-05-14 2010-09-30     3.89     4.17
#> 2 2010-07-22 2010-12-31     3.31     4.26
#> 3 2010-12-22 2011-03-31     4.33     4.99

subset_identical

When comparing multiple forecasts, it is imperative that forecast accuracy is compared across an identical time period. This can become complicated if the origin and future values of multiple forecasts do not always align. The subset_identical() function finds all overlapping origin or future values in a list of Forecast objects and subsets each of the forecasts to these overlapping values. This leaves the user with a list of Forecasts that have either identical origin values or identical future values depending on what the user passes to the slot argument.

forcs <- list(forc1_1h, forc2_1h)

subset_identical(forcs, slot = "origin")
#> [[1]]
#> h_ahead = 1 
#> 
#>       origin     future forecast realized
#> 1 2010-02-17 2010-06-30     4.27     4.96
#> 2 2010-05-14 2010-09-30     3.36     4.17
#> 3 2010-07-22 2010-12-31     4.78     4.26
#> 
#> [[2]]
#> h_ahead = 1 
#> 
#>       origin     future forecast realized
#> 1 2010-02-17 2010-06-30     4.01     4.96
#> 2 2010-05-14 2010-09-30     3.89     4.17
#> 3 2010-07-22 2010-12-31     3.31     4.26



Transformation Functions Overview

Forecasts are often produced for multiple h_ahead horizons into the future. For example, a model may produce a 1-quarter ahead, 2-quarter ahead, 3-quarter ahead, and 4-quarter ahead forecast during each quarter of the year. In this example, multiple Forecast objects are needed to capture the forecast made during each quarter. As per lmForc convention, one would work with these forecasts by putting them into a list. When working with list of forecasts for multiple h_ahead horizons into the future, there are two general formats in which the forecasts can be organized. These two formats are: Time Format and h_ahead Format.

Time Format

Time Format consists of a list of Forecast objects where each forecast has homogenous origin or future values. Each Forecast object in the list was made at the same time or contains forecasts for the same future time. However, the h_ahead forecast horizon differs within each Forecast object. Time Format is used to represent forecasts made at a single origin time for multiple h_ahead horizons.

The following is an example of forecasts in Time Format. Each Forecast object represents a set of 1-quarter, 2-quarter, and 3-quarter ahead forecasts made at a single origin time during each quarter of 2010. Note that because each Forecast object contains forecasts for multiple h_ahead horizons, h_ahead is set to NA. We place all of these forecasts into a list of Forecast objects that is in Time Format and assign it to forcs_time_format.


forc1_t1 <- Forecast(
  origin = as.Date(c("2010-02-17", "2010-02-17", "2010-02-17")),
  future = as.Date(c("2010-06-30", "2010-09-30", "2010-12-31")),
  forecast = c(4.27, 3.77, 3.52),
  realized = c(4.96, 4.17, 4.26),
  h_ahead = NA
)

forc1_t2 <- Forecast(
  origin = as.Date(c("2010-05-14", "2010-05-14", "2010-05-14")),
  future = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31")),
  forecast = c(3.36, 3.82, 4.22),
  realized = c(4.17, 4.26, 4.99),
  h_ahead = NA
)

forc1_t3 <- Forecast(
  origin = as.Date(c("2010-07-22", "2010-07-22", "2010-07-22")),
  future = as.Date(c("2010-12-31", "2011-03-31", "2011-06-30")),
  forecast = c(4.78, 4.53, 5.03),
  realized = c(4.26, 4.99, 5.33),
  h_ahead = NA
)

forc1_t4 <- Forecast(
  origin = as.Date(c("2010-12-22", "2010-12-22", "2010-12-22")),
  future = as.Date(c("2011-03-31", "2011-06-30", "2011-09-30")),
  forecast = c(5.45, 4.89, 5.78),
  realized = c(4.99, 5.33, 5.21),
  h_ahead = NA
)

forcs_time_format <- list(forc1_t1, forc1_t2, forc1_t3, forc1_t4)

h_ahead Format

h_ahead Format consists of a list of Forecast objects where each forecast has homogenous h_ahead values but the origin and future values vary. The h_ahead Format is useful for analyzing forecasts at a specific h_ahead horizon. For example, one may want to calculate the forecast accuracy of 4-quarter ahead forecasts only. In this case it is useful to have multiple forecasts arranged by homogenous h_ahead values.

The following is an example of forecasts in h_ahead Format. Each Forecast object represents all of the 1-quarter, 2-quarter, and 3-quarter ahead forecasts made during different quarters of 2010. Note that because each Forecast object has a homogenous h_ahead horizon we can now set h_ahead to the appropriate value. These forecasts are collected into a list of Forecast objects that is in h_ahead Format and assigned to forcs_h_ahead_format.


forc1_1h <- Forecast(
  origin = as.Date(c("2010-02-17", "2010-05-14", "2010-07-22", "2010-12-22")),
  future = as.Date(c("2010-06-30", "2010-09-30", "2010-12-31", "2011-03-31")),
  forecast = c(4.27, 3.36, 4.78, 5.45),
  realized = c(4.96, 4.17, 4.26, 4.99),
  h_ahead = 1
)

forc1_2h <- Forecast(
  origin = as.Date(c("2010-02-17", "2010-05-14", "2010-07-22", "2010-12-22")),
  future = as.Date(c("2010-09-30", "2010-12-31", "2011-03-31", "2011-06-30")),
  forecast = c(3.77, 3.82, 4.53, 4.89),
  realized = c(4.17, 4.26, 4.99, 5.33),
  h_ahead = 2
)

forc1_3h <- Forecast(
  origin = as.Date(c("2010-02-17", "2010-05-14", "2010-07-22", "2010-12-22")),
  future = as.Date(c("2010-12-31", "2011-03-31", "2011-06-30", "2011-09-30")),
  forecast = c(3.52, 4.22, 5.03, 5.78),
  realized = c(4.26, 4.99, 5.33, 5.21),
  h_ahead = 3
)

forcs_h_ahead_format <- list(forc1_1h, forc1_2h, forc1_3h)

convert_bytime

Given a list of forecasts in h_ahead Format, one may want to convert one or multiple of these forecasts into Time Format. The function convert_bytime() takes a list of Forecast objects in h_ahead Format and converts the forecasts made on the time specified in the value and slot arguments into Forecast objects that are in Time Format. Note that because we are converting to Time Format, the h_ahead value in each Forecast object is changed to NA.


convert_bytime(
  forcs_h_ahead_format,
  value = as.Date(c("2010-07-22", "2010-12-22")),
  slot = "origin"
)
#> [[1]]
#> h_ahead = NA 
#> 
#>       origin     future forecast realized
#> 1 2010-07-22 2010-12-31     4.78     4.26
#> 2 2010-07-22 2011-03-31     4.53     4.99
#> 3 2010-07-22 2011-06-30     5.03     5.33
#> 
#> [[2]]
#> h_ahead = NA 
#> 
#>       origin     future forecast realized
#> 1 2010-12-22 2011-03-31     5.45     4.99
#> 2 2010-12-22 2011-06-30     4.89     5.33
#> 3 2010-12-22 2011-09-30     5.78     5.21

transform_bytime

Given a list of forecasts in h_ahead Format one can convert all of the forecasts to Time Format using the transform_bytime() function. This function transforms all Forecast objects in forcs to a list of Time Format Forecast objects that have homogenous origin or future values depending on what the user specifies in the slot argument. The difference between transform_bytime() and convert_bytime() is that transforming automatically converts all forecasts in the list while converting only converts the forecasts specified by the user.


transform_bytime(forcs_h_ahead_format, slot = "origin")
#> [[1]]
#> h_ahead = NA 
#> 
#>       origin     future forecast realized
#> 1 2010-02-17 2010-06-30     4.27     4.96
#> 2 2010-02-17 2010-09-30     3.77     4.17
#> 3 2010-02-17 2010-12-31     3.52     4.26
#> 
#> [[2]]
#> h_ahead = NA 
#> 
#>       origin     future forecast realized
#> 1 2010-05-14 2010-09-30     3.36     4.17
#> 2 2010-05-14 2010-12-31     3.82     4.26
#> 3 2010-05-14 2011-03-31     4.22     4.99
#> 
#> [[3]]
#> h_ahead = NA 
#> 
#>       origin     future forecast realized
#> 1 2010-07-22 2010-12-31     4.78     4.26
#> 2 2010-07-22 2011-03-31     4.53     4.99
#> 3 2010-07-22 2011-06-30     5.03     5.33
#> 
#> [[4]]
#> h_ahead = NA 
#> 
#>       origin     future forecast realized
#> 1 2010-12-22 2011-03-31     5.45     4.99
#> 2 2010-12-22 2011-06-30     4.89     5.33
#> 3 2010-12-22 2011-09-30     5.78     5.21

Note that the output of transform_bytime() above is identical to the list of Forecast objects in forcs_time_format. One can continually transform between Time Format and h_ahead Format without losing information. This is evidenced by the fact that:
transform_bytime(forcs_h_ahead_format, slot = "origin") == forcs_time_format
and
transform_byh(forcs_time_format, h_aheads = c(1, 2, 3)) == forcs_h_ahead_format

convert_byh

The inverse of convert_bytime() is convert_byh(). Given a list of forecasts in Time Format convert_byh() converts one or multiple of these forecasts into h_ahead Format. The functions takes a list of Forecast objects in Time Format and converts the forecasts specified by the index argument into Forecast objects in h_ahead Format. Because forecasts that are in Time Format do not have h_ahead values, the function allows the user to assign h_ahead values to the converted Forecast objects via the h_aheads argument.

convert_byh(forcs_time_format, index = 1:2, h_aheads = c(1, 2))
#> [[1]]
#> h_ahead = 1 
#> 
#>       origin     future forecast realized
#> 1 2010-02-17 2010-06-30     4.27     4.96
#> 2 2010-05-14 2010-09-30     3.36     4.17
#> 3 2010-07-22 2010-12-31     4.78     4.26
#> 4 2010-12-22 2011-03-31     5.45     4.99
#> 
#> [[2]]
#> h_ahead = 2 
#> 
#>       origin     future forecast realized
#> 1 2010-02-17 2010-09-30     3.77     4.17
#> 2 2010-05-14 2010-12-31     3.82     4.26
#> 3 2010-07-22 2011-03-31     4.53     4.99
#> 4 2010-12-22 2011-06-30     4.89     5.33

transform_byh

Given a list of forecasts in Time Format one can convert all of the forecasts to h_ahead Format using the transform_byh() function. This function transforms all Forecast objects in forcs to a list of h_ahead Format Forecast objects that have homogenous h_ahead values. h_ahead values are assigned to each converted Forecast object based on the values passed to the h_aheads argument. The difference between transform_byh() and convert_byh() is that transforming automatically converts all forecasts in the list while converting only converts the forecasts specified by the user.


transform_byh(forcs_time_format, h_aheads = c(1, 2, 3))
#> [[1]]
#> h_ahead = 1 
#> 
#>       origin     future forecast realized
#> 1 2010-02-17 2010-06-30     4.27     4.96
#> 2 2010-05-14 2010-09-30     3.36     4.17
#> 3 2010-07-22 2010-12-31     4.78     4.26
#> 4 2010-12-22 2011-03-31     5.45     4.99
#> 
#> [[2]]
#> h_ahead = 2 
#> 
#>       origin     future forecast realized
#> 1 2010-02-17 2010-09-30     3.77     4.17
#> 2 2010-05-14 2010-12-31     3.82     4.26
#> 3 2010-07-22 2011-03-31     4.53     4.99
#> 4 2010-12-22 2011-06-30     4.89     5.33
#> 
#> [[3]]
#> h_ahead = 3 
#> 
#>       origin     future forecast realized
#> 1 2010-02-17 2010-12-31     3.52     4.26
#> 2 2010-05-14 2011-03-31     4.22     4.99
#> 3 2010-07-22 2011-06-30     5.03     5.33
#> 4 2010-12-22 2011-09-30     5.78     5.21

Note that the output of transform_byh() above is identical to the list of Forecast objects in forcs_h_ahead_format. One can continually transform between Time Format and h_ahead Format without losing information. This is evidenced by the fact that:
transform_byh(forcs_time_format, h_aheads = c(1, 2, 3)) == forcs_h_ahead_format
and
transform_bytime(forcs_h_ahead_format, slot = "origin") == forcs_time_format