Backtested Dynamic Confidence Intervals

This notebook demonstrates how to create an expanding confidence interval using conformal intervals and backtesting. It expands what is demonstrated in the Confidence Intervals Notebook. Requires scalecast>=0.18.1.

We overwrite the static naive intervals produced by scalecast by default with dynamic expanding intervals obtained from backtesting.

See the article.

[1]:
import pandas as pd
import numpy as np
from scalecast.Forecaster import Forecaster
from scalecast.util import (
    metrics,
    backtest_metrics,
    backtest_for_resid_matrix,
    get_backtest_resid_matrix,
    overwrite_forecast_intervals,
)
from scalecast.Pipeline import Pipeline, Transformer, Reverter
import pandas_datareader as pdr
import matplotlib.pyplot as plt
import seaborn as sns
import time
[2]:
val_len = 24
fcst_len = 24
[3]:
housing = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-06-01')
housing.head()
[3]:
HOUSTNSA
DATE
1959-01-01 96.2
1959-02-01 99.0
1959-03-01 127.7
1959-04-01 150.8
1959-05-01 152.5

Hold out a test set since scalecast uses its native test set to determine interval widths, therefore overfitting the interval to the test set.

[4]:
starts_sep = housing.iloc[-fcst_len:,0]
starts = housing.iloc[:-fcst_len,0]
[5]:
f = Forecaster(
    y=starts,
    current_dates=starts.index,
    future_dates=fcst_len,
    test_length=val_len,
    validation_length=val_len,
    cis=True,
)

Step 1: Build and fit_predict() Pipeline

[6]:
transformer = Transformer(['DiffTransform'])
reverter = Reverter(['DiffRevert'],transformer)
[7]:
def forecaster(f):
    f.add_ar_terms(100)
    f.add_seasonal_regressors('month')
    f.set_estimator('xgboost')
    f.manual_forecast()
[8]:
pipeline = Pipeline(
    steps = [
        ('Transform',transformer),
        ('Forecast',forecaster),
        ('Revert',reverter)
    ]
)
[9]:
f = pipeline.fit_predict(f)
[10]:
f.plot_test_set();
../../_images/misc_cis-bt_cis-bt_13_0.png

Score the default Interval

[11]:
f.plot(ci=True);
../../_images/misc_cis-bt_cis-bt_15_0.png
[12]:
fig, ax = plt.subplots(figsize=(12,6))
f.plot(ci=True,models='top_1',order_by='TestSetRMSE',ax=ax)
sns.lineplot(
    y = 'HOUSTNSA',
    x = 'DATE',
    data = starts_sep.reset_index(),
    ax = ax,
    label = 'held-out actuals',
    color = 'green',
    alpha = 0.7,
)
plt.xlim(pd.Timestamp('2000-01-01'),pd.Timestamp('2021-12-01'))
plt.title('Forecast with Naive Interval')
plt.show()
../../_images/misc_cis-bt_cis-bt_16_0.png
[13]:
print(
    'All confidence intervals for every step'
    ' are {:.2f} units away from the point predictions.'.format(
        f.history['xgboost']['UpperCI'][0] - f.history['xgboost']['Forecast'][0]
    )
)
print(
    'The interval contains {:.2%} of the actual values'.format(
        np.sum(
            [
                1 if a <= uf and a >= lf else 0
                for a, uf, lf in zip(
                    starts_sep,
                    f.history['xgboost']['UpperCI'],
                    f.history['xgboost']['LowerCI']
                )
            ]
        ) / len(starts_sep)
    )
)
All confidence intervals for every step are 37.01 units away from the point predictions.
The interval contains 100.00% of the actual values

Score default interval

[14]:
metrics.msis(
    a = starts_sep,
    uf = f.history['xgboost']['UpperCI'],
    lf = f.history['xgboost']['LowerCI'],
    obs = f.y,
    m = 12,
)
[14]:
4.026554265078705

Step 2: Backtest Pipeline

  • Iterations need to be at least 20 for 95% intervals.

  • Length of each prediction in the backtest should match our desired forecast length.

[15]:
%%time
backtest_results = backtest_for_resid_matrix(
    f,
    pipeline=pipeline,
    alpha = .05, # default
    jump_back = 1, # default
)
CPU times: total: 1min 40s
Wall time: 28.5 s

Step 3: Build Residual Matrix

  • Result is matrix shaped 20x24, each row a backtest iteration, each column a forecast step, each value a residual.

[16]:
backtest_resid_matrix = get_backtest_resid_matrix(backtest_results)

Residual Analytics

[17]:
pd.options.display.max_columns = None
fig, ax = plt.subplots(figsize=(16,8))
mat = pd.DataFrame(np.abs(backtest_resid_matrix[0]['xgboost']))
sns.heatmap(
    mat.round(1),
    annot = True,
    ax = ax,
    cmap = sns.color_palette("icefire", as_cmap=True)
)
plt.ylabel('Backtest Iteration',size=16)
plt.xlabel('Forecast Step',size = 16)
plt.title('Absolute Residuals from XGBoost Backtest',size=25)
plt.show()
../../_images/misc_cis-bt_cis-bt_25_0.png
[18]:
fig, ax = plt.subplots(1,2,figsize=(16,8))
sns.heatmap(
    pd.DataFrame({'Mean Residuals':mat.mean().round(1)}),
    annot = True,
    cmap = 'cubehelix_r',
    ax = ax[0],
    annot_kws={"fontsize": 16},
)
cbar = ax[0].collections[0].colorbar
cbar.ax.invert_yaxis()
ax[0].set_title('Mean Absolute Residuals',size=20)
ax[0].set_ylabel('Forecast Step',size=15)
ax[0].set_xlabel('')
sns.heatmap(
    pd.DataFrame({'Residuals 95 Percentile':np.percentile(mat, q=95, axis = 0)}),
    annot = True,
    cmap = 'cubehelix_r',
    ax = ax[1],
    annot_kws={"fontsize": 16},
)
cbar = ax[1].collections[0].colorbar
cbar.ax.invert_yaxis()
ax[1].set_title('Absolute Residual 95 Percentiles',size=20)
ax[1].set_ylabel('Forecast Step',size=15)
ax[1].set_xlabel('')
plt.show()
../../_images/misc_cis-bt_cis-bt_26_0.png

Each step of the forecast will be plus/minus the 95 percentile of the absolute residuals (plot on right).

Step 4: Overwrite Naive Interval with Dynamic Interval

[19]:
overwrite_forecast_intervals(f,backtest_resid_matrix=backtest_resid_matrix)
[20]:
f.plot(ci=True);
../../_images/misc_cis-bt_cis-bt_30_0.png
[21]:
fig, ax = plt.subplots(figsize=(12,6))
f.plot(ci=True,models='top_1',order_by='TestSetRMSE',ax=ax)
sns.lineplot(
    y = 'HOUSTNSA',
    x = 'DATE',
    data = starts_sep.reset_index(),
    ax = ax,
    label = 'held-out actuals',
    color = 'green',
    alpha = 0.7,
)
plt.xlim(pd.Timestamp('2000-01-01'),pd.Timestamp('2021-12-01'))
plt.title('Forecast with Dynamic Interval')
plt.show()
../../_images/misc_cis-bt_cis-bt_31_0.png

Score dynamic interval

[22]:
metrics.msis(
    a = starts_sep,
    uf = f.history['xgboost']['UpperCI'],
    lf = f.history['xgboost']['LowerCI'],
    obs = f.y,
    m = 12,
)
[22]:
3.919628517087574

It is a small improvement, but still an improvement!

[23]:
print(
    'The intervals are on average {:.2f} units away from the point predictions.'.format(
        np.mean(np.percentile(mat, q=95, axis = 0))
    )

)
print(
    'The interval contains {:.2%} of the actual values'.format(
        np.sum(
            [
                1 if a <= uf and a >= lf else 0
                for a, uf, lf in zip(
                    starts_sep,
                    f.history['xgboost']['UpperCI'],
                    f.history['xgboost']['LowerCI']
                )
            ]
        ) / len(starts_sep)
    )
)
The intervals are on average 27.36 units away from the point predictions.
The interval contains 87.50% of the actual values

Other Backtest Uses

  • We can also use the backtest results to report average error metrics over 20 out-of-sample sets.

[24]:
backtest_metrics(backtest_results,mets=['rmse','mae','bias'])[['Average']]
[24]:
Average
Model Metric
xgboost rmse 14.083597
mae 12.144780
bias 239.430706
[25]:
# actual rmse on out-of-sample data
metrics.rmse(starts_sep,f.history['xgboost']['Forecast'])
[25]:
16.208569138706174
[ ]: