Multivariate - Beyond the Basics
This notebook demonstrates multivariate forecasting procedures with scalecast. It requires scalecast version 0.18.2 or later and uses the avocados dataset.
We will treat this as a demand forecasting problem: we want to know how many total avocados will be demanded over the next quarter (13 weeks). Since demand and price are closely related, we will use the historical avocado prices as a predictor of demand.
[1]:
import pandas as pd
import numpy as np
from scalecast.Forecaster import Forecaster
from scalecast.MVForecaster import MVForecaster
from scalecast.Pipeline import MVPipeline
from scalecast.util import (
    find_optimal_transformation,
    find_optimal_lag_order,
    break_mv_forecaster,
    backtest_metrics,
    backtest_for_resid_matrix,
    get_backtest_resid_matrix,
    overwrite_forecast_intervals,
)
from scalecast import GridGenerator
Read in the example hyperparameter grids used to tune the models. These are saved to the working directory as Grids.py (univariate) and MVGrids.py (multivariate).
[2]:
GridGenerator.get_example_grids()
GridGenerator.get_mv_grids()
[3]:
pd.options.display.max_colwidth = 100
pd.options.display.float_format = '{:,.2f}'.format
[4]:
# arguments to pass to every Forecaster/MVForecaster object we create
Forecaster_kws = dict(
    test_length = 13,
    validation_length = 13,
    metrics = ['rmse','r2'],
)
[5]:
# model summary columns to export every time we check a model's performance
export_cols = ['ModelNickname','HyperParams','TestSetR2','TestSetRMSE']
[6]:
# read data
data = pd.read_csv('avocado.csv',parse_dates=['Date'])
# sort appropriately (not doing this could cause issues)
data = data.sort_values(['region','type','Date'])
data.head()
[6]:
 | Unnamed: 0 | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
51 | 51 | 2015-01-04 | 1.22 | 40,873.28 | 2,819.50 | 28,287.42 | 49.90 | 9,716.46 | 9,186.93 | 529.53 | 0.00 | conventional | 2015 | Albany |
50 | 50 | 2015-01-11 | 1.24 | 41,195.08 | 1,002.85 | 31,640.34 | 127.12 | 8,424.77 | 8,036.04 | 388.73 | 0.00 | conventional | 2015 | Albany |
49 | 49 | 2015-01-18 | 1.17 | 44,511.28 | 914.14 | 31,540.32 | 135.77 | 11,921.05 | 11,651.09 | 269.96 | 0.00 | conventional | 2015 | Albany |
48 | 48 | 2015-01-25 | 1.06 | 45,147.50 | 941.38 | 33,196.16 | 164.14 | 10,845.82 | 10,103.35 | 742.47 | 0.00 | conventional | 2015 | Albany |
47 | 47 | 2015-02-01 | 0.99 | 70,873.60 | 1,353.90 | 60,017.20 | 179.32 | 9,323.18 | 9,170.82 | 152.36 | 0.00 | conventional | 2015 | Albany |
[7]:
# demand
vol = data.groupby('Date')['Total Volume'].sum()
[8]:
# price (summing AveragePrice across all regions and types creates an aggregate price index)
price = data.groupby('Date')['AveragePrice'].sum()
[9]:
# one Forecaster object needed for each series we want to predict multivariately
# volume
fvol = Forecaster(
    y = vol,
    current_dates = vol.index,
    future_dates = 13,
    **Forecaster_kws,
)
[10]:
# price
fprice = Forecaster(
    y = price,
    current_dates = price.index,
    future_dates = 13,
    **Forecaster_kws,
)
[11]:
# combine Forecaster objects into one MVForecaster object
# all dates will line up and all models will recursively predict values for all series
mvf = MVForecaster(
    fvol,
    fprice,
    names = ['volume','price'],
    **Forecaster_kws,
)
[12]:
mvf
[12]:
MVForecaster(
    DateStartActuals=2015-01-04T00:00:00.000000000
    DateEndActuals=2018-03-25T00:00:00.000000000
    Freq=W-SUN
    N_actuals=169
    N_series=2
    SeriesNames=['volume', 'price']
    ForecastLength=13
    Xvars=[]
    TestLength=13
    ValidationLength=13
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=None
    CurrentEstimator=mlr
    OptimizeOn=mean
    GridsFile=MVGrids
)
1. Transformations
To make the forecasting task easier, we can transform the data in each Forecaster object before feeding them to the MVForecaster object. The function below searches through many candidate transformations, using out-of-sample testing to score each one. We pass four possible seasonal cycle lengths to the function (4, 13, 26, and 52 weeks, i.e., monthly, quarterly, semiannual, and annual), and several seasonal adjustments end up being selected.
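For context, the transformer and reverter returned here are Transformer/Reverter objects from scalecast.Pipeline. A minimal hand-built pair might look like the sketch below (assuming that API; the tuple names mirror SeriesTransformer's transform/revert methods):
from scalecast.Pipeline import Transformer, Reverter

# a hand-built pair: difference once, then scale
transformer = Transformer(
    transformers = [('DiffTransform',1),('ScaleTransform',)],
)
# reverters undo the transformers in reverse order
reverter = Reverter(
    reverters = [('ScaleRevert',),('DiffRevert',1)],
    base_transformer = transformer,
)
# usage: f = transformer.fit_transform(f) ... fit models ... f = reverter.fit_transform(f)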
[13]:
transformers = []
reverters = []
for name, f in zip(('volume','price'),(fvol,fprice)):
    print(f'\nFinding best transformation for the {name} series.')
    transformer, reverter = find_optimal_transformation(
        f,
        m = [
            4,  # monthly seasonality
            13, # quarterly seasonality
            26, # semiannual seasonality
            52, # annual seasonality
        ],
        test_length = 13,
        num_test_sets = 2,
        space_between_sets = 13,
        return_train_only = True,
        verbose = True,
    )
    transformers.append(transformer)
    reverters.append(reverter)
Finding best transformation for the volume series.
Using mlr model to find the best transformation set on 2 test sets, each 13 in length.
All transformation tries will be evaluated with 4 lags.
Last transformer tried:
[]
Score (rmse): 19481972.64636622
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (rmse): 22085767.144446835
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (rmse): 19630858.294620857
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (rmse): 22320325.279892858
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'})]
Score (rmse): 18763298.437913556
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'})]
Score (rmse): 18061445.02080934
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 26, 'model': 'add'})]
Score (rmse): 18351627.623842016
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (rmse): 15388459.611609437
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', <function find_optimal_transformation.<locals>.boxcox_tr at 0x0000022788613D30>, {'lmbda': -0.5})]
Score (rmse): 15776741.170206662
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', <function find_optimal_transformation.<locals>.boxcox_tr at 0x0000022788613D30>, {'lmbda': 0})]
Score (rmse): 15640424.466095788
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('Transform', <function find_optimal_transformation.<locals>.boxcox_tr at 0x0000022788613D30>, {'lmbda': 0.5})]
Score (rmse): 15512957.889126703
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (rmse): 15929820.9564328
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 4)]
Score (rmse): 14324958.982509937
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 4), ('DiffTransform', 13)]
Score (rmse): 18135344.27502767
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 4), ('DiffTransform', 26)]
Score (rmse): 21861866.629635938
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 4), ('DiffTransform', 52)]
Score (rmse): 20808840.990807127
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 4), ('ScaleTransform',)]
Score (rmse): 14324958.982509933
--------------------------------------------------
Last transformer tried:
[('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 4), ('MinMaxTransform',)]
Score (rmse): 14324958.98250994
--------------------------------------------------
Final Selection:
[('DeseasonTransform', {'m': 4, 'model': 'add', 'train_only': True}), ('DeseasonTransform', {'m': 13, 'model': 'add', 'train_only': True}), ('DeseasonTransform', {'m': 52, 'model': 'add', 'train_only': True}), ('DiffTransform', 4), ('ScaleTransform', {'train_only': True})]
Finding best transformation for the price series.
Using mlr model to find the best transformation set on 2 test sets, each 13 in length.
All transformation tries will be evaluated with 4 lags.
Last transformer tried:
[]
Score (rmse): 22.25551611050048
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (rmse): 25.65997061765327
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (rmse): 22.148856499520484
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (rmse): 32.75467733406476
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'})]
Score (rmse): 21.72760152488739
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'})]
Score (rmse): 20.055641074156764
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 26, 'model': 'add'})]
Score (rmse): 22.020127438895862
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'})]
Score (rmse): 14.604251739058533
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 1)]
Score (rmse): 18.183007629056675
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 4)]
Score (rmse): 15.96916031713575
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 13)]
Score (rmse): 18.4021660531495
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 26)]
Score (rmse): 25.298723431620186
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('DiffTransform', 52)]
Score (rmse): 19.452999810002588
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('ScaleTransform',)]
Score (rmse): 14.604251739058554
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1}), ('DeseasonTransform', {'m': 4, 'model': 'add'}), ('DeseasonTransform', {'m': 13, 'model': 'add'}), ('DeseasonTransform', {'m': 52, 'model': 'add'}), ('MinMaxTransform',)]
Score (rmse): 14.604251739058554
--------------------------------------------------
Final Selection:
[('DetrendTransform', {'poly_order': 1, 'train_only': True}), ('DeseasonTransform', {'m': 4, 'model': 'add', 'train_only': True}), ('DeseasonTransform', {'m': 13, 'model': 'add', 'train_only': True}), ('DeseasonTransform', {'m': 52, 'model': 'add', 'train_only': True})]
Plot each series after its transformations have been applied.
[14]:
fvol1 = transformers[0].fit_transform(fvol)
fvol1.plot();

[15]:
fprice1 = transformers[1].fit_transform(fprice)
fprice1.plot();

Now, combine the transformed series into an MVForecaster object.
[16]:
mvf1 = MVForecaster(
    fvol1,
    fprice1,
    names = ['volume','price'],
    **Forecaster_kws,
)
2. Optimal Lag Selection
Method 1: Univariate out-of-sample testing
The auto_Xvar_select() calls below choose the number of lags for each series that minimizes RMSE on an out-of-sample validation set.
[17]:
fvol1.auto_Xvar_select(try_trend=False,try_seasonalities=False)
fvol1.get_regressor_names()
[17]:
['AR1',
 'AR2',
 'AR3',
 'AR4',
 'AR5',
 'AR6',
 'AR7',
 'AR8',
 'AR9',
 'AR10',
 'AR11',
 'AR12',
 'AR13']
[18]:
fprice1.auto_Xvar_select(try_trend=False,try_seasonalities=False)
fprice1.get_regressor_names()
[18]:
['AR1', 'AR2', 'AR3', 'AR4', 'AR5', 'AR6', 'AR7', 'AR8', 'AR9', 'AR10', 'AR11']
Method 2: Information Criteria Search with VAR
find_optimal_lag_order() fits a vector autoregression on the data and reports the lag order suggested by each information criterion.
[19]:
lag_order_res = find_optimal_lag_order(mvf1,train_only=True)
lag_orders = pd.DataFrame({
    'aic':[lag_order_res.aic],
    'bic':[lag_order_res.bic],
})
lag_orders
[19]:
 | aic | bic
---|---|---
0 | 12 | 4
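Either suggested order could be plugged straight into a model. A minimal sketch using the BIC's more parsimonious suggestion (the model nickname is ours):
# evaluate an mlr with the lag order the BIC suggested
mvf1.set_estimator('mlr')
mvf1.manual_forecast(lags = 4, call_me = 'mlr_bic_lags')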
Method 3: Multivariate Cross Validation with MLR
Here, we grid search the lags argument directly, scoring each candidate with 3-fold cross validation using the default mlr estimator. Besides a single integer (the same lag count for every series), lags can be a dict (a different lag count per series) or a list (only the specified lags), as the grid below shows.
[20]:
lags = [
    1,
    2,
    3,
    4,
    9,
    10,
    11,
    12,
    13,
    {'volume':13,'price':9},
    [4,9,12,13],
]
[21]:
grid = dict(
    lags = lags,
)
[22]:
# optimize on the volume series only (the default optimizes on the mean of all series)
mvf1.set_optimize_on('volume')
[23]:
mvf1.ingest_grid(grid)
mvf1.cross_validate(k=3,test_length=13,verbose = True,dynamic_tuning=True)
Num hyperparams to try for the mlr model: 11.
Fold 0: Train size: 139 (2015-02-01 00:00:00 - 2017-09-24 00:00:00). Test Size: 13 (2017-10-01 00:00:00 - 2017-12-24 00:00:00).
Fold 1: Train size: 126 (2015-02-01 00:00:00 - 2017-06-25 00:00:00). Test Size: 13 (2017-07-02 00:00:00 - 2017-09-24 00:00:00).
Fold 2: Train size: 113 (2015-02-01 00:00:00 - 2017-03-26 00:00:00). Test Size: 13 (2017-04-02 00:00:00 - 2017-06-25 00:00:00).
Chosen paramaters: {'lags': 10}.
3. Model Optimization with Cross Validation
tune_test_forecast() tunes each listed model with 3-fold cross validation, evaluates it on the test set, and produces a 13-step forecast. limit_grid_size=.2 randomly samples 20% of each grid (keeping at least min_grid_size=4 combinations) to save time.
[24]:
def forecaster(mvf):
    mvf.tune_test_forecast(
        ['lasso','ridge','xgboost','lightgbm'],
        cross_validate = True,
        k = 3,
        test_length = 13,
        dynamic_tuning = True,
        limit_grid_size = .2,
        min_grid_size = 4,
    )

forecaster(mvf1)
[25]:
mvf1.plot(series='volume');

[26]:
mvf1.export('model_summaries',series='volume')[['Series'] + export_cols + ['Lags']].style.set_properties(height = 5)
[26]:
 | Series | ModelNickname | HyperParams | TestSetR2 | TestSetRMSE | Lags
---|---|---|---|---|---|---
0 | volume | lasso | {'alpha': 0.02} | -0.089158 | 1.399202 | 10 |
1 | volume | ridge | {'alpha': 0.04} | -0.050000 | 1.373819 | 10 |
2 | volume | xgboost | {'n_estimators': 250, 'scale_pos_weight': 5, 'learning_rate': 0.2, 'gamma': 3, 'subsample': 0.8} | -0.017800 | 1.352589 | 10 |
3 | volume | lightgbm | {'n_estimators': 250, 'boosting_type': 'goss', 'max_depth': 2, 'learning_rate': 0.01} | -0.183962 | 1.458827 | [4, 9, 12, 13] |
[27]:
mvf1
[27]:
MVForecaster(
    DateStartActuals=2015-02-01T00:00:00.000000000
    DateEndActuals=2018-03-25T00:00:00.000000000
    Freq=W-SUN
    N_actuals=165
    N_series=2
    SeriesNames=['volume', 'price']
    ForecastLength=13
    Xvars=[]
    TestLength=13
    ValidationLength=13
    ValidationMetric=rmse
    ForecastsEvaluated=['lasso', 'ridge', 'xgboost', 'lightgbm']
    CILevel=None
    CurrentEstimator=lightgbm
    OptimizeOn=volume
    GridsFile=MVGrids
)
The test-set metrics above are on the transformed scale (note, too, that N_actuals dropped from 169 to 165: the 4-period difference on the volume series consumed 4 observations). To evaluate the models on the original scale, break the MVForecaster object into its component Forecaster objects and revert the transformations.
[28]:
fvol1, fprice1 = break_mv_forecaster(mvf1)
[29]:
# revert the volume series to its original scale
reverter = reverters[0]
fvol1 = reverter.fit_transform(fvol1)
[30]:
fvol1.plot();

[31]:
fvol1.export('model_summaries')[export_cols].style.set_properties(height = 5)
[31]:
 | ModelNickname | HyperParams | TestSetR2 | TestSetRMSE
---|---|---|---|---
0 | lasso | {'alpha': 0.02} | 0.388886 | 13185802.986461 |
1 | ridge | {'alpha': 0.04} | 0.249991 | 14607592.004013 |
2 | xgboost | {'n_estimators': 250, 'scale_pos_weight': 5, 'learning_rate': 0.2, 'gamma': 3, 'subsample': 0.8} | 0.455020 | 12451896.764192 |
3 | lightgbm | {'n_estimators': 250, 'boosting_type': 'goss', 'max_depth': 2, 'learning_rate': 0.01} | 0.281993 | 14292550.306560 |
4. Model Stacking
add_signals() adds the predictions of the already-evaluated models to the objects as regressors ("signals"), and a catboost model is then trained on top of them. Passing train_only=True derives the signals from training data only, protecting the test set from leakage.
[32]:
def model_stack(mvf,train_only=False):
    mvf.add_signals(['lasso','ridge','lightgbm','xgboost'],train_only=train_only)
    mvf.set_estimator('catboost')
    mvf.manual_forecast(
        lags = 13,
        verbose = False,
    )

model_stack(mvf1,train_only=True)
[33]:
mvf1.plot(series='volume');

[34]:
mvf1.export('model_summaries',series='volume')[['Series'] + export_cols + ['Lags']].style.set_properties(height = 5)
[34]:
 | Series | ModelNickname | HyperParams | TestSetR2 | TestSetRMSE | Lags
---|---|---|---|---|---|---
0 | volume | lasso | {'alpha': 0.02} | -0.089158 | 1.399202 | 10 |
1 | volume | ridge | {'alpha': 0.04} | -0.050000 | 1.373819 | 10 |
2 | volume | xgboost | {'n_estimators': 250, 'scale_pos_weight': 5, 'learning_rate': 0.2, 'gamma': 3, 'subsample': 0.8} | -0.017800 | 1.352589 | 10 |
3 | volume | lightgbm | {'n_estimators': 250, 'boosting_type': 'goss', 'max_depth': 2, 'learning_rate': 0.01} | -0.183962 | 1.458827 | [4, 9, 12, 13] |
4 | volume | catboost | {'verbose': False} | 0.069340 | 1.293392 | 13 |
[35]:
fvol1, fprice1 = break_mv_forecaster(mvf1)
[36]:
fvol1 = reverter.fit_transform(fvol1)
[37]:
fvol1.export('model_summaries',determine_best_by='TestSetRMSE')[export_cols].style.set_properties(height = 5)
[37]:
 | ModelNickname | HyperParams | TestSetR2 | TestSetRMSE
---|---|---|---|---
0 | xgboost | {'n_estimators': 250, 'scale_pos_weight': 5, 'learning_rate': 0.2, 'gamma': 3, 'subsample': 0.8} | 0.455020 | 12451896.764192 |
1 | catboost | {'verbose': False} | 0.447398 | 12538678.460293 |
2 | lasso | {'alpha': 0.02} | 0.388886 | 13185802.986461 |
3 | lightgbm | {'n_estimators': 250, 'boosting_type': 'goss', 'max_depth': 2, 'learning_rate': 0.01} | 0.281993 | 14292550.306560 |
4 | ridge | {'alpha': 0.04} | 0.249991 | 14607592.004013 |
[38]:
fvol1.plot_test_set(order_by='TestSetRMSE');

[39]:
fvol1.plot(order_by='TestSetRMSE');

5. Multivariate Pipelines
MVPipeline chains the transformations, the forecasting function, and the reverters into a single object, so the whole workflow can be fit, applied, and backtested in one call.
[40]:
def mvforecaster(mvf,train_only=False):
    forecaster(mvf)
    model_stack(mvf,train_only=train_only)
[41]:
pipeline = MVPipeline(
    steps = [
        ('Transform',transformers),
        ('Forecast',mvforecaster),
        ('Revert',reverters),
    ],
    **Forecaster_kws,
)
[42]:
fvol1, fprice1 = pipeline.fit_predict(fvol,fprice,train_only=True)
[43]:
fvol1.plot_test_set(order_by='TestSetRMSE');

[44]:
fvol1.plot(order_by='TestSetRMSE');

[45]:
fvol1.export('model_summaries',determine_best_by='TestSetRMSE')[export_cols].style.set_properties(height = 5)
[45]:
 | ModelNickname | HyperParams | TestSetR2 | TestSetRMSE
---|---|---|---|---
0 | lightgbm | {'n_estimators': 150, 'boosting_type': 'dart', 'max_depth': 1, 'learning_rate': 0.1} | 0.511729 | 11786259.617968 |
1 | lasso | {'alpha': 0.53} | 0.440399 | 12617822.749627 |
2 | ridge | {'alpha': 1.0} | 0.433704 | 12693080.386907 |
3 | catboost | {'verbose': False} | 0.384513 | 13232893.075803 |
4 | xgboost | {'n_estimators': 250, 'scale_pos_weight': 10, 'learning_rate': 0.2, 'gamma': 0, 'subsample': 0.8} | 0.381760 | 13262451.219228 |
6. Backtesting
Backtesting re-runs the full pipeline over several historical forecast origins, giving a more robust read on out-of-sample performance than a single test set.
[46]:
backtest_results = pipeline.backtest(
    fvol,
    fprice,
    n_iter = 4,        # 4 backtest iterations
    fcst_length = 13,  # forecast 13 steps in each iteration
    test_length = 0,   # no holdout within each iteration
    jump_back = 13,    # move the forecast origin back 13 periods between iterations
)
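For intuition, these settings imply four train/forecast windows, each with its forecast origin moved back 13 weeks from the last. A rough index-arithmetic sketch of the windows (the actual splitting happens inside pipeline.backtest()):
n = len(vol) # 169 weekly observations
for i in range(4):
    origin = n - 13 * (i + 1)
    print(f'window {i}: train through obs {origin - 1}, forecast obs {origin}-{origin + 12}')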
[47]:
backtest_metrics(
    backtest_results[:1], # volume only
    mets=['rmse','mae','r2','bias'],
    #names=['volume','price'],
)
[47]:
Model | Metric | Iter0 | Iter1 | Iter2 | Iter3 | Average
---|---|---|---|---|---|---
lasso | rmse | 12,676,035.50 | 19,334,820.60 | 12,408,199.56 | 6,865,555.60 | 12,821,152.81
 | mae | 10,622,259.32 | 16,818,913.78 | 10,196,006.95 | 4,890,701.26 | 10,631,970.33
 | r2 | 0.44 | -2.98 | -0.17 | 0.34 | -0.59
 | bias | -103,321,272.45 | -216,135,991.11 | 106,453,214.29 | 7,242,071.64 | -51,440,494.41
ridge | rmse | 13,245,785.08 | 19,757,231.61 | 12,581,587.51 | 8,092,421.06 | 13,419,256.32
 | mae | 10,864,770.69 | 17,175,778.82 | 10,362,478.27 | 6,239,668.77 | 11,160,674.14
 | r2 | 0.38 | -3.16 | -0.20 | 0.09 | -0.72
 | bias | -119,334,285.17 | -221,927,247.43 | 109,823,042.50 | 55,636,823.14 | -43,950,416.74
xgboost | rmse | 19,261,511.73 | 15,233,136.06 | 15,781,395.08 | 6,583,385.75 | 14,214,857.16
 | mae | 15,981,374.97 | 13,767,893.53 | 13,479,663.34 | 5,216,355.57 | 12,111,321.85
 | r2 | -0.30 | -1.47 | -0.89 | 0.40 | -0.57
 | bias | -103,418,980.41 | -151,259,604.70 | 155,235,475.90 | -16,515,829.46 | -28,989,734.67
lightgbm | rmse | 11,239,291.40 | 17,262,898.64 | 14,840,433.88 | 7,289,722.74 | 12,658,086.67
 | mae | 9,087,987.10 | 15,146,134.37 | 12,373,711.83 | 5,735,222.10 | 10,585,763.85
 | r2 | 0.56 | -2.17 | -0.67 | 0.26 | -0.51
 | bias | -86,731,196.07 | -189,464,392.96 | 140,926,488.55 | 43,025,640.10 | -23,060,865.10
catboost | rmse | 17,455,804.71 | 14,955,271.67 | 16,116,336.26 | 6,315,491.61 | 13,710,726.06
 | mae | 14,805,029.65 | 13,567,739.76 | 13,603,593.10 | 5,036,532.77 | 11,753,223.82
 | r2 | -0.07 | -1.38 | -0.97 | 0.45 | -0.50
 | bias | -108,026,362.31 | -146,926,722.77 | 162,948,970.47 | 24,860,226.70 | -16,785,971.98
7. Dynamic Intervals
Use the residuals from backtesting to calibrate prediction intervals empirically, then overwrite the default intervals stored on the Forecaster objects.
[48]:
backtest_results = backtest_for_resid_matrix(
    fvol,
    fprice,
    pipeline = pipeline,
    alpha = 0.1, # 90% intervals
)
[49]:
backtest_resid_matrix = get_backtest_resid_matrix(backtest_results)
[50]:
overwrite_forecast_intervals(
    fvol1,
    fprice1,
    backtest_resid_matrix=backtest_resid_matrix,
    alpha=0.1,
)
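Conceptually, the overwritten intervals pad the point forecasts by a high percentile of the absolute backtest residuals. A stand-in numpy sketch of the idea (fabricated shapes and data; the real logic lives in scalecast.util):
import numpy as np

rng = np.random.default_rng(0)
resids = rng.normal(size=(4,13)) # stand-in: 4 backtest iterations x 13 forecast steps
alpha = 0.1
pad = np.percentile(np.abs(resids), 100 * (1 - alpha)) # empirical 90th-percentile width
point = np.zeros(13)             # stand-in point forecasts
upper, lower = point + pad, point - pad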
[51]:
fvol1.plot(models='top_1',order_by='TestSetRMSE',ci=True);

8. LSTM Modeling
The rnn estimator is only available in univariate Forecaster objects, so we re-apply the transformations and fit one LSTM on volume's own lags (lstm_uv), then another with the price series' lags added as regressors (lstm_mv).
[52]:
# re-apply the transformations before fitting the LSTMs
fvol1 = transformers[0].fit_transform(fvol1)
fprice1 = transformers[1].fit_transform(fprice1)
[53]:
fvol1.add_ar_terms(13)
[54]:
fvol1.set_estimator('rnn')
fvol1.tune() # tunes with the rnn grid from Grids.py
fvol1.auto_forecast(call_me='lstm_uv') # univariate LSTM: volume's own lags only
[55]:
fvol1.add_series(fprice1.y,called='price')
fvol1.add_lagged_terms('price',lags=13,drop=True) # keep 13 price lags; drop the unlagged series
fvol1
[55]:
Forecaster(
    DateStartActuals=2015-02-01T00:00:00.000000000
    DateEndActuals=2018-03-25T00:00:00.000000000
    Freq=W-SUN
    N_actuals=165
    ForecastLength=13
    Xvars=['AR1', 'AR2', 'AR3', 'AR4', 'AR5', 'AR6', 'AR7', 'AR8', 'AR9', 'AR10', 'AR11', 'AR12', 'AR13', 'pricelag_1', 'pricelag_2', 'pricelag_3', 'pricelag_4', 'pricelag_5', 'pricelag_6', 'pricelag_7', 'pricelag_8', 'pricelag_9', 'pricelag_10', 'pricelag_11', 'pricelag_12', 'pricelag_13']
    TestLength=13
    ValidationMetric=rmse
    ForecastsEvaluated=['lasso', 'ridge', 'xgboost', 'lightgbm', 'catboost', 'lstm_uv']
    CILevel=None
    CurrentEstimator=rnn
    GridsFile=Grids
)
[56]:
fvol1.tune() # retune now that the price lags are available
fvol1.auto_forecast(call_me='lstm_mv')
[57]:
fvol1.plot_test_set(models=['lstm_uv','lstm_mv']);

[58]:
fvol1.plot(models=['lstm_uv','lstm_mv']);

[59]:
# revert the transformations, excluding the models that are already on the original scale
fvol1 = reverters[0].fit_transform(fvol1,exclude_models=['lightgbm','lasso','ridge','xgboost','catboost'])
fprice1 = reverters[1].fit_transform(fprice1,exclude_models=['lightgbm','lasso','ridge','xgboost','catboost'])
[60]:
ms = fvol1.export('model_summaries')
ms = ms[export_cols]
ms.style.set_properties(height = 5)
[60]:
 | ModelNickname | HyperParams | TestSetR2 | TestSetRMSE
---|---|---|---|---
0 | lasso | {'alpha': 0.53} | 0.440399 | 12617822.749627 |
1 | ridge | {'alpha': 1.0} | 0.433704 | 12693080.386907 |
2 | xgboost | {'n_estimators': 250, 'scale_pos_weight': 10, 'learning_rate': 0.2, 'gamma': 0, 'subsample': 0.8} | 0.381760 | 13262451.219228 |
3 | lightgbm | {'n_estimators': 150, 'boosting_type': 'dart', 'max_depth': 1, 'learning_rate': 0.1} | 0.511729 | 11786259.617968 |
4 | catboost | {'verbose': False} | 0.384513 | 13232893.075803 |
5 | lstm_uv | {'layers_struct': [('LSTM', {'units': 50, 'activation': 'tanh', 'dropout': 0.2, 'return_sequences': True}), ('LSTM', {'units': 50, 'activation': 'tanh', 'dropout': 0.2, 'return_sequences': True}), ('LSTM', {'units': 50, 'activation': 'tanh', 'dropout': 0.2, 'return_sequences': False})], 'epochs': 50, 'verbose': 0} | 0.449703 | 12512491.086281 |
6 | lstm_mv | {'layers_struct': [('LSTM', {'units': 50, 'activation': 'tanh', 'return_sequences': True}), ('LSTM', {'units': 50, 'activation': 'tanh', 'return_sequences': True}), ('LSTM', {'units': 50, 'activation': 'tanh', 'return_sequences': False})], 'epochs': 50, 'verbose': 0} | 0.514143 | 11757086.227344 |
[61]:
fvol1.plot_test_set(order_by = 'TestSetRMSE');

[62]:
fvol1.plot(order_by = 'TestSetRMSE');

9. Benchmarking against a Naive Model
Finally, check whether the models add value over a naive benchmark that propagates the last observed value forward.
[63]:
fvol1 = transformers[0].fit_transform(fvol1)
fvol1.set_estimator('naive')
fvol1.manual_forecast()
fvol1 = reverters[0].fit_transform(fvol1,exclude_models=['lightgbm','lasso','ridge','xgboost','catboost','lstm_uv','lstm_mv'])
[67]:
ms = fvol1.export('model_summaries',determine_best_by='TestSetRMSE')
ms = ms[export_cols]
ms.style.set_properties(height = 5)
[67]:
 | ModelNickname | HyperParams | TestSetR2 | TestSetRMSE
---|---|---|---|---
0 | lstm_mv | {'layers_struct': [('LSTM', {'units': 50, 'activation': 'tanh', 'return_sequences': True}), ('LSTM', {'units': 50, 'activation': 'tanh', 'return_sequences': True}), ('LSTM', {'units': 50, 'activation': 'tanh', 'return_sequences': False})], 'epochs': 50, 'verbose': 0} | 0.514143 | 11757086.227344 |
1 | lightgbm | {'n_estimators': 150, 'boosting_type': 'dart', 'max_depth': 1, 'learning_rate': 0.1} | 0.511729 | 11786259.617968 |
2 | lstm_uv | {'layers_struct': [('LSTM', {'units': 50, 'activation': 'tanh', 'dropout': 0.2, 'return_sequences': True}), ('LSTM', {'units': 50, 'activation': 'tanh', 'dropout': 0.2, 'return_sequences': True}), ('LSTM', {'units': 50, 'activation': 'tanh', 'dropout': 0.2, 'return_sequences': False})], 'epochs': 50, 'verbose': 0} | 0.449703 | 12512491.086281 |
3 | lasso | {'alpha': 0.53} | 0.440399 | 12617822.749627 |
4 | ridge | {'alpha': 1.0} | 0.433704 | 12693080.386907 |
5 | catboost | {'verbose': False} | 0.384513 | 13232893.075803 |
6 | xgboost | {'n_estimators': 250, 'scale_pos_weight': 10, 'learning_rate': 0.2, 'gamma': 0, 'subsample': 0.8} | 0.381760 | 13262451.219228 |
7 | naive | {} | -0.188041 | 18384892.054096 |
[65]:
fvol1.plot_test_set(order_by = 'TestSetRMSE');

[66]:
fvol1.plot(order_by = 'TestSetRMSE');
