Combo Modeling
An introduction to the native ensemble/combo model in scalecast.
[1]:
import pandas as pd
import pandas_datareader as pdr
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from dateutil.relativedelta import relativedelta
from scalecast.Forecaster import Forecaster
from scalecast.SeriesTransformer import SeriesTransformer
from scalecast.Pipeline import Transformer, Reverter
from scalecast import GridGenerator
Download data from FRED (https://fred.stlouisfed.org/series/HOUSTNSA). The series measures monthly housing starts in the USA since 1959 and is interesting because of its strong seasonality and irregular cycles. Pairing it with a series that measures demand for houses could be an interesting extension for explaining housing prices. Scalecast uses it as a common example series.
[2]:
GridGenerator.get_example_grids(overwrite=True)
df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2022-12-31')
f = Forecaster(
    y=df['HOUSTNSA'],
    current_dates=df.index,
    future_dates = 24,
    test_length = .1,
)
f
[2]:
Forecaster(
DateStartActuals=1959-01-01T00:00:00.000000000
DateEndActuals=2022-12-01T00:00:00.000000000
Freq=MS
N_actuals=768
ForecastLength=24
Xvars=[]
TestLength=76
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=None
CurrentEstimator=mlr
GridsFile=Grids
)
Preprocess Data
Difference, seasonal difference, and scale the data.
Create the object that will revert these transformations.
[3]:
# create transformations to model stationary data
transformer = Transformer(
    transformers = [
        ('DiffTransform',1),
        ('DiffTransform',12),
        ('MinMaxTransform',),
    ]
)
reverter = Reverter(
    reverters = [
        ('MinMaxRevert',),
        ('DiffRevert',12),
        ('DiffRevert',1)
    ],
    base_transformer = transformer,
)
reverter
[3]:
Reverter(
reverters = [
('MinMaxRevert',),
('DiffRevert', 12),
('DiffRevert', 1)
],
base_transformer = Transformer(
transformers = [
('DiffTransform', 1),
('DiffTransform', 12),
('MinMaxTransform',)
]
)
)
[4]:
# transform the series by calling the Transformer.fit_transform() method
f = transformer.fit_transform(f)
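To make the effect of these steps concrete, here is a minimal NumPy sketch of a first difference, a seasonal difference at lag 12, and a min-max scaling, followed by the reversal in the opposite order. It is an illustration only and not scalecast's implementation: the synthetic series, the helper names `diff` and `undiff`, and the assumption that scaling maps to the [0, 1] range are all hypothetical.

```python
import numpy as np

# hypothetical monthly series with trend and seasonality (not the FRED data)
rng = np.random.default_rng(0)
t = np.arange(48)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 48)

def diff(x, lag):
    """Difference a series at the given lag; the result is `lag` values shorter."""
    return x[lag:] - x[:-lag]

def undiff(d, lag, head):
    """Invert diff(), given the first `lag` values of the undifferenced series."""
    out = np.concatenate([head, np.empty(len(d))])
    for i in range(len(d)):
        out[lag + i] = out[i] + d[i]
    return out

# forward: first difference, seasonal difference, then scale (assumed [0, 1])
d1 = diff(y, 1)
d12 = diff(d1, 12)
lo, hi = d12.min(), d12.max()
scaled = (d12 - lo) / (hi - lo)

# reverse: undo each step in the opposite order
r12 = scaled * (hi - lo) + lo
r1 = undiff(r12, 12, d1[:12])
restored = undiff(r1, 1, y[:1])
```

The two differences remove 1 + 12 = 13 observations in this sketch, which matches the drop from 768 to 755 actuals reported in the next cell's output.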
[5]:
# plot the results
f.plot();
[6]:
# add regressors
f.add_ar_terms(24)
f
[6]:
Forecaster(
DateStartActuals=1960-02-01T00:00:00.000000000
DateEndActuals=2022-12-01T00:00:00.000000000
Freq=MS
N_actuals=755
ForecastLength=24
Xvars=['AR1', 'AR2', 'AR3', 'AR4', 'AR5', 'AR6', 'AR7', 'AR8', 'AR9', 'AR10', 'AR11', 'AR12', 'AR13', 'AR14', 'AR15', 'AR16', 'AR17', 'AR18', 'AR19', 'AR20', 'AR21', 'AR22', 'AR23', 'AR24']
TestLength=76
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=None
CurrentEstimator=mlr
GridsFile=Grids
)
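For readers unfamiliar with autoregressive terms, the sketch below builds lagged copies of a series with pandas. It shows the general idea behind columns named `AR1`, `AR2`, and so on, not how scalecast constructs them internally; the toy series and the choice of three lags are assumptions made to keep the example small.

```python
import pandas as pd

# hypothetical toy series; the notebook uses 24 lags of the transformed data
y = pd.Series(range(10), dtype=float, name="y")
n_lags = 3

# column ARk holds the value of y observed k periods earlier
lags = pd.concat({f"AR{k}": y.shift(k) for k in range(1, n_lags + 1)}, axis=1)

# in this sketch, rows without a full set of lags are dropped before fitting
design = lags.dropna()
```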
Evaluate Forecasting Models
[7]:
# evaluate some models
f.tune_test_forecast(
    [
        'elasticnet',
        'lasso',
        'ridge',
        'gbt',
        'lightgbm',
        'xgboost',
    ],
    dynamic_testing = 24,
    limit_grid_size = .2,
)
Finished loading model, total used 150 iterations
Finished loading model, total used 150 iterations
Finished loading model, total used 150 iterations
Combine Evaluated Models
Below are several of the combination options that scalecast offers. They are examples only and are not meant to find the best possible combination for this data.
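As a rough illustration of what simple and weighted averaging do to point forecasts, the NumPy sketch below combines three made-up forecasts. The forecast values and the helper names `simple_average` and `weighted_average` are hypothetical, and the code is not scalecast's implementation; it only shows weights being rebalanced to sum to 1 before they are applied.

```python
import numpy as np

# hypothetical point forecasts from three models over a 4-step horizon
forecasts = {
    "xgboost": np.array([10.0, 12.0, 14.0, 16.0]),
    "elasticnet": np.array([13.0, 15.0, 17.0, 19.0]),
    "lightgbm": np.array([16.0, 18.0, 20.0, 22.0]),
}

def simple_average(preds):
    """Equal-weight average of all model forecasts at each step."""
    return np.vstack(list(preds.values())).mean(axis=0)

def weighted_average(preds, weights):
    """Weighted average; weights are rescaled so that they sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ np.vstack(list(preds.values()))

avg = simple_average(forecasts)
wavg = weighted_average(forecasts, (3, 2, 1))  # becomes 1/2, 1/3, 1/6
```

With weights of (3, 2, 1), the first model contributes half of the combined forecast, so the result is pulled toward its values.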
[8]:
f.set_estimator('combo')
# simple average of all models
f.manual_forecast(call_me = 'avg_all')
# weighted average of all models where the weights are determined from validation (not test) performance
f.manual_forecast(
    how = 'weighted',
    determine_best_by = 'ValidationMetricValue',
    call_me = 'weighted_avg_all',
)
# simple average of a select set of models
f.manual_forecast(models = ['xgboost','gbt','lightgbm'],call_me = 'avg_trees')
# simple average of a second select set of models (the linear models)
f.manual_forecast(models = ['elasticnet','lasso','ridge'],call_me = 'avg_lms')
# weighted average of a select set of models where the weights are manually passed
# weights do not have to add to 1 and they will be rebalanced to do so
f.manual_forecast(
    how = 'weighted',
    models = ['xgboost','elasticnet','lightgbm'],
    weights = (3,2,1),
    determine_best_by = None,
    call_me = 'weighted_avg_manual',
)
# splice (not many other libraries do this) - splice the future point forecasts of two or more models together
f.manual_forecast(
    how='splice',
    models = ['elasticnet','lightgbm'],
    splice_points = ['2023-01-01'],
    call_me = 'splice',
)
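Splicing may be the least familiar of these options, so here is a small pandas sketch of the idea: each model supplies the forecast for one contiguous stretch of the horizon, with the stretches separated by splice points. The helper `splice`, the toy forecasts, and the boundary convention (a splice point belongs to the later model) are assumptions for illustration and may differ from scalecast's behavior.

```python
import numpy as np
import pandas as pd

# hypothetical six-month forecasts from two models
future_dates = pd.date_range("2023-01-01", periods=6, freq="MS")
model_a = pd.Series([1.0] * 6, index=future_dates)
model_b = pd.Series([2.0] * 6, index=future_dates)

def splice(forecasts, splice_points):
    """Stitch forecasts together; needs one fewer splice point than models."""
    points = [pd.Timestamp(p) for p in splice_points]
    if len(points) != len(forecasts) - 1:
        raise ValueError("need exactly len(forecasts) - 1 splice points")
    edges = [None] + points + [None]
    out = pd.Series(np.nan, index=forecasts[0].index, dtype=float)
    for fc, start, end in zip(forecasts, edges[:-1], edges[1:]):
        mask = np.ones(len(fc), dtype=bool)
        if start is not None:
            mask &= fc.index >= start  # assumed: splice point goes to later model
        if end is not None:
            mask &= fc.index < end
        out.loc[mask] = fc.loc[mask].to_numpy()
    return out

spliced = splice([model_a, model_b], ["2023-04-01"])
```

In this sketch the first model covers January through March and the second covers April onward.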
[9]:
# plot the forecasts at the series' transformed level
f.plot();
[10]:
# revert the transformation
f = reverter.fit_transform(f)
[11]:
# plot the forecasts at the series' original level
f.plot();
[12]:
# view performance on test set
f.plot_test_set(models = 'top_3',order_by = 'TestSetRMSE',include_train = False);