Theta
Read about the theta model
See the darts implementation
See the statsmodels implementation
Download data from GitHub
Install darts:
pip install darts
See the blog post
Scalecast ports the theta model from darts, an implementation that is reported to be more accurate and that is also easier to maintain.
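For orientation, here is a minimal sketch of the classic two-line theta method, assuming no seasonality and a fixed smoothing weight. It is not the darts or scalecast implementation: the function name `theta_forecast` and the `alpha` parameter are hypothetical, and the darts model adds the season, trend, and model modes that are tuned later in this notebook.

```python
import numpy as np

def theta_forecast(y, horizon, alpha=0.5):
    """Classic theta sketch: average a linear trend with an SES forecast
    of the theta=2 line. `alpha` is a fixed, untuned smoothing weight."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n)

    # theta=0 line: ordinary least-squares linear trend
    slope, intercept = np.polyfit(t, y, 1)
    trend = intercept + slope * t

    # theta=2 line: doubles the local curvature around the trend
    theta2 = 2 * y - trend

    # simple exponential smoothing of the theta=2 line
    level = theta2[0]
    for value in theta2[1:]:
        level = alpha * value + (1 - alpha) * level

    # extrapolate the trend, hold the SES level flat, and average the two
    future_t = np.arange(n, n + horizon)
    trend_fcst = intercept + slope * future_t
    ses_fcst = np.full(horizon, level)
    return 0.5 * (trend_fcst + ses_fcst)

print(theta_forecast(np.arange(20), horizon=3))
```

Averaging the two lines keeps half of the long-run slope while letting the smoothed line track the recent level.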
[1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from darts.utils.utils import SeasonalityMode, TrendMode, ModelMode
from scalecast.Forecaster import Forecaster
from scalecast.util import metrics
from scalecast import GridGenerator
[2]:
train = pd.read_csv('Hourly-train.csv',index_col=0)
test = pd.read_csv('Hourly-test.csv',index_col=0)
y = train.loc['H7'].to_list()
current_dates = pd.date_range(start='2015-01-07 12:00',freq='H',periods=len(y)).to_list()
y_test = test.loc['H7'].to_list()
f = Forecaster(
    y=y,
    current_dates=current_dates,
    metrics=['smape','r2'],
    test_length=.25,
    future_dates=len(y_test),
    cis=True,
)
f
[2]:
Forecaster(
DateStartActuals=2015-01-18T08:00:00.000000000
DateEndActuals=2015-02-16T11:00:00.000000000
Freq=H
N_actuals=700
ForecastLength=48
Xvars=[]
TestLength=240
ValidationMetric=smape
ForecastsEvaluated=[]
CILevel=0.95
CurrentEstimator=mlr
GridsFile=Grids
)
[3]:
f.plot()
plt.show()

Prepare forecast
Download theta’s validation grid
[6]:
GridGenerator.get_grids('theta',out_name='Grids.py')
f.ingest_grid('theta')
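A validation grid is a dictionary of candidate values that gets expanded into every combination. The sketch below illustrates that expansion with plain strings in place of the darts enums so it runs without darts installed. The six `theta` values are an assumption, chosen only so that the total matches the 48 combinations reported in the tuning output; check the downloaded `Grids.py` for the actual contents.

```python
from itertools import product

# Hypothetical stand-in for the theta grid in Grids.py.
grid = {
    'theta': [0.5, 1, 1.5, 2, 2.5, 3],  # assumed values, not confirmed
    'model_mode': ['additive', 'multiplicative'],
    'season_mode': ['multiplicative', 'additive'],
    'trend_mode': ['exponential', 'linear'],
}

# Cartesian product of all candidate values, last key varying fastest
combos = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(combos))  # 6 * 2 * 2 * 2 = 48
print(combos[0])
```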
Call the forecast
Tune hyperparameters with 3-fold time series cross-validation
[8]:
f.set_estimator('theta')
f.cross_validate(k=3,verbose=True)
f.auto_forecast()
Num hyperparams to try for the theta model: 48.
Fold 0: Train size: 345 (2015-01-18 08:00:00 - 2015-02-01 16:00:00). Test Size: 115 (2015-02-01 17:00:00 - 2015-02-06 11:00:00).
Fold 1: Train size: 230 (2015-01-18 08:00:00 - 2015-01-27 21:00:00). Test Size: 115 (2015-01-27 22:00:00 - 2015-02-01 16:00:00).
Fold 2: Train size: 115 (2015-01-18 08:00:00 - 2015-01-23 02:00:00). Test Size: 115 (2015-01-23 03:00:00 - 2015-01-27 21:00:00).
Chosen paramaters: {'theta': 0.5, 'model_mode': <ModelMode.ADDITIVE: 'additive'>, 'season_mode': <SeasonalityMode.MULTIPLICATIVE: 'multiplicative'>, 'trend_mode': <TrendMode.EXPONENTIAL: 'exponential'>}.
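The fold sizes in the log above can be reproduced with simple arithmetic. This is a sketch reconstructed from the printed output, not scalecast's internal code: it assumes the 460 training observations (700 actuals minus the 240-observation test set) are split into `k + 1` equal blocks, with each fold testing on one block and training on everything before it. The function name `backtest_folds` is hypothetical.

```python
def backtest_folds(n_train, k):
    """Return (train_size, test_size) per fold, most recent fold first."""
    fold_size = n_train // (k + 1)
    folds = []
    for i in range(k):
        test_end = n_train - i * fold_size   # exclusive end of the test block
        train_size = test_end - fold_size    # everything before the test block
        folds.append((train_size, fold_size))
    return folds

folds = backtest_folds(n_train=700 - 240, k=3)
for i, (train_size, test_size) in enumerate(folds):
    print(f'Fold {i}: Train size: {train_size}. Test Size: {test_size}.')
# Fold 0: Train size: 345. Test Size: 115.
# Fold 1: Train size: 230. Test Size: 115.
# Fold 2: Train size: 115. Test Size: 115.
```

Each training window starts at the first observation, so later folds use progressively less history.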
Visualize test results
[9]:
f.plot_test_set(ci=True)
plt.show()

Visualize forecast results
[10]:
f.plot(ci=True)
plt.show()

See in-sample and out-of-sample accuracy/error metrics
[12]:
results = f.export('model_summaries')
[13]:
results[
    [
        'TestSetSMAPE',
        'InSampleSMAPE',
        'TestSetR2',
        'InSampleR2',
        'ValidationMetric',
        'ValidationMetricValue',
        'TestSetLength'
    ]
]
[13]:
| | TestSetSMAPE | InSampleSMAPE | TestSetR2 | InSampleR2 | ValidationMetric | ValidationMetricValue | TestSetLength |
|---|---|---|---|---|---|---|---|
| 0 | 0.058274 | 0.014082 | 0.786943 | 0.984652 | smape | 0.064113 | 240 |
The validation metric displayed above is the average SMAPE across the three cross-validation folds.
[14]:
validation_grid = f.export_validation_grid('theta')
validation_grid.head()
[14]:
| | theta | model_mode | season_mode | trend_mode | Fold0Metric | Fold1Metric | Fold2Metric | AverageMetric | MetricEvaluated |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.5 | ModelMode.ADDITIVE | SeasonalityMode.MULTIPLICATIVE | TrendMode.EXPONENTIAL | 0.056320 | 0.067335 | 0.068684 | 0.064113 | smape |
| 1 | 0.5 | ModelMode.ADDITIVE | SeasonalityMode.MULTIPLICATIVE | TrendMode.LINEAR | 0.056510 | 0.072710 | 0.084188 | 0.071136 | smape |
| 2 | 0.5 | ModelMode.ADDITIVE | SeasonalityMode.ADDITIVE | TrendMode.EXPONENTIAL | 0.055679 | 0.075179 | 0.073782 | 0.068213 | smape |
| 3 | 0.5 | ModelMode.ADDITIVE | SeasonalityMode.ADDITIVE | TrendMode.LINEAR | 0.056044 | 0.081705 | 0.089734 | 0.075827 | smape |
| 4 | 0.5 | ModelMode.MULTIPLICATIVE | SeasonalityMode.MULTIPLICATIVE | TrendMode.EXPONENTIAL | 0.056569 | 0.071406 | 0.080493 | 0.069489 | smape |
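As a quick check, the `AverageMetric` of the chosen row (index 0) is the plain mean of its three fold metrics, and it equals the `ValidationMetricValue` reported in the model summary. The numbers below are copied from the grid output.

```python
# Fold SMAPEs for the chosen parameter combination (row 0 of the grid)
fold_metrics = [0.056320, 0.067335, 0.068684]

average = sum(fold_metrics) / len(fold_metrics)
print(round(average, 6))  # 0.064113
```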
Test the forecast against out-of-sample data
This is data the Forecaster object has never seen.
[15]:
fcst = f.export('lvl_fcsts')
fcst.head()
[15]:
| | DATE | theta |
|---|---|---|
| 0 | 2015-02-16 12:00:00 | 49921.533332 |
| 1 | 2015-02-16 13:00:00 | 49135.400143 |
| 2 | 2015-02-16 14:00:00 | 47126.062514 |
| 3 | 2015-02-16 15:00:00 | 43417.987575 |
| 4 | 2015-02-16 16:00:00 | 39867.287257 |
[16]:
fig, ax = plt.subplots(figsize=(12,6))
f.plot(ax=ax,ci=True)
sns.lineplot(
    x=f.future_dates,
    y=y_test,
    ax=ax,
    label='held out actuals',
    color='green',
)
plt.show()

[17]:
smape = metrics.smape(y_test,fcst['theta'])
smape
[17]:
0.05764507814606441
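For reference, a minimal sketch of SMAPE under its common definition, the mean of 2|f - a| / (|a| + |f|), which ranges from 0 to 2. Whether `metrics.smape` handles edge cases such as zeros in exactly this way is an assumption; the function name `smape_sketch` is hypothetical.

```python
import numpy as np

def smape_sketch(actuals, forecasts):
    """Symmetric mean absolute percentage error on a 0-to-2 scale."""
    a = np.asarray(actuals, dtype=float)
    f = np.asarray(forecasts, dtype=float)
    return float(np.mean(2 * np.abs(f - a) / (np.abs(a) + np.abs(f))))

print(smape_sketch([100, 200], [100, 200]))  # 0.0 for a perfect forecast
print(smape_sketch([100], [50]))             # 2 * 50 / 150
```

On this scale, the out-of-sample score of about 0.058 is close to the test-set SMAPE reported in the model summary.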