Theta

Scalecast ports its theta model from darts, an implementation that is reported to be more accurate and is also easier to maintain.

[1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from darts.utils.utils import SeasonalityMode, TrendMode, ModelMode
from scalecast.Forecaster import Forecaster
from scalecast.util import metrics
from scalecast import GridGenerator
[2]:
train = pd.read_csv('Hourly-train.csv',index_col=0)
test = pd.read_csv('Hourly-test.csv',index_col=0)
y = train.loc['H7'].to_list()
current_dates = pd.date_range(start='2015-01-07 12:00',freq='H',periods=len(y)).to_list()

y_test = test.loc['H7'].to_list()

f = Forecaster(
    y=y,
    current_dates=current_dates,
    metrics = ['smape','r2'],
    test_length = .25,
    future_dates = len(y_test),
    cis = True,
)

f
[2]:
Forecaster(
    DateStartActuals=2015-01-18T08:00:00.000000000
    DateEndActuals=2015-02-16T11:00:00.000000000
    Freq=H
    N_actuals=700
    ForecastLength=48
    Xvars=[]
    TestLength=240
    ValidationMetric=smape
    ForecastsEvaluated=[]
    CILevel=0.95
    CurrentEstimator=mlr
    GridsFile=Grids
)
[3]:
f.plot()
plt.show()
../_images/theta_theta_3_0.png

Prepare forecast

  • Download theta’s validation grid

[6]:
GridGenerator.get_grids('theta',out_name='Grids.py')
f.ingest_grid('theta')
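
A validation grid maps each hyperparameter to a list of candidate values, and every combination of those values is one candidate model. The sketch below shows how such a grid could expand into the 48 combinations reported during tuning. It is an illustration, not the contents of Grids.py: only theta=0.5 appears in this notebook's output, so the other theta values are assumed, and plain strings stand in for the darts ModelMode, SeasonalityMode, and TrendMode enums.

```python
from itertools import product

# Hypothetical grid: the theta values are assumed for illustration, and the
# strings stand in for the darts ModelMode/SeasonalityMode/TrendMode enums.
theta_grid = {
    'theta': [0.5, 1, 1.5, 2, 2.5, 3],
    'model_mode': ['additive', 'multiplicative'],
    'season_mode': ['multiplicative', 'additive'],
    'trend_mode': ['exponential', 'linear'],
}

# Every combination of one value per hyperparameter is one candidate model.
combos = [
    dict(zip(theta_grid, values))
    for values in product(*theta_grid.values())
]

print(len(combos))   # number of candidate models
print(combos[0])     # first combination in the expanded grid
```

With six theta values and two options for each mode, the product has 6 x 2 x 2 x 2 = 48 entries.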

Call the forecast

  • Tune hyperparameters with 3-fold time series cross-validation

[8]:
f.set_estimator('theta')
f.cross_validate(k=3,verbose=True)
f.auto_forecast()
Num hyperparams to try for the theta model: 48.
Fold 0: Train size: 345 (2015-01-18 08:00:00 - 2015-02-01 16:00:00). Test Size: 115 (2015-02-01 17:00:00 - 2015-02-06 11:00:00).
Fold 1: Train size: 230 (2015-01-18 08:00:00 - 2015-01-27 21:00:00). Test Size: 115 (2015-01-27 22:00:00 - 2015-02-01 16:00:00).
Fold 2: Train size: 115 (2015-01-18 08:00:00 - 2015-01-23 02:00:00). Test Size: 115 (2015-01-23 03:00:00 - 2015-01-27 21:00:00).
Chosen paramaters: {'theta': 0.5, 'model_mode': <ModelMode.ADDITIVE: 'additive'>, 'season_mode': <SeasonalityMode.MULTIPLICATIVE: 'multiplicative'>, 'trend_mode': <TrendMode.EXPONENTIAL: 'exponential'>}.
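
The fold sizes in the log above can be reproduced with simple arithmetic. The sketch below is a reconstruction from the printed output, not scalecast's internal code, and the function name backward_folds is hypothetical. It assumes the 240-observation test set is held out first, leaving 460 observations, and that each fold's test window is the 460 // (k + 1) = 115 observations directly after its training window.

```python
def backward_folds(n_obs, k):
    """Return (train_size, test_start, test_end) per fold, most recent fold first.

    Hypothetical helper reconstructed from the log output: each fold trains on
    the first train_size rows and tests on the next test_size rows.
    """
    test_size = n_obs // (k + 1)
    folds = []
    for i in range(k):
        train_size = n_obs - test_size * (i + 1)
        folds.append((train_size, train_size, train_size + test_size))
    return folds


# 700 actuals minus the 240-observation test set leaves 460 rows for tuning.
folds = backward_folds(n_obs=700 - 240, k=3)
for i, (train_size, start, end) in enumerate(folds):
    print(f"Fold {i}: train size {train_size}, test rows {start} to {end - 1}")
```

Each earlier fold drops the most recent 115 training rows, so no fold ever trains on data that comes after its test window.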

Visualize test results

[9]:
f.plot_test_set(ci=True)
plt.show()
../_images/theta_theta_9_0.png

Visualize forecast results

[10]:
f.plot(ci=True)
plt.show()
../_images/theta_theta_11_0.png

See in-sample and out-of-sample accuracy/error metrics

[12]:
results = f.export('model_summaries')
[13]:
results[
    [
        'TestSetSMAPE',
        'InSampleSMAPE',
        'TestSetR2',
        'InSampleR2',
        'ValidationMetric',
        'ValidationMetricValue',
        'TestSetLength'
    ]
]
[13]:
   TestSetSMAPE  InSampleSMAPE  TestSetR2  InSampleR2  ValidationMetric  ValidationMetricValue  TestSetLength
0      0.058274       0.014082   0.786943    0.984652             smape               0.064113            240

The validation metric displayed above is the average SMAPE across the three cross-validation folds.

[14]:
validation_grid = f.export_validation_grid('theta')
validation_grid.head()
[14]:
   theta  model_mode                season_mode                     trend_mode             Fold0Metric  Fold1Metric  Fold2Metric  AverageMetric  MetricEvaluated
0  0.5    ModelMode.ADDITIVE        SeasonalityMode.MULTIPLICATIVE  TrendMode.EXPONENTIAL  0.056320     0.067335     0.068684     0.064113       smape
1  0.5    ModelMode.ADDITIVE        SeasonalityMode.MULTIPLICATIVE  TrendMode.LINEAR       0.056510     0.072710     0.084188     0.071136       smape
2  0.5    ModelMode.ADDITIVE        SeasonalityMode.ADDITIVE        TrendMode.EXPONENTIAL  0.055679     0.075179     0.073782     0.068213       smape
3  0.5    ModelMode.ADDITIVE        SeasonalityMode.ADDITIVE        TrendMode.LINEAR       0.056044     0.081705     0.089734     0.075827       smape
4  0.5    ModelMode.MULTIPLICATIVE  SeasonalityMode.MULTIPLICATIVE  TrendMode.EXPONENTIAL  0.056569     0.071406     0.080493     0.069489       smape
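
As a quick check, the AverageMetric in the first row of the grid is the plain mean of that row's three fold metrics, and it matches the ValidationMetricValue in the model summary. The values below are copied from the table above.

```python
# Fold metrics for the chosen hyperparameters (row 0 of the validation grid).
fold_metrics = [0.056320, 0.067335, 0.068684]

# The validation metric is the unweighted mean across folds.
average_metric = sum(fold_metrics) / len(fold_metrics)
print(round(average_metric, 6))
```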

Test the forecast against out-of-sample data

  • This is data the Forecaster object has never seen

[15]:
fcst = f.export('lvl_fcsts')
fcst.head()
[15]:
   DATE                 theta
0  2015-02-16 12:00:00  49921.533332
1  2015-02-16 13:00:00  49135.400143
2  2015-02-16 14:00:00  47126.062514
3  2015-02-16 15:00:00  43417.987575
4  2015-02-16 16:00:00  39867.287257
[16]:
fig, ax = plt.subplots(figsize=(12,6))
f.plot(ax=ax,ci=True)
sns.lineplot(
    x = f.future_dates,
    y = y_test,
    ax = ax,
    label = 'held out actuals',
    color = 'green',
)
plt.show()
../_images/theta_theta_19_0.png
[17]:
smape = metrics.smape(y_test,fcst['theta'])
smape
[17]:
0.05764507814606441
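
For reference, SMAPE (symmetric mean absolute percentage error) can be sketched in a few lines of NumPy. This uses one common definition, the mean of 2|f - a| / (|a| + |f|). It is assumed rather than confirmed that metrics.smape uses exactly this form, and the function name smape_sketch and the sample values are illustrative only.

```python
import numpy as np


def smape_sketch(actuals, forecasts):
    """Mean of 2|f - a| / (|a| + |f|); one common SMAPE definition (assumed)."""
    a = np.asarray(actuals, dtype=float)
    f = np.asarray(forecasts, dtype=float)
    return float(np.mean(2 * np.abs(f - a) / (np.abs(a) + np.abs(f))))


# Illustrative values, not taken from the H7 series.
example = smape_sketch([100, 200, 300], [110, 190, 300])
print(example)
```

Under this definition the metric is 0 for a perfect forecast and is bounded above by 2, and it is undefined when an actual and its forecast are both zero.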