Transformations Example

[1]:
import pandas as pd
import matplotlib.pyplot as plt
from scalecast.Forecaster import Forecaster
from scalecast.SeriesTransformer import SeriesTransformer
[2]:
data = pd.read_csv('../lstm/AirPassengers.csv')
[3]:
data.head()
[3]:
Month #Passengers
0 1949-01 112
1 1949-02 118
2 1949-03 132
3 1949-04 129
4 1949-05 121
[4]:
f = Forecaster(
    current_dates = data['Month'],
    y = data['#Passengers'],
    future_dates = 24,
)

Create Thumbnail Image

[5]:
f_detrended = SeriesTransformer(f).DetrendTransform(poly_order=2)
f_diff = SeriesTransformer(f).DiffTransform()
f_diff_seas = SeriesTransformer(f).DiffTransform(12)
[6]:
fig, axs = plt.subplots(2,2,figsize=(14,6))
f.plot(ax=axs[0,0])
axs[0,0].set_title('Original Series',size=14)
axs[0,0].tick_params(axis='x',which='both',bottom=False,top=False,labelbottom=False)
axs[0,0].get_legend().remove()
f_detrended.plot(ax=axs[0,1])
axs[0,1].set_title('Detrended',size=14)
axs[0,1].tick_params(axis='x',which='both',bottom=False,top=False,labelbottom=False)
axs[0,1].get_legend().remove()
f_diff.plot(ax=axs[1,0])
axs[1,0].set_title('Differenced',size=14)
axs[1,0].get_legend().remove()
f_diff_seas.plot(ax=axs[1,1])
axs[1,1].set_title('Seasonally Differenced',size=14)
axs[1,1].get_legend().remove()
plt.show()
../_images/transforming_medium_code_7_0.png
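
The differenced panels above come down to simple arithmetic: a first difference subtracts the previous observation, and a seasonal difference subtracts the observation one full season earlier. The sketch below illustrates that arithmetic in plain Python. The `difference` helper and the toy series are hypothetical and are not part of scalecast, whose own implementation may differ in detail.

```python
def difference(values, m=1):
    """Return the m-period difference: values[t] - values[t - m]."""
    return [values[t] - values[t - m] for t in range(m, len(values))]

# Hypothetical series: a linear trend plus a pattern that repeats every 4 steps.
season = [0, 5, 10, 5]
series = [2 * t + season[t % 4] for t in range(12)]

first_diff = difference(series)        # removes the trend, keeps the seasonal swings
seasonal_diff = difference(series, 4)  # removes the repeating pattern as well

print(first_diff)
print(seasonal_diff)
```

Each difference of order `m` shortens the series by `m` observations, which is why a seasonal difference on monthly data costs a full year of history.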

Create Transformer

[7]:
transformer = SeriesTransformer(f)

Apply Transformations

[8]:
f = transformer.DiffTransform(12) # 12 periods is one seasonal difference for monthly data
f = transformer.DetrendTransform()
f.plot()
plt.title('Seasonally Differenced and Detrended Series',size=14);
../_images/transforming_medium_code_11_0.png
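
Detrending can be thought of as fitting a trend line over time and keeping only the residuals, while storing the fitted parameters so the trend can be added back later. The following is a minimal sketch of that idea, assuming a straight-line trend fitted by ordinary least squares. The helpers `linear_detrend` and `linear_retrend` are hypothetical and only illustrate the concept; they are not scalecast functions.

```python
def linear_detrend(values):
    """Fit y = intercept + slope * t by least squares and return the residuals."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in enumerate(values)]
    return residuals, (intercept, slope)

def linear_retrend(residuals, params):
    """Add the stored trend back onto the residuals."""
    intercept, slope = params
    return [r + intercept + slope * t for t, r in enumerate(residuals)]

# Hypothetical series: an upward trend with a small alternating wiggle.
series = [3 + 2 * t + (1 if t % 2 else -1) for t in range(10)]
residuals, params = linear_detrend(series)
restored = linear_retrend(residuals, params)
```

Because the trend parameters are kept, the transformation is reversible, which is the property the revert step later in this example relies on.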

Forecast on Transformed Data

[9]:
f.set_estimator('xgboost')
f.add_ar_terms(12)
f.manual_forecast(n_estimators=100,gamma=2)
[10]:
f.plot()
plt.title('XGBoost Applied on Transformed Series',size=14);
../_images/transforming_medium_code_14_0.png
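
Autoregressive terms, such as the 12 added above, are lagged copies of the series used as model inputs. The sketch below shows, under simplifying assumptions, how such a lag table can be built in plain Python. The `make_lag_rows` helper is hypothetical and is not how scalecast stores its regressors internally.

```python
def make_lag_rows(values, n_lags):
    """Pair each target value with its n_lags most recent predecessors."""
    rows = []
    for t in range(n_lags, len(values)):
        lags = [values[t - k] for k in range(1, n_lags + 1)]  # lag 1 first
        rows.append((lags, values[t]))
    return rows

# Hypothetical toy series with two lags.
rows = make_lag_rows([10, 20, 30, 40, 50], n_lags=2)
for lags, target in rows:
    print(lags, "->", target)
```

The first `n_lags` observations have no complete lag history, so they drop out of the training rows. Lags are also computed from whatever level the series is at when they are added, so lags built on a transformed series hold transformed values.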

Revert Transformation

[11]:
f = transformer.DetrendRevert()
f = transformer.DiffRevert(12)
f.plot()
plt.title('Back to Normal',size=14);
../_images/transforming_medium_code_16_0.png

Note: After reverting a difference transformation, it is a good idea to drop all Xvars from the object and re-add them, especially model lags: their values are still at the differenced level, and they will have lost some observations from the front.

[12]:
f.drop_all_Xvars()
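
To see why a difference can be reverted at all, and why lags computed before the revert no longer match the series, consider this plain-Python sketch. It assumes the first `m` original observations are kept aside when differencing. The helpers `diff_with_history` and `undifference` are hypothetical and only illustrate the idea; the toy values are the first five passenger counts shown earlier.

```python
def diff_with_history(values, m=1):
    """Return the m-period difference plus the first m values needed to undo it."""
    diffed = [values[t] - values[t - m] for t in range(m, len(values))]
    return diffed, values[:m]

def undifference(diffed, initial):
    """Rebuild the original series by adding each difference to the value m steps back."""
    m = len(initial)
    restored = list(initial)
    for d in diffed:
        restored.append(d + restored[-m])
    return restored

series = [112, 118, 132, 129, 121]
diffed, initial = diff_with_history(series, m=2)
restored = undifference(diffed, initial)

print(diffed)
print(restored)
```

The differenced values are small changes while the restored values are passenger counts, so any lag built from the differenced series sits at the wrong level once the series is reverted.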

Function to Automatically Find Optimal Transformation

[13]:
from scalecast.util import find_optimal_transformation
# default args below
transformer, reverter = find_optimal_transformation(
    f, # Forecaster object to try the transformations on
    estimator=None, # model used to evaluate each transformation, default last estimator set in object
    monitor='rmse', # out-of-sample metric to monitor
    test_length = None, # default is the fcst horizon in the Forecaster object
    train_length = None, # default is the max available
    num_test_sets = 1, # number of test sets to iterate through, final transformation based on best avg. metric
    space_between_sets = 1, # space between consecutive train sets
    lags='auto', # uses the length of the inferred seasonality
    try_order = ['detrend','seasonal_adj','boxcox','first_diff','first_seasonal_diff','scale'], # order of transformations to try
    boxcox_lambdas = [-0.5,0,0.5], # box-cox lambdas
    detrend_kwargs = [{'loess': True},{'poly_order':1},{'poly_order':2}], # detrender transform kwargs (tries as many detrenders as the length of this list)
    scale_type = ['Scale','MinMax'], # scale transformers to try
    m = 'auto', # the seasonal length to try for the seasonal adjusters, accepts multiple
    model = 'add', # the model to use when seasonally adjusting
    verbose = True, # default is False
    # specific model kwargs also accepted
)
All transformation tries will use 12 lags.
Last transformer tried:
[]
Score (rmse): 73.54387726568602
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'loess': True})]
Score (rmse): 64.9621416143047
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 1})]
Score (rmse): 32.35918665790083
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2})]
Score (rmse): 22.916929563263274
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2}), ('DeseasonTransform', {'m': 12, 'model': 'add'})]
Score (rmse): 36.738799031744186
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2}), ('DiffTransform', 1)]
Score (rmse): 55.37104438051655
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2}), ('DiffTransform', 12)]
Score (rmse): 46.742630596791805
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2}), ('ScaleTransform',)]
Score (rmse): 22.798651665783563
--------------------------------------------------
Last transformer tried:
[('DetrendTransform', {'poly_order': 2}), ('MinMaxTransform',)]
Score (rmse): 21.809089561053717
--------------------------------------------------
Final Selection:
[('DetrendTransform', {'poly_order': 2}), ('MinMaxTransform',)]
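
The log above is consistent with a greedy search: each group of candidate transformations is tried on top of the chain selected so far, and the best candidate in the group is kept only if it beats the current best score. The sketch below replays that logic against the scores printed in the log. This is an inference from the output, not scalecast's actual implementation. The labels and the `greedy_search` helper are hypothetical, and the Box-Cox step is left out because no Box-Cox attempts appear in the log.

```python
# RMSE values copied from the verbose log, keyed by the chain of transformations tried.
LOGGED_RMSE = {
    (): 73.54387726568602,
    ("detrend_loess",): 64.9621416143047,
    ("detrend_poly1",): 32.35918665790083,
    ("detrend_poly2",): 22.916929563263274,
    ("detrend_poly2", "deseason_add_12"): 36.738799031744186,
    ("detrend_poly2", "diff_1"): 55.37104438051655,
    ("detrend_poly2", "diff_12"): 46.742630596791805,
    ("detrend_poly2", "scale"): 22.798651665783563,
    ("detrend_poly2", "minmax"): 21.809089561053717,
}

# Candidate groups in the order they appear in the log.
TRY_ORDER = [
    ["detrend_loess", "detrend_poly1", "detrend_poly2"],
    ["deseason_add_12"],
    ["diff_1"],
    ["diff_12"],
    ["scale", "minmax"],
]

def greedy_search(groups, score):
    """Extend the chain with the best candidate of each group if it lowers the score."""
    best_chain = ()
    best_score = score(best_chain)
    for candidates in groups:
        trial_score, trial = min((score(best_chain + (c,)), c) for c in candidates)
        if trial_score < best_score:
            best_chain, best_score = best_chain + (trial,), trial_score
    return best_chain, best_score

final_chain, final_score = greedy_search(TRY_ORDER, LOGGED_RMSE.__getitem__)
print(final_chain, final_score)
```

A greedy search like this evaluates far fewer combinations than an exhaustive one, at the cost of possibly missing a chain whose early steps look worse in isolation.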

Automated Forecasting with Pipeline

[14]:
from scalecast.Pipeline import Pipeline
from scalecast import GridGenerator
from scalecast.util import find_optimal_transformation

GridGenerator.get_example_grids()

def forecaster(f):
    f.set_validation_length(20)
    f.auto_Xvar_select(max_ar=20)
    f.tune_test_forecast(
        ['elasticnet','xgboost'],
        cross_validate=True,
        limit_grid_size = .2,
    )

pipeline = Pipeline(
    steps = [
        ('Transform',transformer),
        ('Forecast',forecaster),
        ('Revert',reverter),
    ],
)

f = pipeline.fit_predict(f)
f.plot()
plt.title('Automated Forecasting with Transformations');
../_images/transforming_medium_code_22_0.png
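
Conceptually, the pipeline above runs its named steps in order on a single object: transform, forecast, then revert. The sketch below mimics that flow in plain Python with a log transform and a naive forecast. `MiniPipeline` and the step functions are hypothetical stand-ins meant to show the pattern, not scalecast's `Pipeline` class.

```python
import math

class MiniPipeline:
    """Hypothetical stand-in that runs named steps in order on one shared state."""

    def __init__(self, steps):
        self.steps = steps

    def fit_predict(self, state):
        for _name, step in self.steps:
            result = step(state)
            # A step may mutate the state in place and return None,
            # as the `forecaster` function above does.
            if result is not None:
                state = result
        return state

def log_transform(state):
    return {**state, "y": [math.log(v) for v in state["y"]]}

def naive_forecast(state):
    # Repeat the last (transformed) value over the horizon; mutates in place.
    state["forecast"] = [state["y"][-1]] * state["horizon"]

def log_revert(state):
    return {
        **state,
        "y": [math.exp(v) for v in state["y"]],
        "forecast": [math.exp(v) for v in state["forecast"]],
    }

pipeline = MiniPipeline(
    steps=[
        ("Transform", log_transform),
        ("Forecast", naive_forecast),
        ("Revert", log_revert),
    ],
)
result = pipeline.fit_predict({"y": [112.0, 118.0, 132.0], "horizon": 2, "forecast": None})
print(result["forecast"])
```

Keeping the revert as the final step means both the history and the forecasts come back on the original scale, so they can be plotted and evaluated together.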