Model Tutorials#
This tutorial covers the different models available in DeepTimeSeries and how to use them.
All models in DeepTimeSeries inherit from ForecastingModule and follow the same interface,
making it easy to switch between different architectures.
MLP Model#
The MLP (Multi-Layer Perceptron) model is a simple feedforward neural network that flattens the encoding window and processes it through fully connected layers.
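The flattening step can be pictured with plain tensor shapes: an encoding window of length L with F features becomes a single vector of length L * F before the fully connected layers. A minimal shape-only sketch (illustrative, not the library's internal code):
import torch
window = torch.randn(32, 10, 2)     # (batch, encoding_length, n_features)
flat = window.flatten(start_dim=1)  # (batch, encoding_length * n_features)
print(flat.shape)                   # torch.Size([32, 20])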
Basic Usage#
import numpy as np
import pandas as pd
import pytorch_lightning as pl
import torch.nn as nn
from torch.utils.data import DataLoader
from sklearn.preprocessing import StandardScaler

import deep_time_series as dts
from deep_time_series.model import MLP
# Prepare data
data = pd.DataFrame({
    'target': np.sin(np.arange(100)),
    'feature': np.cos(np.arange(100)),
})
# Preprocess
transformer = dts.ColumnTransformer(
    transformer_tuples=[(StandardScaler(), ['target', 'feature'])],
)
data = transformer.fit_transform(data)
# Create MLP model
model = MLP(
    hidden_size=64,
    encoding_length=10,
    decoding_length=5,
    target_names=['target'],
    nontarget_names=['feature'],
    n_hidden_layers=2,
    activation=nn.ELU,
    dropout_rate=0.1,
)
# Create dataset and train
dataset = dts.TimeSeriesDataset(
    data_frames=data,
    chunk_specs=model.make_chunk_specs(),
)
dataloader = DataLoader(dataset, batch_size=32)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_dataloaders=dataloader)
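Once training finishes, predictions can be generated with Lightning's standard predict loop. A minimal sketch, assuming ForecastingModule provides a Lightning predict step; the exact structure of each batch's output depends on the model's heads:
# Illustrative inference run; `outputs` is a list with one entry per batch.
predict_loader = DataLoader(dataset, batch_size=32, shuffle=False)
outputs = trainer.predict(model, dataloaders=predict_loader)
print(len(outputs))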
Parameters#
- hidden_size: Size of hidden layers
- encoding_length: Length of encoding window
- decoding_length: Length of decoding window
- target_names: List of target feature names
- nontarget_names: List of non-target feature names
- n_hidden_layers: Number of hidden layers
- activation: Activation function class (default: nn.ELU)
- dropout_rate: Dropout rate (default: 0.0)
RNN Models#
The RNN model supports vanilla RNN, LSTM, and GRU architectures. It uses a recurrent encoder and decoder for sequential processing.
Basic Usage#
import torch.nn as nn
from deep_time_series.model import RNN
# Create LSTM model
model = RNN(
    hidden_size=128,
    encoding_length=20,
    decoding_length=10,
    target_names=['target'],
    nontarget_names=['feature'],
    n_layers=2,
    rnn_class=nn.LSTM,  # or nn.RNN, nn.GRU
    dropout_rate=0.1,
)
# Use the same way as MLP
dataset = dts.TimeSeriesDataset(
    data_frames=data,
    chunk_specs=model.make_chunk_specs(),
)
dataloader = DataLoader(dataset, batch_size=32)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_dataloaders=dataloader)
RNN Variants#
You can use different RNN variants:
# Vanilla RNN
model_rnn = RNN(..., rnn_class=nn.RNN)
# LSTM
model_lstm = RNN(..., rnn_class=nn.LSTM)
# GRU
model_gru = RNN(..., rnn_class=nn.GRU)
Parameters#
- hidden_size: Hidden state size
- encoding_length: Length of encoding window
- decoding_length: Length of decoding window
- target_names: List of target feature names
- nontarget_names: List of non-target feature names
- n_layers: Number of RNN layers
- rnn_class: RNN class (nn.RNN, nn.LSTM, or nn.GRU)
- dropout_rate: Dropout rate between RNN layers
Dilated CNN Model#
The Dilated CNN model uses dilated convolutions to capture long-range dependencies in time series data. It’s particularly effective for sequences with periodic patterns.
Basic Usage#
from deep_time_series.model import DilatedCNN
model = DilatedCNN(
    hidden_size=64,
    encoding_length=30,
    decoding_length=10,
    target_names=['target'],
    nontarget_names=['feature'],
    dilation_base=2,
    kernel_size=3,
    activation=nn.ELU,
    dropout_rate=0.1,
)
dataset = dts.TimeSeriesDataset(
    data_frames=data,
    chunk_specs=model.make_chunk_specs(),
)
dataloader = DataLoader(dataset, batch_size=32)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_dataloaders=dataloader)
How Dilated CNN Works#
The model automatically calculates the number of layers needed based on:
- encoding_length: The input sequence length
- dilation_base: Base for exponential dilation (e.g., 2 means dilations: 1, 2, 4, 8, …)
- kernel_size: Size of convolutional kernel
The dilation increases exponentially with each layer, allowing the model to capture dependencies at different time scales.
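As an illustration of this relationship (a common receptive-field estimate, not necessarily the library's exact formula), the number of layers needed to cover the encoding window can be computed as follows:
def estimate_n_layers(encoding_length, dilation_base, kernel_size):
    # Receptive field of n layers with dilations 1, b, ..., b**(n - 1):
    #   rf(n) = 1 + (kernel_size - 1) * (b**n - 1) / (b - 1)
    # Return the smallest n with rf(n) >= encoding_length.
    n = 1
    while 1 + (kernel_size - 1) * (dilation_base**n - 1) / (dilation_base - 1) < encoding_length:
        n += 1
    return n

print(estimate_n_layers(encoding_length=30, dilation_base=2, kernel_size=3))  # 4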
Parameters#
- hidden_size: Number of convolutional filters
- encoding_length: Length of encoding window
- decoding_length: Length of decoding window
- target_names: List of target feature names
- nontarget_names: List of non-target feature names
- dilation_base: Base for exponential dilation (typically 2)
- kernel_size: Size of convolutional kernel (must be >= dilation_base)
- activation: Activation function class (default: nn.ELU)
- dropout_rate: Dropout rate (default: 0.0)
Transformer Model#
The SingleShotTransformer model uses a transformer architecture with an encoder-decoder structure. It is effective for capturing complex temporal dependencies and long-range patterns.
Basic Usage#
from deep_time_series.model import SingleShotTransformer
model = SingleShotTransformer(
    encoding_length=30,
    decoding_length=10,
    target_names=['target'],
    nontarget_names=['feature'],
    d_model=128,
    n_heads=8,
    n_layers=4,
    dim_feedforward=512,
    dropout_rate=0.1,
)
dataset = dts.TimeSeriesDataset(
    data_frames=data,
    chunk_specs=model.make_chunk_specs(),
)
dataloader = DataLoader(dataset, batch_size=32)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_dataloaders=dataloader)
Transformer Architecture#
The transformer uses:
- Encoder: Processes the encoding window with self-attention
- Decoder: Generates predictions using cross-attention to encoder outputs
- Positional Encoding: Adds positional information to inputs
- Causal Masking: Prevents the decoder from seeing future information during training
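Causal masking follows the usual convention: each decoder position may only attend to itself and earlier positions. A small standalone illustration of such a mask (shown only for clarity; SingleShotTransformer applies its own mask internally):
import torch

size = 5
# -inf above the diagonal blocks attention to future positions.
causal_mask = torch.triu(torch.full((size, size), float('-inf')), diagonal=1)
print(causal_mask)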
Parameters#
- encoding_length: Length of encoding window
- decoding_length: Length of decoding window
- target_names: List of target feature names
- nontarget_names: List of non-target feature names
- d_model: Dimension of model (embedding size)
- n_heads: Number of attention heads
- n_layers: Number of encoder/decoder layers
- dim_feedforward: Dimension of feedforward network (default: 4 * d_model)
- dropout_rate: Dropout rate (default: 0.0)
Model Comparison#
Choosing the Right Model#
- MLP
  - Simple and fast
  - Good for short sequences and simple patterns
  - No explicit temporal modeling
- RNN (LSTM/GRU)
  - Good for sequential dependencies
  - Can handle variable-length sequences
  - May struggle with very long sequences
- Dilated CNN
  - Efficient for long sequences
  - Good for periodic patterns
  - Parallel processing (faster than RNN)
- Transformer
  - Best for complex patterns and long-range dependencies
  - Most flexible but computationally expensive
  - Requires more data to train effectively
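Because every model shares the ForecastingModule interface, switching architectures mostly means swapping the constructor. A sketch using the constructors shown above (hyperparameter values are illustrative only):
import torch.nn as nn
from deep_time_series.model import MLP, RNN, DilatedCNN, SingleShotTransformer

common = dict(
    encoding_length=30,
    decoding_length=10,
    target_names=['target'],
    nontarget_names=['feature'],
    dropout_rate=0.1,
)

models = {
    'mlp': MLP(hidden_size=64, n_hidden_layers=2, **common),
    'lstm': RNN(hidden_size=128, n_layers=2, rnn_class=nn.LSTM, **common),
    'dilated_cnn': DilatedCNN(hidden_size=64, dilation_base=2, kernel_size=3, **common),
    'transformer': SingleShotTransformer(
        d_model=128, n_heads=8, n_layers=4, dim_feedforward=512, **common
    ),
}

# The dataset and trainer code is identical for each entry.
for name, m in models.items():
    print(name, m.make_chunk_specs())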
Feature Support#
All models support:
- Target features: Variables to predict
- Non-target features: Additional features known at prediction time
- Deterministic forecasting: Point predictions
- Probabilistic forecasting: Distribution predictions (with DistributionHead)
Example: Using Non-Target Features#
All models can use non-target features:
# Model with non-target features
model = MLP(
    hidden_size=64,
    encoding_length=10,
    decoding_length=5,
    target_names=['target'],
    nontarget_names=['feature1', 'feature2'],  # Multiple features
    n_hidden_layers=2,
)
The non-target features are:
- Used during encoding (along with target features)
- Available during decoding (future values must be known)
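A common source of non-target features whose future values are known in advance is calendar information. A small illustration with pandas (the column names are hypothetical):
import numpy as np
import pandas as pd

index = pd.date_range('2023-01-01', periods=100, freq='H')
data = pd.DataFrame({
    'target': np.sin(np.arange(100) * 2 * np.pi / 24),
    # Hour of day is known for any future timestamp, so it can safely be
    # used during decoding.
    'hour_sin': np.sin(2 * np.pi * index.hour / 24),
    'hour_cos': np.cos(2 * np.pi * index.hour / 24),
})
# Pass nontarget_names=['hour_sin', 'hour_cos'] to the model.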
Customizing Models#
All models support custom heads, loss functions, and optimizers:
import torch.nn as nn
from deep_time_series.core import Head
# Custom head with L1 loss
custom_head = Head(
    tag='targets',
    output_module=nn.Linear(64, 1),
    loss_fn=nn.L1Loss(),
    loss_weight=1.0,
)
model = MLP(
    hidden_size=64,
    encoding_length=10,
    decoding_length=5,
    target_names=['target'],
    nontarget_names=[],
    n_hidden_layers=2,
    head=custom_head,  # Use custom head
)
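Optimizers can also be customized. Because the models are Lightning modules, one generic option (a standard PyTorch Lightning pattern, not necessarily the library's dedicated hook) is to subclass and override configure_optimizers:
import torch

class MLPWithAdamW(MLP):
    # Generic Lightning override; check the library's options for a dedicated
    # optimizer argument before relying on this pattern.
    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3, weight_decay=1e-2)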
For more advanced customization, see Advanced Topics.