Model Tutorials#
This tutorial covers the different models available in DeepTimeSeries and how to use them.
All models in DeepTimeSeries inherit from ForecastingModule and follow the same interface,
making it easy to switch between different architectures.
MLP Model#
The MLP (Multi-Layer Perceptron) model is a simple feedforward neural network that flattens the encoding window and processes it through fully connected layers.
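The flattening step can be pictured with plain tensor shapes: an encoding window of length L with F features becomes a single vector of length L * F before the fully connected layers. A minimal shape-only sketch (illustrative, not the library's internal code):
import torch
window = torch.randn(32, 10, 2)     # (batch, encoding_length, n_features)
flat = window.flatten(start_dim=1)  # (batch, encoding_length * n_features)
print(flat.shape)                   # torch.Size([32, 20])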
Basic Usage#
import numpy as np
import pandas as pd
import pytorch_lightning as pl
import torch.nn as nn
from torch.utils.data import DataLoader
from sklearn.preprocessing import StandardScaler

import deep_time_series as dts
from deep_time_series.model import MLP
# Prepare data
data = pd.DataFrame({
    'target': np.sin(np.arange(100)),
    'feature': np.cos(np.arange(100)),
})
# Preprocess
transformer = dts.ColumnTransformer(
    transformer_tuples=[(StandardScaler(), ['target', 'feature'])],
)
data = transformer.fit_transform(data)
# Create MLP model
model = MLP(
    hidden_size=64,
    encoding_length=10,
    decoding_length=5,
    target_names=['target'],
    nontarget_names=['feature'],
    n_hidden_layers=2,
    activation=nn.ELU,
    dropout_rate=0.1,
)
# Create dataset and train
dataset = dts.TimeSeriesDataset(
    data_frames=data,
    chunk_specs=model.make_chunk_specs(),
)
dataloader = DataLoader(dataset, batch_size=32)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_dataloaders=dataloader)
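Once training finishes, predictions can be generated with Lightning's standard predict loop. A minimal sketch, assuming ForecastingModule provides a Lightning predict step; the exact structure of each batch's output depends on the model's heads:
# Illustrative inference run; `outputs` is a list with one entry per batch.
predict_loader = DataLoader(dataset, batch_size=32, shuffle=False)
outputs = trainer.predict(model, dataloaders=predict_loader)
print(len(outputs))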
Parameters#
- hidden_size: Size of hidden layers
- encoding_length: Length of encoding window
- decoding_length: Length of decoding window
- target_names: List of target feature names
- nontarget_names: List of non-target feature names
- n_hidden_layers: Number of hidden layers
- activation: Activation function class (default: nn.ELU)
- dropout_rate: Dropout rate (default: 0.0)
RNN Models#
The RNN model supports vanilla RNN, LSTM, and GRU architectures. It uses a recurrent encoder and decoder for sequential processing.
Basic Usage#
import torch.nn as nn
from deep_time_series.model import RNN
# Create LSTM model
model = RNN(
    hidden_size=128,
    encoding_length=20,
    decoding_length=10,
    target_names=['target'],
    nontarget_names=['feature'],
    n_layers=2,
    rnn_class=nn.LSTM,  # or nn.RNN, nn.GRU
    dropout_rate=0.1,
)
# Use the same way as MLP
dataset = dts.TimeSeriesDataset(
    data_frames=data,
    chunk_specs=model.make_chunk_specs(),
)
dataloader = DataLoader(dataset, batch_size=32)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_dataloaders=dataloader)
RNN Variants#
You can use different RNN variants:
# Vanilla RNN
model_rnn = RNN(..., rnn_class=nn.RNN)
# LSTM
model_lstm = RNN(..., rnn_class=nn.LSTM)
# GRU
model_gru = RNN(..., rnn_class=nn.GRU)
Parameters#
- hidden_size: Hidden state size
- encoding_length: Length of encoding window
- decoding_length: Length of decoding window
- target_names: List of target feature names
- nontarget_names: List of non-target feature names
- n_layers: Number of RNN layers
- rnn_class: RNN class (nn.RNN, nn.LSTM, or nn.GRU)
- dropout_rate: Dropout rate between RNN layers
Dilated CNN Model#
The Dilated CNN model uses dilated convolutions to capture long-range dependencies in time series data. It’s particularly effective for sequences with periodic patterns.
Basic Usage#
from deep_time_series.model import DilatedCNN
model = DilatedCNN(
    hidden_size=64,
    encoding_length=30,
    decoding_length=10,
    target_names=['target'],
    nontarget_names=['feature'],
    dilation_base=2,
    kernel_size=3,
    activation=nn.ELU,
    dropout_rate=0.1,
)
dataset = dts.TimeSeriesDataset(
    data_frames=data,
    chunk_specs=model.make_chunk_specs(),
)
dataloader = DataLoader(dataset, batch_size=32)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_dataloaders=dataloader)
How Dilated CNN Works#
The model automatically calculates the number of layers needed based on:
- encoding_length: The input sequence length
- dilation_base: Base for exponential dilation (e.g., 2 means dilations: 1, 2, 4, 8, …)
- kernel_size: Size of convolutional kernel
The dilation increases exponentially with each layer, allowing the model to capture dependencies at different time scales.
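As an illustration of this relationship (a common receptive-field estimate, not necessarily the library's exact formula), the number of layers needed to cover the encoding window can be computed as follows:
def estimate_n_layers(encoding_length, dilation_base, kernel_size):
    # Receptive field of n layers with dilations 1, b, ..., b**(n - 1):
    #   rf(n) = 1 + (kernel_size - 1) * (b**n - 1) / (b - 1)
    # Return the smallest n with rf(n) >= encoding_length.
    n = 1
    while 1 + (kernel_size - 1) * (dilation_base**n - 1) / (dilation_base - 1) < encoding_length:
        n += 1
    return n

print(estimate_n_layers(encoding_length=30, dilation_base=2, kernel_size=3))  # 4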
Parameters#
- hidden_size: Number of convolutional filters
- encoding_length: Length of encoding window
- decoding_length: Length of decoding window
- target_names: List of target feature names
- nontarget_names: List of non-target feature names
- dilation_base: Base for exponential dilation (typically 2)
- kernel_size: Size of convolutional kernel (must be >= dilation_base)
- activation: Activation function class (default: nn.ELU)
- dropout_rate: Dropout rate (default: 0.0)
Transformer Model#
The SingleShotTransformer model uses a transformer architecture with an encoder-decoder structure. It is effective for capturing complex temporal dependencies and long-range patterns.
Basic Usage#
from deep_time_series.model import SingleShotTransformer
model = SingleShotTransformer(
    encoding_length=30,
    decoding_length=10,
    target_names=['target'],
    nontarget_names=['feature'],
    d_model=128,
    n_heads=8,
    n_layers=4,
    dim_feedforward=512,
    dropout_rate=0.1,
)
dataset = dts.TimeSeriesDataset(
    data_frames=data,
    chunk_specs=model.make_chunk_specs(),
)
dataloader = DataLoader(dataset, batch_size=32)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, train_dataloaders=dataloader)
Transformer Architecture#
The transformer uses:
- Encoder: Processes the encoding window with self-attention
- Decoder: Generates predictions using cross-attention to encoder outputs
- Positional Encoding: Adds positional information to inputs
- Causal Masking: Prevents the decoder from seeing future information during training
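Causal masking follows the usual convention: each decoder position may only attend to itself and earlier positions. A small standalone illustration of such a mask (shown only for clarity; SingleShotTransformer applies its own mask internally):
import torch

size = 5
# -inf above the diagonal blocks attention to future positions.
causal_mask = torch.triu(torch.full((size, size), float('-inf')), diagonal=1)
print(causal_mask)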
Parameters#
- encoding_length: Length of encoding window
- decoding_length: Length of decoding window
- target_names: List of target feature names
- nontarget_names: List of non-target feature names
- d_model: Dimension of model (embedding size)
- n_heads: Number of attention heads
- n_layers: Number of encoder/decoder layers
- dim_feedforward: Dimension of feedforward network (default: 4 * d_model)
- dropout_rate: Dropout rate (default: 0.0)
Model Comparison#
Choosing the Right Model#
- MLP
  - Simple and fast
  - Good for short sequences and simple patterns
  - No explicit temporal modeling
- RNN (LSTM/GRU)
  - Good for sequential dependencies
  - Can handle variable-length sequences
  - May struggle with very long sequences
- Dilated CNN
  - Efficient for long sequences
  - Good for periodic patterns
  - Parallel processing (faster than RNN)
- Transformer
  - Best for complex patterns and long-range dependencies
  - Most flexible but computationally expensive
  - Requires more data to train effectively
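Because every model shares the ForecastingModule interface, switching architectures mostly means swapping the constructor. A sketch using the constructors shown above (hyperparameter values are illustrative only):
import torch.nn as nn
from deep_time_series.model import MLP, RNN, DilatedCNN, SingleShotTransformer

common = dict(
    encoding_length=30,
    decoding_length=10,
    target_names=['target'],
    nontarget_names=['feature'],
    dropout_rate=0.1,
)

models = {
    'mlp': MLP(hidden_size=64, n_hidden_layers=2, **common),
    'lstm': RNN(hidden_size=128, n_layers=2, rnn_class=nn.LSTM, **common),
    'dilated_cnn': DilatedCNN(hidden_size=64, dilation_base=2, kernel_size=3, **common),
    'transformer': SingleShotTransformer(
        d_model=128, n_heads=8, n_layers=4, dim_feedforward=512, **common
    ),
}

# The dataset and trainer code is identical for each entry.
for name, m in models.items():
    print(name, m.make_chunk_specs())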
Feature Support#
All models support:
- Target features: Variables to predict
- Non-target features: Additional features known at prediction time
- Deterministic forecasting: Point predictions
- Probabilistic forecasting: Distribution predictions (with DistributionHead)
Example: Using Non-Target Features#
All models can use non-target features:
# Model with non-target features
model = MLP(
    hidden_size=64,
    encoding_length=10,
    decoding_length=5,
    target_names=['target'],
    nontarget_names=['feature1', 'feature2'],  # Multiple features
    n_hidden_layers=2,
)
The non-target features are:
- Used during encoding (along with target features)
- Available during decoding (future values must be known)
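A common source of non-target features whose future values are known in advance is calendar information. A small illustration with pandas (the column names are hypothetical):
import numpy as np
import pandas as pd

index = pd.date_range('2023-01-01', periods=100, freq='H')
data = pd.DataFrame({
    'target': np.sin(np.arange(100) * 2 * np.pi / 24),
    # Hour of day is known for any future timestamp, so it can safely be
    # used during decoding.
    'hour_sin': np.sin(2 * np.pi * index.hour / 24),
    'hour_cos': np.cos(2 * np.pi * index.hour / 24),
})
# Pass nontarget_names=['hour_sin', 'hour_cos'] to the model.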
Customizing Models#
All models support custom heads, loss functions, and optimizers:
import torch.nn as nn
from deep_time_series.core import Head
# Custom head with L1 loss
custom_head = Head(
    tag='targets',
    output_module=nn.Linear(64, 1),
    loss_fn=nn.L1Loss(),
    loss_weight=1.0,
)
model = MLP(
    hidden_size=64,
    encoding_length=10,
    decoding_length=5,
    target_names=['target'],
    nontarget_names=[],
    n_hidden_layers=2,
    head=custom_head,  # Use custom head
)
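Optimizers can also be customized. Because the models are Lightning modules, one generic option (a standard PyTorch Lightning pattern, not necessarily the library's dedicated hook) is to subclass and override configure_optimizers:
import torch

class MLPWithAdamW(MLP):
    # Generic Lightning override; check the library's options for a dedicated
    # optimizer argument before relying on this pattern.
    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3, weight_decay=1e-2)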
For more advanced customization, see Advanced Topics.