Utilities#

Utility Functions#

The util module provides helper functions for data manipulation and dictionary operations. These are internal utilities used by the framework but may also be useful for custom implementations.

Purpose:

These functions handle common operations needed when working with time series data and model outputs:

Set operations for validation
Dictionary merging for combining model outputs
DataFrame merging for handling multiple time series

logical_and_for_set_list#

Compute the intersection of a list of sets.

Purpose:

Finds elements that are common to all sets in a list. Used internally for validation (e.g., checking for duplicate keys in dictionaries).

Parameters:

set_list (list[set]): List of sets to compute the intersection of. Must contain at least one set.

Returns:

set: A set containing elements that appear in all input sets. If any set is empty or there are no common elements, returns an empty set.

When to Use:

Validating that no element is common to all sets (check if result is empty). With more than two sets, an empty result does not rule out overlap between individual pairs.
Finding common elements across multiple sets
Internal validation in framework code

Example:

from deep_time_series.util import logical_and_for_set_list

sets = [
    {'a', 'b', 'c'},
    {'b', 'c', 'd'},
    {'c', 'd', 'e'},
]
common = logical_and_for_set_list(sets)  # {'c'}

# Check for duplicates (common use case)
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3}
if logical_and_for_set_list([set(d1.keys()), set(d2.keys())]):
    raise ValueError("Duplicate keys found!")

Note:

Typically used internally by the framework for validation purposes. The function computes the intersection sequentially: set1 & set2 & set3 & ....
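The sequential intersection described in the note can be reproduced with the standard library. The following is a minimal sketch, not the framework's implementation; `intersect_all` is a hypothetical name used only for illustration.

```python
from functools import reduce


def intersect_all(set_list):
    """Fold `&` over the list: set1 & set2 & set3 & ... (illustrative helper)."""
    if not set_list:
        raise ValueError("set_list must contain at least one set")
    return reduce(lambda left, right: left & right, set_list)


sets = [{'a', 'b', 'c'}, {'b', 'c', 'd'}, {'c', 'd', 'e'}]
common = intersect_all(sets)
print(common)  # {'c'}

# An empty set anywhere in the list empties the result.
print(intersect_all([{'a'}, set()]))  # set()
```

Because the fold starts from the first element, a single-set list returns that set unchanged, which is consistent with the "at least one set" requirement stated above.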

logical_and_for_set_list(set_list)

logical_or_for_set_list#

Compute the union of a list of sets.

Purpose:

Finds all unique elements across all sets in a list. Used internally for combining sets.

Parameters:

set_list (list[set]): List of sets to compute the union of. Must contain at least one set.

Returns:

set: A set containing all unique elements from all input sets.

When to Use:

Combining multiple sets into one
Finding all unique elements across sets
Internal operations in framework code

Example:

from deep_time_series.util import logical_or_for_set_list

sets = [
    {'a', 'b'},
    {'b', 'c'},
    {'c', 'd'},
]
union = logical_or_for_set_list(sets)  # {'a', 'b', 'c', 'd'}

# Get all unique keys from multiple dictionaries
dict_list = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
all_keys = logical_or_for_set_list([set(d.keys()) for d in dict_list])  # {'a', 'b', 'c'}

Note:

Typically used internally by the framework. The function computes the union sequentially: set1 | set2 | set3 | ....
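For comparison, the same union can be obtained from Python's built-in `set.union`. This is a sketch under the assumption that only the result matters, not how the framework computes it; `union_all` is a hypothetical name.

```python
def union_all(set_list):
    """Combine every set in the list: set1 | set2 | set3 | ... (illustrative helper)."""
    return set().union(*set_list)


dict_list = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
all_keys = union_all([set(d.keys()) for d in dict_list])
print(sorted(all_keys))  # ['a', 'b', 'c']
```

Unlike the documented function, which requires at least one set, this sketch returns an empty set for an empty list.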

logical_or_for_set_list(set_list)

merge_dicts#

Merge multiple dictionaries into a single dictionary. Raises an error if keys are duplicated.

Purpose:

Combines multiple dictionaries into one, ensuring no key conflicts. Used extensively in the framework to merge encoder outputs with decoder inputs.

Parameters:

dicts (list[dict]): List of dictionaries to merge. All dictionaries will be combined into one.
ignore_keys (set | list[str] | None): Optional set or list of keys to ignore during merging. These keys will be excluded from the result even if they appear in multiple dictionaries. Default is None.

Returns:

dict: A new dictionary containing all key-value pairs from all input dictionaries (except ignored keys). Maintains insertion order (Python 3.7+).

Raises:

AssertionError: If any keys overlap between dictionaries (unless they are in ignore_keys).

When to Use:

Combining model outputs from different stages (e.g., encoder + decoder)
Merging multiple dictionaries without key conflicts
Internal framework operations

Key Features:

Duplicate Detection: Raises an assertion error if any keys overlap
Key Filtering: Can ignore specific keys during merging
Order Preservation: Maintains insertion order (Python 3.7+)

Example:

from deep_time_series.util import merge_dicts

dict1 = {'a': 1, 'b': 2}
dict2 = {'c': 3, 'd': 4}
merged = merge_dicts([dict1, dict2])  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# With ignore_keys
dict1 = {'a': 1, 'b': 2, 'temp': 999}
dict2 = {'c': 3}
merged = merge_dicts([dict1, dict2], ignore_keys=['temp'])
# {'a': 1, 'b': 2, 'c': 3}

Use in ForecastingModule:

The ForecastingModule.forward() method uses this to merge encoder outputs with inputs for the decoder:

encoder_outputs = self.encode(inputs)
decoder_inputs = merge_dicts([inputs, encoder_outputs])
outputs = self.decode(decoder_inputs)

Important:

Keys must be unique across all dictionaries (unless in ignore_keys)
Raises AssertionError if duplicates are found
The function creates a new dictionary; original dictionaries are not modified
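The documented contract (new dictionary, ignored keys dropped, AssertionError on overlap) can be illustrated with a short standalone sketch. `merge_dicts_sketch` and the dictionary keys below are hypothetical; the actual framework code may differ in its details.

```python
def merge_dicts_sketch(dicts, ignore_keys=None):
    """Hypothetical re-implementation of the documented behavior."""
    ignore = set(ignore_keys or ())
    merged = {}
    for d in dicts:
        for key, value in d.items():
            if key in ignore:
                continue  # ignored keys never reach the result
            # Mirror the documented AssertionError on overlapping keys.
            assert key not in merged, f"Duplicate key: {key!r}"
            merged[key] = value
    return merged


inputs = {'features': [1.0, 2.0]}
encoder_outputs = {'memory': [0.5]}
print(merge_dicts_sketch([inputs, encoder_outputs]))
# {'features': [1.0, 2.0], 'memory': [0.5]}

try:
    merge_dicts_sketch([{'a': 1}, {'a': 2}])
except AssertionError as err:
    print(f"rejected: {err}")  # rejected: Duplicate key: 'a'
```

Since `assert` statements are skipped when Python runs with the `-O` flag, a check written this way does not fire in optimized mode.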

merge_dicts(dicts, ignore_keys=None)

merge_data_frames#

Merge multiple pandas DataFrames by concatenating them and adding time_index and time_series_id columns.

Purpose:

Combines multiple time series DataFrames into a single DataFrame while preserving information about which series each row belongs to. This is useful when working with multiple related time series that need to be analyzed together.

Parameters:

dfs (list[pd.DataFrame]): List of DataFrames to merge. Each DataFrame represents a separate time series. DataFrames should have compatible column structures (same column names and types).

Returns:

pd.DataFrame: A single DataFrame containing all rows from all input DataFrames, with two additional columns:

time_index: The original index values from each DataFrame
time_series_id: Integer identifier (0, 1, 2, …) indicating which DataFrame each row came from

Key Features:

Time Index Preservation: Adds original index as ‘time_index’ column
Series Identification: Adds ‘time_series_id’ to track source DataFrame
Deep Copy: Creates copies to avoid modifying original DataFrames
Index Reset: Resets the index of the merged DataFrame (uses default integer index)

When to Use:

Combining multiple time series for analysis
Preparing data from multiple sources
Creating a unified dataset from separate series
Preprocessing multiple series together with ColumnTransformer

Example:

import pandas as pd
import numpy as np
from deep_time_series.util import merge_data_frames

# Multiple time series from different sensors
df1 = pd.DataFrame({
    'temperature': np.sin(np.arange(100)),
    'humidity': np.random.rand(100)
})
df2 = pd.DataFrame({
    'temperature': np.cos(np.arange(100)),
    'humidity': np.random.rand(100)
})
df3 = pd.DataFrame({
    'temperature': np.random.randn(100),
    'humidity': np.random.rand(100)
})

# Merge with tracking
merged = merge_data_frames([df1, df2, df3])
# Result has columns: ['temperature', 'humidity', 'time_index', 'time_series_id']
# time_series_id: 0 for df1, 1 for df2, 2 for df3
# time_index: original index values from each DataFrame

Output Format:

The merged DataFrame includes:

All original columns from input DataFrames
time_index: Original index values (preserved from each source DataFrame)
time_series_id: Integer ID (0, 1, 2, …) indicating source DataFrame
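The output format described above can be reproduced with plain pandas. This is a minimal sketch of the documented result, assuming nothing about how the framework builds it; `merge_data_frames_sketch` is a hypothetical name.

```python
import pandas as pd


def merge_data_frames_sketch(dfs):
    """Hypothetical equivalent of the documented output format."""
    tagged = []
    for series_id, df in enumerate(dfs):
        df = df.copy()                     # leave the caller's frame untouched
        df['time_index'] = df.index        # keep the original index as a column
        df['time_series_id'] = series_id   # 0, 1, 2, ... in input order
        tagged.append(df)
    # Stack the frames and replace the index with a default integer index.
    return pd.concat(tagged).reset_index(drop=True)


df1 = pd.DataFrame({'temperature': [20.1, 20.4, 20.9]})
df2 = pd.DataFrame({'temperature': [18.0, 18.2]})
merged = merge_data_frames_sketch([df1, df2])

print(merged['time_series_id'].tolist())  # [0, 0, 0, 1, 1]
print(merged['time_index'].tolist())      # [0, 1, 2, 0, 1]
```

Note that `time_index` restarts for each source frame, so a row is identified by the pair (`time_series_id`, `time_index`) rather than by `time_index` alone.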

Use Cases:

Combining data from multiple sensors/locations
Merging training and validation sets for preprocessing
Creating unified datasets for analysis
Preparing data for models that can handle multiple time series

merge_data_frames(dfs)

Plotting#

The plotting module provides visualization utilities for time series data.

plot_chunks#

Visualize chunk specifications as horizontal bars showing the time windows for encoding, decoding, and labels.

Purpose:

Creates a visual representation of chunk specifications, making it easy to understand the temporal structure of your model’s input/output windows. This visualization helps debug chunk configurations and understand how data flows through the model.

Parameters:

chunk_specs (list[BaseChunkSpec]): List of chunk specifications to visualize. Each chunk will be displayed as a horizontal bar.

Returns:

None: The function modifies the current matplotlib figure/axes in place. Use plt.show() or plt.savefig() to display or save the plot.

When to Use:

Understanding model architecture
Debugging chunk specifications
Visualizing data windows
Documentation and presentations
Verifying that chunk ranges are correct

Output:

Creates a horizontal bar chart where:

Each bar represents a chunk specification
Bar position (left edge) shows the start of the time range
Bar width shows the window length (end - start)
Labels show the chunk tag
Y-axis position indicates different chunks

Example:

import matplotlib.pyplot as plt
from deep_time_series.plotting import plot_chunks
from deep_time_series.chunk import EncodingChunkSpec, LabelChunkSpec, DecodingChunkSpec
import numpy as np

# Create chunk specifications
chunk_specs = [
    EncodingChunkSpec('targets', ['temp'], (0, 10), np.float32),
    DecodingChunkSpec('nontargets', ['humidity'], (10, 15), np.float32),
    LabelChunkSpec('targets', ['temp'], (10, 15), np.float32),
]

# Visualize
plot_chunks(chunk_specs)
plt.xlabel('Time Index')
plt.title('Chunk Specifications')
plt.show()

Integration with TimeSeriesDataset:

The TimeSeriesDataset class provides a convenience method:

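import matplotlib.pyplot as plt
from deep_time_series.dataset import TimeSeriesDataset

# `data` holds your input DataFrame(s); `chunk_specs` is the list defined in the example above
dataset = TimeSeriesDataset(data_frames=data, chunk_specs=chunk_specs)
dataset.plot_chunks()  # Visualize the chunks used by this dataset
plt.show()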

Visualization Details:

Uses matplotlib’s barh() for horizontal bars
Alpha transparency (0.8) for overlapping bars
Annotations show chunk tags at the left edge of each bar
Y-axis shows different chunks (numbered from 1)
X-axis shows time indices
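The drawing approach listed above can be approximated directly in matplotlib. This is a sketch only: it uses plain (tag, range) tuples instead of chunk specification objects, the tag strings are made up for illustration, and it is not the body of plot_chunks.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# (tag, (start, end)) pairs standing in for chunk specifications; illustrative only.
windows = [
    ('encoding.targets', (0, 10)),
    ('decoding.nontargets', (10, 15)),
    ('label.targets', (10, 15)),
]

fig, ax = plt.subplots()
for row, (tag, (start, end)) in enumerate(windows, start=1):
    # Left edge = start of the range, width = end - start.
    ax.barh(y=row, width=end - start, left=start, alpha=0.8)
    # Place the tag at the left edge of its bar.
    ax.annotate(tag, xy=(start, row), va='center')

ax.set_yticks(range(1, len(windows) + 1))  # chunks numbered from 1
ax.set_xlabel('Time Index')
fig.canvas.draw()  # render; use plt.show() or fig.savefig(...) in practice
```

Drop the `matplotlib.use('Agg')` line when running interactively so that `plt.show()` can open a window.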

Note:

Requires matplotlib to be installed
The function modifies the current matplotlib figure/axes
You may want to add labels and title using plt.xlabel(), plt.ylabel(), plt.title()
Call plt.show() or plt.savefig() after calling this function to display or save the plot

plot_chunks(chunk_specs)
