Utilities#
Utility Functions#
The util module provides helper functions for data manipulation and dictionary operations. These are internal utilities used by the framework but may also be useful for custom implementations.
Purpose:
These functions handle common operations needed when working with time series data and model outputs:
- Set operations for validation
- Dictionary merging for combining model outputs
- DataFrame merging for handling multiple time series
logical_and_for_set_list#
Compute the intersection of a list of sets.
Purpose:
Finds elements that are common to all sets in a list. Used internally for validation (e.g., checking for duplicate keys in dictionaries).
Parameters:
set_list (list[set]): List of sets to compute the intersection of. Must contain at least one set.
Returns:
set: A set containing elements that appear in all input sets. If any set is empty or there are no common elements, returns an empty set.
When to Use:
Validating that multiple sets have no common elements (check if result is empty)
Finding common elements across multiple sets
Internal validation in framework code
Example:
from deep_time_series.util import logical_and_for_set_list
sets = [
{'a', 'b', 'c'},
{'b', 'c', 'd'},
{'c', 'd', 'e'},
]
common = logical_and_for_set_list(sets) # {'c'}
# Check for duplicates (common use case)
if logical_and_for_set_list([set(d1.keys()), set(d2.keys())]):
raise ValueError("Duplicate keys found!")
Note:
Typically used internally by the framework for validation purposes. The function computes the intersection sequentially: set1 & set2 & set3 & ....
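The sequential intersection described in the note can be sketched with functools.reduce; this is a possible equivalent, not necessarily the library's exact implementation:

```python
from functools import reduce

def intersection_sketch(set_list):
    # Fold the & operator over the list: set1 & set2 & set3 & ...
    return reduce(lambda a, b: a & b, set_list)

common = intersection_sketch([{'a', 'b', 'c'}, {'b', 'c', 'd'}, {'c', 'd', 'e'}])
print(common)  # {'c'}
```

Because reduce folds left to right, this matches the sequential pairwise intersection described above.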
logical_or_for_set_list#
Compute the union of a list of sets.
Purpose:
Finds all unique elements across all sets in a list. Used internally for combining sets.
Parameters:
set_list (list[set]): List of sets to compute the union of. Must contain at least one set.
Returns:
set: A set containing all unique elements from all input sets.
When to Use:
Combining multiple sets into one
Finding all unique elements across sets
Internal operations in framework code
Example:
from deep_time_series.util import logical_or_for_set_list
sets = [
{'a', 'b'},
{'b', 'c'},
{'c', 'd'},
]
union = logical_or_for_set_list(sets) # {'a', 'b', 'c', 'd'}
# Get all unique keys from multiple dictionaries
all_keys = logical_or_for_set_list([set(d.keys()) for d in dict_list])
Note:
Typically used internally by the framework. The function computes the union sequentially: set1 | set2 | set3 | ....
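For reference, the sequential union is equivalent to Python's built-in set.union with unpacking; this equivalence is an observation about set semantics, not a claim about the library's implementation:

```python
sets = [{'a', 'b'}, {'b', 'c'}, {'c', 'd'}]

# set1 | set2 | set3 is equivalent to the built-in union over all sets:
union = set().union(*sets)
print(union)  # {'a', 'b', 'c', 'd'}
```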
merge_dicts#
Merge multiple dictionaries into a single dictionary. Raises an error if keys are duplicated.
Purpose:
Combines multiple dictionaries into one, ensuring no key conflicts. Used extensively in the framework to merge encoder outputs with decoder inputs.
Parameters:
dicts (list[dict]): List of dictionaries to merge. All dictionaries will be combined into one.
ignore_keys (set | list[str] | None): Optional set or list of keys to ignore during merging. These keys will be excluded from the result even if they appear in multiple dictionaries. Default is None.
Returns:
dict: A new dictionary containing all key-value pairs from all input dictionaries (except ignored keys). Maintains insertion order (Python 3.7+).
Raises:
AssertionError: If any keys overlap between dictionaries (unless they are in ignore_keys).
When to Use:
Combining model outputs from different stages (e.g., encoder + decoder)
Merging multiple dictionaries without key conflicts
Internal framework operations
Key Features:
Duplicate Detection: Raises an assertion error if any keys overlap
Key Filtering: Can ignore specific keys during merging
Order Preservation: Maintains insertion order (Python 3.7+)
Example:
from deep_time_series.util import merge_dicts
dict1 = {'a': 1, 'b': 2}
dict2 = {'c': 3, 'd': 4}
merged = merge_dicts([dict1, dict2]) # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
# With ignore_keys
dict1 = {'a': 1, 'b': 2, 'temp': 999}
dict2 = {'c': 3}
merged = merge_dicts([dict1, dict2], ignore_keys=['temp'])
# {'a': 1, 'b': 2, 'c': 3}
Use in ForecastingModule:
The ForecastingModule.forward() method uses this to merge encoder outputs with inputs for the decoder:
encoder_outputs = self.encode(inputs)
decoder_inputs = merge_dicts([inputs, encoder_outputs])
outputs = self.decode(decoder_inputs)
Important:
Keys must be unique across all dictionaries (unless in ignore_keys)
Raises AssertionError if duplicates are found
The function creates a new dictionary; original dictionaries are not modified
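Putting the points above together, here is a minimal sketch of merge_dicts' documented semantics (duplicate detection, key filtering, new-dict creation); the library's actual implementation may differ:

```python
from functools import reduce

def merge_dicts_sketch(dicts, ignore_keys=None):
    """Sketch of merge_dicts semantics; not the library's exact code."""
    ignore = set(ignore_keys or [])
    # Drop ignored keys before checking for duplicates.
    filtered = [{k: v for k, v in d.items() if k not in ignore} for d in dicts]
    key_sets = [set(d.keys()) for d in filtered]
    all_keys = reduce(lambda a, b: a | b, key_sets, set())
    # If the key counts differ, at least one key appears in two dicts.
    assert sum(len(s) for s in key_sets) == len(all_keys), 'Duplicate keys found'
    merged = {}  # a new dict: the inputs are never modified
    for d in filtered:
        merged.update(d)
    return merged

merged = merge_dicts_sketch([{'a': 1, 'temp': 9}, {'b': 2}], ignore_keys=['temp'])
print(merged)  # {'a': 1, 'b': 2}
```

Note how the duplicate check reuses the union-of-key-sets idea from the set helpers above.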
merge_data_frames#
Merge multiple pandas DataFrames by concatenating them and adding time_index and time_series_id columns.
Purpose:
Combines multiple time series DataFrames into a single DataFrame while preserving information about which series each row belongs to. This is useful when working with multiple related time series that need to be analyzed together.
Parameters:
dfs (list[pd.DataFrame]): List of DataFrames to merge. Each DataFrame represents a separate time series. DataFrames should have compatible column structures (same column names and types).
Returns:
pd.DataFrame: A single DataFrame containing all rows from all input DataFrames, with two additional columns:
- time_index: The original index values from each DataFrame
- time_series_id: Integer identifier (0, 1, 2, …) indicating which DataFrame each row came from
Key Features:
Time Index Preservation: Adds original index as ‘time_index’ column
Series Identification: Adds ‘time_series_id’ to track source DataFrame
Deep Copy: Creates copies to avoid modifying original DataFrames
Index Reset: Resets the index of the merged DataFrame (uses default integer index)
When to Use:
Combining multiple time series for analysis
Preparing data from multiple sources
Creating a unified dataset from separate series
Preprocessing multiple series together with ColumnTransformer
Example:
import pandas as pd
import numpy as np
from deep_time_series.util import merge_data_frames
# Multiple time series from different sensors
df1 = pd.DataFrame({
'temperature': np.sin(np.arange(100)),
'humidity': np.random.rand(100)
})
df2 = pd.DataFrame({
'temperature': np.cos(np.arange(100)),
'humidity': np.random.rand(100)
})
df3 = pd.DataFrame({
'temperature': np.random.randn(100),
'humidity': np.random.rand(100)
})
# Merge with tracking
merged = merge_data_frames([df1, df2, df3])
# Result has columns: ['temperature', 'humidity', 'time_index', 'time_series_id']
# time_series_id: 0 for df1, 1 for df2, 2 for df3
# time_index: original index values from each DataFrame
Output Format:
The merged DataFrame includes:
- All original columns from input DataFrames
- time_index: Original index values (preserved from each source DataFrame)
- time_series_id: Integer ID (0, 1, 2, …) indicating source DataFrame
Use Cases:
Combining data from multiple sensors/locations
Merging training and validation sets for preprocessing
Creating unified datasets for analysis
Preparing data for models that can handle multiple time series
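The behavior described above can be sketched in a few lines of pandas; this is an illustration of the documented semantics, and the library's implementation may differ in details:

```python
import pandas as pd

def merge_data_frames_sketch(dfs):
    """Sketch of merge_data_frames: concat with provenance columns."""
    copies = []
    for i, df in enumerate(dfs):
        c = df.copy()              # deep copy: never modify the caller's frames
        c['time_index'] = c.index  # preserve each frame's original index
        c['time_series_id'] = i    # 0, 1, 2, ... identifies the source frame
        copies.append(c)
    # ignore_index=True resets to a default integer index
    return pd.concat(copies, ignore_index=True)

df1 = pd.DataFrame({'temperature': [1.0, 2.0]})
df2 = pd.DataFrame({'temperature': [3.0, 4.0]})
merged = merge_data_frames_sketch([df1, df2])
print(merged)
```

The resulting frame has columns ['temperature', 'time_index', 'time_series_id'], with time_series_id 0 for rows from df1 and 1 for rows from df2.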
Plotting#
The plotting module provides visualization utilities for time series data.
plot_chunks#
Visualize chunk specifications as horizontal bars showing the time windows for encoding, decoding, and labels.
Purpose:
Creates a visual representation of chunk specifications, making it easy to understand the temporal structure of your model’s input/output windows. This visualization helps debug chunk configurations and understand how data flows through the model.
Parameters:
chunk_specs (list[BaseChunkSpec]): List of chunk specifications to visualize. Each chunk will be displayed as a horizontal bar.
Returns:
None: The function modifies the current matplotlib figure/axes in place. Use plt.show() or plt.savefig() to display or save the plot.
When to Use:
Understanding model architecture
Debugging chunk specifications
Visualizing data windows
Documentation and presentations
Verifying that chunk ranges are correct
Output:
Creates a horizontal bar chart where:
- Each bar represents a chunk specification
- Bar position (left edge) shows the start of the time range
- Bar width shows the window length (end - start)
- Labels show the chunk tag
- Y-axis position indicates different chunks
Example:
import matplotlib.pyplot as plt
from deep_time_series.plotting import plot_chunks
from deep_time_series.chunk import EncodingChunkSpec, LabelChunkSpec, DecodingChunkSpec
import numpy as np
# Create chunk specifications
chunk_specs = [
EncodingChunkSpec('targets', ['temp'], (0, 10), np.float32),
DecodingChunkSpec('nontargets', ['humidity'], (10, 15), np.float32),
LabelChunkSpec('targets', ['temp'], (10, 15), np.float32),
]
# Visualize
plot_chunks(chunk_specs)
plt.xlabel('Time Index')
plt.title('Chunk Specifications')
plt.show()
Integration with TimeSeriesDataset:
The TimeSeriesDataset class provides a convenience method:
from deep_time_series.dataset import TimeSeriesDataset
dataset = TimeSeriesDataset(data_frames=data, chunk_specs=chunk_specs)
dataset.plot_chunks() # Visualize the chunks used by this dataset
plt.show()
Visualization Details:
Uses matplotlib’s barh() for horizontal bars
Alpha transparency (0.8) for overlapping bars
Annotations show chunk tags at the left edge of each bar
Y-axis shows different chunks (numbered from 1)
X-axis shows time indices
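The bar geometry described above can be sketched without matplotlib. The (tag, range) tuples below are hypothetical stand-ins for BaseChunkSpec objects, and the real plot_chunks may compute positions differently:

```python
# Hypothetical (tag, (start, end)) pairs standing in for chunk specs.
chunk_specs = [
    ('encoding.targets', (0, 10)),
    ('decoding.nontargets', (10, 15)),
    ('label.targets', (10, 15)),
]

bars = []
for y, (tag, (start, end)) in enumerate(chunk_specs, start=1):
    # left edge = start of the window, width = window length (end - start),
    # y = bar row (chunks are numbered from 1 on the y-axis)
    bars.append({'tag': tag, 'y': y, 'left': start, 'width': end - start})

for bar in bars:
    print(bar)
```

Each dict here corresponds to one barh() call: barh(y, width, left=left), with the tag annotated at the bar's left edge.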
Note:
Requires matplotlib to be installed
The function modifies the current matplotlib figure/axes
You may want to add labels and a title using plt.xlabel(), plt.ylabel(), plt.title()
Call plt.show() or plt.savefig() after calling this function to display or save the plot