Utilities ========== Utility Functions ----------------- The util module provides helper functions for data manipulation and dictionary operations. These are internal utilities used by the framework but may also be useful for custom implementations. **Purpose:** These functions handle common operations needed when working with time series data and model outputs: - Set operations for validation - Dictionary merging for combining model outputs - DataFrame merging for handling multiple time series logical_and_for_set_list ~~~~~~~~~~~~~~~~~~~~~~~~ Compute the intersection of a list of sets. **Purpose:** Finds elements that are common to all sets in a list. Used internally for validation (e.g., checking for duplicate keys in dictionaries). **Parameters:** - ``set_list`` (list[set]): List of sets to compute the intersection of. Must contain at least one set. **Returns:** - ``set``: A set containing elements that appear in all input sets. If any set is empty or there are no common elements, returns an empty set. **When to Use:** - Validating that multiple sets have no common elements (check if result is empty) - Finding common elements across multiple sets - Internal validation in framework code **Example:** .. code-block:: python from deep_time_series.util import logical_and_for_set_list sets = [ {'a', 'b', 'c'}, {'b', 'c', 'd'}, {'c', 'd', 'e'}, ] common = logical_and_for_set_list(sets) # {'c'} # Check for duplicates (common use case) if logical_and_for_set_list([set(d1.keys()), set(d2.keys())]): raise ValueError("Duplicate keys found!") **Note:** Typically used internally by the framework for validation purposes. The function computes the intersection sequentially: ``set1 & set2 & set3 & ...``. .. autofunction:: deep_time_series.util.logical_and_for_set_list logical_or_for_set_list ~~~~~~~~~~~~~~~~~~~~~~~ Compute the union of a list of sets. **Purpose:** Finds all unique elements across all sets in a list. Used internally for combining sets. **Parameters:** - ``set_list`` (list[set]): List of sets to compute the union of. Must contain at least one set. **Returns:** - ``set``: A set containing all unique elements from all input sets. **When to Use:** - Combining multiple sets into one - Finding all unique elements across sets - Internal operations in framework code **Example:** .. code-block:: python from deep_time_series.util import logical_or_for_set_list sets = [ {'a', 'b'}, {'b', 'c'}, {'c', 'd'}, ] union = logical_or_for_set_list(sets) # {'a', 'b', 'c', 'd'} # Get all unique keys from multiple dictionaries all_keys = logical_or_for_set_list([set(d.keys()) for d in dict_list]) **Note:** Typically used internally by the framework. The function computes the union sequentially: ``set1 | set2 | set3 | ...``. .. autofunction:: deep_time_series.util.logical_or_for_set_list merge_dicts ~~~~~~~~~~~~ Merge multiple dictionaries into a single dictionary. Raises an error if keys are duplicated. **Purpose:** Combines multiple dictionaries into one, ensuring no key conflicts. Used extensively in the framework to merge encoder outputs with decoder inputs. **Parameters:** - ``dicts`` (list[dict]): List of dictionaries to merge. All dictionaries will be combined into one. - ``ignore_keys`` (set | list[str] | None): Optional set or list of keys to ignore during merging. These keys will be excluded from the result even if they appear in multiple dictionaries. Default is ``None``. **Returns:** - ``dict``: A new dictionary containing all key-value pairs from all input dictionaries (except ignored keys). Maintains insertion order (Python 3.7+). **Raises:** - ``AssertionError``: If any keys overlap between dictionaries (unless they are in ``ignore_keys``). **When to Use:** - Combining model outputs from different stages (e.g., encoder + decoder) - Merging multiple dictionaries without key conflicts - Internal framework operations **Key Features:** - **Duplicate Detection**: Raises an assertion error if any keys overlap - **Key Filtering**: Can ignore specific keys during merging - **Order Preservation**: Maintains insertion order (Python 3.7+) **Example:** .. code-block:: python from deep_time_series.util import merge_dicts dict1 = {'a': 1, 'b': 2} dict2 = {'c': 3, 'd': 4} merged = merge_dicts([dict1, dict2]) # {'a': 1, 'b': 2, 'c': 3, 'd': 4} # With ignore_keys dict1 = {'a': 1, 'b': 2, 'temp': 999} dict2 = {'c': 3} merged = merge_dicts([dict1, dict2], ignore_keys=['temp']) # {'a': 1, 'b': 2, 'c': 3} **Use in ForecastingModule:** The ``ForecastingModule.forward()`` method uses this to merge encoder outputs with inputs for the decoder: .. code-block:: python encoder_outputs = self.encode(inputs) decoder_inputs = merge_dicts([inputs, encoder_outputs]) outputs = self.decode(decoder_inputs) **Important:** - Keys must be unique across all dictionaries (unless in ignore_keys) - Raises AssertionError if duplicates are found - The function creates a new dictionary; original dictionaries are not modified .. autofunction:: deep_time_series.util.merge_dicts merge_data_frames ~~~~~~~~~~~~~~~~~~ Merge multiple pandas DataFrames by concatenating them and adding time_index and time_series_id columns. **Purpose:** Combines multiple time series DataFrames into a single DataFrame while preserving information about which series each row belongs to. This is useful when working with multiple related time series that need to be analyzed together. **Parameters:** - ``dfs`` (list[pd.DataFrame]): List of DataFrames to merge. Each DataFrame represents a separate time series. DataFrames should have compatible column structures (same column names and types). **Returns:** - ``pd.DataFrame``: A single DataFrame containing all rows from all input DataFrames, with two additional columns: - ``time_index``: The original index values from each DataFrame - ``time_series_id``: Integer identifier (0, 1, 2, ...) indicating which DataFrame each row came from **Key Features:** - **Time Index Preservation**: Adds original index as 'time_index' column - **Series Identification**: Adds 'time_series_id' to track source DataFrame - **Deep Copy**: Creates copies to avoid modifying original DataFrames - **Index Reset**: Resets the index of the merged DataFrame (uses default integer index) **When to Use:** - Combining multiple time series for analysis - Preparing data from multiple sources - Creating a unified dataset from separate series - Preprocessing multiple series together with ``ColumnTransformer`` **Example:** .. code-block:: python import pandas as pd import numpy as np from deep_time_series.util import merge_data_frames # Multiple time series from different sensors df1 = pd.DataFrame({ 'temperature': np.sin(np.arange(100)), 'humidity': np.random.rand(100) }) df2 = pd.DataFrame({ 'temperature': np.cos(np.arange(100)), 'humidity': np.random.rand(100) }) df3 = pd.DataFrame({ 'temperature': np.random.randn(100), 'humidity': np.random.rand(100) }) # Merge with tracking merged = merge_data_frames([df1, df2, df3]) # Result has columns: ['temperature', 'humidity', 'time_index', 'time_series_id'] # time_series_id: 0 for df1, 1 for df2, 2 for df3 # time_index: original index values from each DataFrame **Output Format:** The merged DataFrame includes: - All original columns from input DataFrames - ``time_index``: Original index values (preserved from each source DataFrame) - ``time_series_id``: Integer ID (0, 1, 2, ...) indicating source DataFrame **Use Cases:** - Combining data from multiple sensors/locations - Merging training and validation sets for preprocessing - Creating unified datasets for analysis - Preparing data for models that can handle multiple time series .. autofunction:: deep_time_series.util.merge_data_frames Plotting -------- The plotting module provides visualization utilities for time series data. plot_chunks ~~~~~~~~~~~ Visualize chunk specifications as horizontal bars showing the time windows for encoding, decoding, and labels. **Purpose:** Creates a visual representation of chunk specifications, making it easy to understand the temporal structure of your model's input/output windows. This visualization helps debug chunk configurations and understand how data flows through the model. **Parameters:** - ``chunk_specs`` (list[BaseChunkSpec]): List of chunk specifications to visualize. Each chunk will be displayed as a horizontal bar. **Returns:** - ``None``: The function modifies the current matplotlib figure/axes in place. Use ``plt.show()`` or ``plt.savefig()`` to display or save the plot. **When to Use:** - Understanding model architecture - Debugging chunk specifications - Visualizing data windows - Documentation and presentations - Verifying that chunk ranges are correct **Output:** Creates a horizontal bar chart where: - Each bar represents a chunk specification - Bar position (left edge) shows the start of the time range - Bar width shows the window length (end - start) - Labels show the chunk tag - Y-axis position indicates different chunks **Example:** .. code-block:: python import matplotlib.pyplot as plt from deep_time_series.plotting import plot_chunks from deep_time_series.chunk import EncodingChunkSpec, LabelChunkSpec, DecodingChunkSpec import numpy as np # Create chunk specifications chunk_specs = [ EncodingChunkSpec('targets', ['temp'], (0, 10), np.float32), DecodingChunkSpec('nontargets', ['humidity'], (10, 15), np.float32), LabelChunkSpec('targets', ['temp'], (10, 15), np.float32), ] # Visualize plot_chunks(chunk_specs) plt.xlabel('Time Index') plt.title('Chunk Specifications') plt.show() **Integration with TimeSeriesDataset:** The ``TimeSeriesDataset`` class provides a convenience method: .. code-block:: python from deep_time_series.dataset import TimeSeriesDataset dataset = TimeSeriesDataset(data_frames=data, chunk_specs=chunk_specs) dataset.plot_chunks() # Visualize the chunks used by this dataset plt.show() **Visualization Details:** - Uses matplotlib's ``barh()`` for horizontal bars - Alpha transparency (0.8) for overlapping bars - Annotations show chunk tags at the left edge of each bar - Y-axis shows different chunks (numbered from 1) - X-axis shows time indices **Note:** - Requires matplotlib to be installed - The function modifies the current matplotlib figure/axes - You may want to add labels and title using ``plt.xlabel()``, ``plt.ylabel()``, ``plt.title()`` - Call ``plt.show()`` or ``plt.savefig()`` after calling this function to display or save the plot .. automodule:: deep_time_series.plotting :members: :undoc-members: .. autofunction:: deep_time_series.plotting.plot_chunks