Data Transformation
====================

The transform module provides data preprocessing utilities for time series data. The main class is :class:`ColumnTransformer`, which allows applying different transformers to different columns of a DataFrame.

ColumnTransformer
------------------

A transformer that applies sklearn-style transformers to specific columns of a pandas DataFrame. It supports both dictionary and tuple-based transformer specifications.

**Purpose:**

``ColumnTransformer`` provides a convenient way to apply different preprocessing transformers to different columns of a DataFrame. It follows the sklearn API pattern, making it easy to integrate with existing sklearn workflows.

**Key Features:**

- **sklearn-Compatible Interface**: Follows the ``fit()``, ``transform()``, ``fit_transform()``, and ``inverse_transform()`` pattern
- **Column-Specific Transformers**: Apply different transformers to different columns
- **Multiple DataFrame Support**: Can fit and transform single or multiple DataFrames
- **Deep Copy Safety**: Each column gets its own transformer instance (no shared state)

**Initialization Parameters:**

- ``transformer_dict`` (dict[str, Transformer] | None): Dictionary mapping column names to transformer instances. Each column name maps to a transformer that will be applied to that column. Either ``transformer_dict`` or ``transformer_tuples`` must be provided, but not both.
- ``transformer_tuples`` (list[tuple[Transformer, list[str]]] | None): List of tuples, each containing a transformer instance and a list of column names. The transformer will be deep-copied for each column. This is more convenient when applying the same transformer to multiple columns.

**Methods:**

- ``fit(data_frames)``: Fits all transformers on the provided data. Computes statistics (e.g., mean, std for ``StandardScaler``) from the data.

  **Parameters:**

  - ``data_frames`` (pd.DataFrame | list[pd.DataFrame]): Training data. Can be a single DataFrame or a list of DataFrames. If multiple DataFrames are provided, they are concatenated before fitting.

  **Returns:**

  - ``self``: Returns self for method chaining.

- ``transform(data_frames)``: Applies the fitted transformers to the data. Only transforms columns that were specified during initialization.

  **Parameters:**

  - ``data_frames`` (pd.DataFrame | list[pd.DataFrame]): Data to transform. Can be a single DataFrame or a list of DataFrames.

  **Returns:**

  - ``pd.DataFrame | list[pd.DataFrame]``: Transformed data with the same structure as input. Columns not specified in transformers remain unchanged.

- ``fit_transform(data_frames)``: Convenience method that fits and transforms in one step. Equivalent to calling ``fit()`` followed by ``transform()``.

  **Parameters:**

  - ``data_frames`` (pd.DataFrame | list[pd.DataFrame]): Data to fit and transform.

  **Returns:**

  - ``pd.DataFrame | list[pd.DataFrame]``: Transformed data.

- ``inverse_transform(data_frames)``: Applies the inverse transformation to convert data back to the original scale. Useful for converting model predictions back to the original data scale.

  **Parameters:**

  - ``data_frames`` (pd.DataFrame | list[pd.DataFrame]): Transformed data to invert.

  **Returns:**

  - ``pd.DataFrame | list[pd.DataFrame]``: Data in original scale.

**Internal Methods:**

- ``_apply_to_single_feature(series, func)``: Internal helper method that applies a transformer function to a single pandas Series. Handles reshaping and type conversion.
- ``_get_valid_names(names)``: Internal helper method that returns the intersection of column names in the transformer dictionary and the provided names. Ensures only valid columns are transformed.

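
The method descriptions above (fit on concatenated frames, per-column reshaping, name filtering) can be illustrated with a small standalone sketch. It uses only pandas and scikit-learn and does not call ``ColumnTransformer`` itself. The functions ``apply_to_single_feature`` and ``get_valid_names`` are hypothetical stand-ins written for this illustration, so the library's actual internals may differ.

.. code-block:: python

    import pandas as pd
    from sklearn.preprocessing import StandardScaler


    def apply_to_single_feature(series, func):
        """Hypothetical stand-in: reshape to 2-D, apply func, flatten back."""
        # sklearn transformers expect input of shape (n_samples, n_features).
        values = series.to_numpy().reshape(-1, 1)
        return pd.Series(func(values).ravel(), index=series.index, name=series.name)


    def get_valid_names(transformer_dict, names):
        """Hypothetical stand-in: keep only names that have a transformer."""
        return [name for name in names if name in transformer_dict]


    frames = [
        pd.DataFrame({"temperature": [10.0, 20.0], "label": [0, 1]}),
        pd.DataFrame({"temperature": [30.0, 40.0], "label": [1, 0]}),
    ]
    # "wind" has a transformer but no matching column, so it is skipped.
    transformer_dict = {"temperature": StandardScaler(), "wind": StandardScaler()}

    # Fit: concatenate all frames, then fit each valid column separately.
    combined = pd.concat(frames, ignore_index=True)
    valid = get_valid_names(transformer_dict, combined.columns)
    for name in valid:
        transformer_dict[name].fit(combined[name].to_numpy().reshape(-1, 1))

    # Transform: process each frame on its own so the list structure is kept.
    transformed = []
    for frame in frames:
        out = frame.copy()
        for name in valid:
            out[name] = apply_to_single_feature(
                frame[name], transformer_dict[name].transform
            )
        transformed.append(out)

    # Inverse transform: recover the original values of the first frame.
    restored = apply_to_single_feature(
        transformed[0]["temperature"],
        transformer_dict["temperature"].inverse_transform,
    )

Because the scaler is fitted on the concatenated data, the statistics describe all frames together rather than each frame individually, and the ``label`` column passes through untouched.
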
**Typical Use Cases:**

- **Normalization**: Scale features to have zero mean and unit variance (``StandardScaler``)
- **Min-Max Scaling**: Scale features to a specific range (``MinMaxScaler``)
- **Robust Scaling**: Scale using median and IQR (``RobustScaler``)
- **Custom Transformers**: Apply any sklearn-compatible transformer

**Two Initialization Methods:**

1. **Dictionary Method**: Map column names directly to transformers
2. **Tuple Method**: Apply the same transformer to multiple columns (more convenient)

**Example - Dictionary Method:**

.. code-block:: python

    from deep_time_series.transform import ColumnTransformer
    from sklearn.preprocessing import StandardScaler, MinMaxScaler
    import pandas as pd
    import numpy as np

    # Create sample data
    data = pd.DataFrame({
        'temperature': np.random.randn(100) * 10 + 20,
        'humidity': np.random.rand(100) * 100,
        'pressure': np.random.randn(100) * 5 + 1013,
    })

    # Dictionary method: map each column to a transformer
    transformer = ColumnTransformer(
        transformer_dict={
            'temperature': StandardScaler(),
            'humidity': MinMaxScaler(),
            'pressure': StandardScaler(),
        }
    )

    # Fit and transform
    data_transformed = transformer.fit_transform(data)

**Example - Tuple Method (Recommended):**

.. code-block:: python

    from deep_time_series.transform import ColumnTransformer
    from sklearn.preprocessing import StandardScaler
    import pandas as pd
    import numpy as np

    # Same sample data as in the dictionary example
    data = pd.DataFrame({
        'temperature': np.random.randn(100) * 10 + 20,
        'humidity': np.random.rand(100) * 100,
        'pressure': np.random.randn(100) * 5 + 1013,
    })

    # Tuple method: apply same transformer to multiple columns
    transformer = ColumnTransformer(
        transformer_tuples=[
            (StandardScaler(), ['temperature', 'pressure']),  # Scale these
            # Other columns remain unchanged
        ]
    )

    data_transformed = transformer.fit_transform(data)

**Important Notes:**

- **Deep Copying**: When using ``transformer_tuples``, each column gets a deep copy of the transformer, so they don't share state
- **Column Validation**: Only columns present in both the transformer dict and the DataFrame will be transformed
- **Preservation**: Columns not specified in the transformer will remain unchanged in the output

**Workflow:**

1. **Fit**: Compute statistics (mean, std, etc.) from training data
2. **Transform**: Apply the learned transformation to new data
3. **Inverse Transform**: Convert transformed data back to original scale (useful for predictions)

**Example - Full Workflow:**

.. code-block:: python

    from deep_time_series.transform import ColumnTransformer
    from sklearn.preprocessing import StandardScaler
    import pandas as pd
    import numpy as np

    # Training data
    train_data = pd.DataFrame({'temperature': np.random.randn(100) * 10 + 20})

    # Create and fit transformer
    transformer = ColumnTransformer(
        transformer_tuples=[(StandardScaler(), ['temperature'])]
    )
    train_transformed = transformer.fit_transform(train_data)

    # Test data (use same transformer, don't refit)
    test_data = pd.DataFrame({'temperature': np.random.randn(50) * 10 + 20})
    test_transformed = transformer.transform(test_data)

    # Inverse transform predictions back to original scale
    predictions_transformed = ...  # Model predictions in transformed space
    predictions_original = transformer.inverse_transform(predictions_transformed)

**Multiple DataFrames:**

The transformer can handle lists of DataFrames, which is useful when you have multiple time series:

.. code-block:: python

    from deep_time_series.transform import ColumnTransformer
    from sklearn.preprocessing import StandardScaler
    import pandas as pd
    import numpy as np

    data1 = pd.DataFrame({'temperature': np.random.randn(100)})
    data2 = pd.DataFrame({'temperature': np.random.randn(100)})

    transformer = ColumnTransformer(
        transformer_tuples=[(StandardScaler(), ['temperature'])]
    )

    # Fit on all data
    transformer.fit([data1, data2])

    # Transform all data
    transformed_list = transformer.transform([data1, data2])

.. automodule:: deep_time_series.transform
   :members:
   :undoc-members:
   :show-inheritance:

.. autoclass:: deep_time_series.transform.ColumnTransformer
   :members:
   :undoc-members:
   :show-inheritance:
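
The deep-copy behaviour described in the notes for ``transformer_tuples`` can be checked in isolation with ``copy.deepcopy``. The sketch below is only an assumption about how a tuple entry might expand into per-column transformers. It does not call ``ColumnTransformer``, and the ``per_column`` dictionary is a hypothetical name used for illustration.

.. code-block:: python

    import copy

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # One transformer instance paired with two columns, as in the tuple form.
    template = StandardScaler()
    columns = ["temperature", "pressure"]

    # Assumed expansion: one independent deep copy per column.
    per_column = {name: copy.deepcopy(template) for name in columns}

    # Fit each copy on data with a different scale.
    per_column["temperature"].fit(np.array([[10.0], [30.0]]))
    per_column["pressure"].fit(np.array([[1000.0], [1020.0]]))

    # Each copy keeps its own statistics; the template itself stays unfitted.
    temperature_mean = per_column["temperature"].mean_[0]
    pressure_mean = per_column["pressure"].mean_[0]
    template_is_fitted = hasattr(template, "mean_")

If the same instance were reused for both columns instead of being copied, the second ``fit`` call would overwrite the statistics learned from the first column.
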