Data Transformation
The transform module provides data preprocessing utilities for time series data. The main class is ColumnTransformer, which allows applying different transformers to different columns of a DataFrame.
ColumnTransformer
A transformer that applies sklearn-style transformers to specific columns of a pandas DataFrame. It supports both dictionary and tuple-based transformer specifications.
Purpose:
ColumnTransformer provides a convenient way to apply different preprocessing transformers to different columns of a DataFrame. It follows the sklearn API pattern, making it easy to integrate with existing sklearn workflows.
Key Features:
sklearn-Compatible Interface: Follows the fit(), transform(), fit_transform(), and inverse_transform() pattern
Column-Specific Transformers: Apply different transformers to different columns
Multiple DataFrame Support: Can fit and transform single or multiple DataFrames
Deep Copy Safety: Each column gets its own transformer instance (no shared state)
Initialization Parameters:
transformer_dict (dict[str, Transformer] | None): Dictionary mapping column names to transformer instances. Each column name maps to the transformer that will be applied to that column. Either transformer_dict or transformer_tuples must be provided, but not both.
transformer_tuples (list[tuple[Transformer, list[str]]] | None): List of tuples, each containing a transformer instance and a list of column names. The transformer is deep-copied for each column. This form is more convenient when the same transformer applies to multiple columns.
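One way to picture how the two specification styles relate is to normalize both into a single column-to-transformer mapping. The sketch below assumes the library does something equivalent. `build_transformer_dict` is a hypothetical helper and is not part of deep_time_series. Raising ValueError when both or neither argument is given is an assumption of this sketch, not documented behavior.

```python
import copy
from typing import Any, Optional


def build_transformer_dict(
    transformer_dict: Optional[dict[str, Any]] = None,
    transformer_tuples: Optional[list[tuple[Any, list[str]]]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: normalize both specification styles into one dict."""
    # Exactly one specification must be supplied.
    # Hypothetical: the library's actual validation and error type may differ.
    if (transformer_dict is None) == (transformer_tuples is None):
        raise ValueError("Provide exactly one of transformer_dict or transformer_tuples")
    if transformer_dict is not None:
        return dict(transformer_dict)
    result: dict[str, Any] = {}
    for transformer, columns in transformer_tuples:
        for column in columns:
            # Each column gets an independent deep copy, so no fitted state is shared.
            result[column] = copy.deepcopy(transformer)
    return result


class Placeholder:
    """Stand-in for a transformer such as StandardScaler."""


spec = build_transformer_dict(
    transformer_tuples=[(Placeholder(), ["temperature", "pressure"])]
)
print(sorted(spec))                             # ['pressure', 'temperature']
print(spec["temperature"] is spec["pressure"])  # False: separate copies
```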
Methods:
``fit(data_frames)``: Fits all transformers on the provided data. Computes statistics (e.g., mean, std for StandardScaler) from the data.
Parameters:
data_frames (pd.DataFrame | list[pd.DataFrame]): Training data. Can be a single DataFrame or a list of DataFrames. If multiple DataFrames are provided, they are concatenated before fitting.
Returns:
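None: The API reference lists the return type as None, so do not chain fit() with other calls (e.g., transformer.fit(df).transform(df)). Call transform() on the transformer afterwards, or use fit_transform().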
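A list of DataFrames is concatenated before fitting. As a result, the learned statistics are pooled across all series rather than computed per series. The sketch below illustrates this with a hypothetical `fit_pooled_stats` function. It mimics what a StandardScaler would learn (the mean and the population standard deviation) and is not the library's implementation.

```python
import pandas as pd


def fit_pooled_stats(
    data_frames: "pd.DataFrame | list[pd.DataFrame]", columns: list[str]
) -> dict[str, tuple[float, float]]:
    """Hypothetical sketch: learn (mean, std) per column from pooled data."""
    if isinstance(data_frames, pd.DataFrame):
        data_frames = [data_frames]
    # Concatenate first, so the statistics reflect all series together.
    pooled = pd.concat(data_frames, ignore_index=True)
    stats = {}
    for column in columns:
        values = pooled[column].to_numpy(dtype=float)
        # ddof=0 matches StandardScaler's population standard deviation.
        stats[column] = (float(values.mean()), float(values.std(ddof=0)))
    return stats


series_a = pd.DataFrame({"temperature": [10.0, 12.0]})
series_b = pd.DataFrame({"temperature": [20.0, 22.0]})
stats = fit_pooled_stats([series_a, series_b], ["temperature"])
print(stats["temperature"][0])  # 16.0: pooled mean, not 11.0 or 21.0 per series
```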
``transform(data_frames)``: Applies the fitted transformers to the data. Only transforms columns that were specified during initialization.
Parameters:
data_frames (pd.DataFrame | list[pd.DataFrame]): Data to transform. Can be a single DataFrame or a list of DataFrames.
Returns:
pd.DataFrame | list[pd.DataFrame]: Transformed data with the same structure as input. Columns not specified in transformers remain unchanged.
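The column-preservation behavior can be illustrated with a hypothetical `transform_frame` function. It standardizes only the configured columns using previously fitted statistics and passes every other column through unchanged. The sketch works on a copy so that the input DataFrame is not modified. That is a design choice of this sketch; this page does not say whether the library copies its input. The `station_id` column is made up for illustration.

```python
import pandas as pd


def transform_frame(
    df: pd.DataFrame, stats: dict[str, tuple[float, float]]
) -> pd.DataFrame:
    """Hypothetical sketch: standardize configured columns, keep the rest."""
    out = df.copy()  # leave the caller's DataFrame unmodified
    for column, (mean, std) in stats.items():
        if column in out.columns:  # skip configured columns absent from df
            out[column] = (out[column] - mean) / std
    return out


df = pd.DataFrame({"temperature": [18.0, 22.0], "station_id": [1, 2]})
result = transform_frame(df, {"temperature": (20.0, 2.0)})
print(result["temperature"].tolist())  # [-1.0, 1.0]
print(result["station_id"].tolist())   # [1, 2] (not configured, so unchanged)
```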
``fit_transform(data_frames)``: Convenience method that fits and transforms in one step. Equivalent to calling fit() followed by transform().
Parameters:
data_frames (pd.DataFrame | list[pd.DataFrame]): Data to fit and transform.
Returns:
pd.DataFrame | list[pd.DataFrame]: Transformed data.
``inverse_transform(data_frames)``: Applies the inverse transformation to convert data back to the original scale. Useful for converting model predictions back to the original data scale.
Parameters:
data_frames (pd.DataFrame | list[pd.DataFrame]): Transformed data to invert.
Returns:
pd.DataFrame | list[pd.DataFrame]: Data in original scale.
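Take StandardScaler as an example. For one column, the forward and inverse maps are shown below, where μ and σ are the mean and (population) standard deviation learned during fit(). Applying the inverse to a transformed value recovers the original value, up to floating-point rounding. That is why predictions made in the scaled space can be mapped back to original units.

```latex
z = \frac{x - \mu}{\sigma}
\qquad\Longleftrightarrow\qquad
x = \sigma z + \mu
```

For example, with μ = 20 and σ = 10 learned from training data, a prediction of z = 0.5 maps back to x = 10 · 0.5 + 20 = 25.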
Internal Methods:
``_apply_to_single_feature(series, func)``: Internal helper method that applies a transformer function to a single pandas Series. Handles reshaping and type conversion.
``_get_valid_names(names)``: Internal helper method that returns the intersection of column names in the transformer dictionary and the provided names. Ensures only valid columns are transformed.
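The two internal helpers might look roughly like the hypothetical functions below. The reshape to a single 2-D column reflects the fact that sklearn-style transformers expect input of shape (n_samples, n_features). The float conversion and the order-preserving intersection are assumptions of this sketch, not confirmed details of the library.

```python
from typing import Callable

import numpy as np
import pandas as pd


def apply_to_single_feature(
    series: pd.Series, func: Callable[[np.ndarray], np.ndarray]
) -> pd.Series:
    """Hypothetical sketch of _apply_to_single_feature."""
    # sklearn-style transformers expect a 2-D array: (n_samples, 1 feature).
    values = series.to_numpy(dtype=float).reshape(-1, 1)
    transformed = func(values)
    # Flatten back to 1-D and keep the original index and name.
    return pd.Series(
        np.asarray(transformed).ravel(), index=series.index, name=series.name
    )


def get_valid_names(transformer_dict: dict, names) -> list[str]:
    """Hypothetical sketch of _get_valid_names: configured AND present columns."""
    return [name for name in names if name in transformer_dict]


s = pd.Series([1.0, 2.0, 3.0], index=[10, 11, 12], name="temperature")
doubled = apply_to_single_feature(s, lambda a: a * 2)
print(doubled.tolist())  # [2.0, 4.0, 6.0]
print(get_valid_names({"temperature": None, "pressure": None},
                      ["humidity", "temperature"]))  # ['temperature']
```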
Typical Use Cases:
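Standardization: Scale features to zero mean and unit variance (StandardScaler)
Min-Max Scaling: Scale features to a fixed range, [0, 1] by default (MinMaxScaler)
Robust Scaling: Center on the median and scale by the interquartile range (IQR), which reduces the influence of outliers (RobustScaler)
Custom Transformers: Apply any sklearn-compatible transformer (one that implements fit() and transform(), plus inverse_transform() if you need to invert)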
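For reference, the NumPy sketch below shows what the three listed scalers compute on a single column with their default settings. The function names are hypothetical. These plain functions only mirror the arithmetic; in practice you would pass the sklearn classes themselves to ColumnTransformer. Unlike sklearn, the sketches do not guard against a zero denominator for constant columns.

```python
import numpy as np


def standardize(x: np.ndarray) -> np.ndarray:
    # StandardScaler default: zero mean, unit (population) variance.
    return (x - x.mean()) / x.std(ddof=0)


def min_max(x: np.ndarray) -> np.ndarray:
    # MinMaxScaler default feature_range=(0, 1).
    return (x - x.min()) / (x.max() - x.min())


def robust(x: np.ndarray) -> np.ndarray:
    # RobustScaler default: subtract the median, divide by the 25th-75th percentile IQR.
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return (x - median) / (q3 - q1)


x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier
print(standardize(x).round(2))
print(min_max(x).round(2))
print(robust(x).round(2))
```

With the outlier present, min-max scaling squeezes the first four values into roughly [0, 0.03]. Robust scaling keeps them spread between -1 and 0.5.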
Two Initialization Methods:
Dictionary Method: Map column names directly to transformers
Tuple Method: Apply the same transformer to multiple columns (more convenient)
Example - Dictionary Method:
from deep_time_series.transform import ColumnTransformer
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import pandas as pd
import numpy as np
# Create sample data
data = pd.DataFrame({
'temperature': np.random.randn(100) * 10 + 20,
'humidity': np.random.rand(100) * 100,
'pressure': np.random.randn(100) * 5 + 1013,
})
# Dictionary method: map each column to a transformer
transformer = ColumnTransformer(
transformer_dict={
'temperature': StandardScaler(),
'humidity': MinMaxScaler(),
'pressure': StandardScaler(),
}
)
# Fit and transform
data_transformed = transformer.fit_transform(data)
Example - Tuple Method (Recommended):
from deep_time_series.transform import ColumnTransformer
from sklearn.preprocessing import StandardScaler
import pandas as pd
# Tuple method: apply the same transformer to multiple columns
# (`data` is the DataFrame created in the dictionary-method example above)
transformer = ColumnTransformer(
transformer_tuples=[
(StandardScaler(), ['temperature', 'pressure']), # Scale these
# Other columns remain unchanged
]
)
data_transformed = transformer.fit_transform(data)
Important Notes:
Deep Copying: When using transformer_tuples, each column gets a deep copy of the transformer, so the columns don’t share state
Column Validation: Only columns present in both the transformer dict and the DataFrame are transformed
Preservation: Columns not specified in the transformer will remain unchanged in the output
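Deep copies matter because a fitted sklearn-style transformer stores its learned statistics on the instance itself. The stdlib-only sketch below uses a hypothetical `MeanCenterer` to show what goes wrong if one instance is fitted on two columns in turn, and how copy.deepcopy avoids the problem.

```python
import copy


class MeanCenterer:
    """Hypothetical minimal transformer that stores its learned mean."""

    def fit(self, values: list[float]) -> "MeanCenterer":
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, values: list[float]) -> list[float]:
        return [v - self.mean_ for v in values]


columns = {"temperature": [10.0, 30.0], "pressure": [1000.0, 1020.0]}

# Shared instance: fitting on "pressure" overwrites the mean learned for "temperature".
shared = MeanCenterer()
shared_fitted = {name: shared.fit(vals) for name, vals in columns.items()}
print(shared_fitted["temperature"].transform(columns["temperature"]))  # [-1000.0, -980.0]

# Deep copies: each column keeps its own fitted state.
template = MeanCenterer()
copied = {name: copy.deepcopy(template).fit(vals) for name, vals in columns.items()}
print(copied["temperature"].transform(columns["temperature"]))  # [-10.0, 10.0]
```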
Workflow:
Fit: Compute statistics (mean, std, etc.) from training data
Transform: Apply the learned transformation to new data
Inverse Transform: Convert transformed data back to original scale (useful for predictions)
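Numerically, these three steps amount to learning the statistics once from the training data and reusing them everywhere else. The NumPy sketch below uses plain arithmetic as a stand-in for StandardScaler, with toy values made up for illustration. The test data is scaled with the training mean and standard deviation, so its transformed mean is generally not zero. The inverse step then restores the original units.

```python
import numpy as np

train = np.array([10.0, 20.0, 30.0])
test = np.array([25.0, 35.0])

# Fit: learn statistics from the training data only.
mu, sigma = train.mean(), train.std(ddof=0)

# Transform: apply the training statistics to new data (no refit).
test_scaled = (test - mu) / sigma

# Inverse transform: map scaled values (e.g. predictions) back to original units.
test_restored = test_scaled * sigma + mu

print(test_scaled.mean())                # not 0: test data was not used for fitting
print(np.allclose(test_restored, test))  # True
```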
Example - Full Workflow:
from deep_time_series.transform import ColumnTransformer
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
# Training data
train_data = pd.DataFrame({'temperature': np.random.randn(100) * 10 + 20})
# Create and fit transformer
transformer = ColumnTransformer(
transformer_tuples=[(StandardScaler(), ['temperature'])]
)
train_transformed = transformer.fit_transform(train_data)
# Test data (use same transformer, don't refit)
test_data = pd.DataFrame({'temperature': np.random.randn(50) * 10 + 20})
test_transformed = transformer.transform(test_data)
# Inverse transform predictions back to original scale
# Stand-in for model predictions: a DataFrame in the transformed space with the same column names
predictions_transformed = test_transformed
predictions_original = transformer.inverse_transform(predictions_transformed)
Multiple DataFrames:
The transformer can handle lists of DataFrames, which is useful when you have multiple time series:
data1 = pd.DataFrame({'temperature': np.random.randn(100)})
data2 = pd.DataFrame({'temperature': np.random.randn(100)})
transformer = ColumnTransformer(
transformer_tuples=[(StandardScaler(), ['temperature'])]
)
# Fit once on both DataFrames (they are concatenated before fitting)
transformer.fit([data1, data2])
# Transform each DataFrame; a list of DataFrames is returned
transformed_list = transformer.transform([data1, data2])
class ColumnTransformer(transformer_dict=None, transformer_tuples=None)
Bases: object
- fit(data_frames)
  - Parameters: data_frames (DataFrame | list[pandas.core.frame.DataFrame])
  - Return type: None
- fit_transform(data_frames)
  - Parameters: data_frames (DataFrame | list[pandas.core.frame.DataFrame])
  - Return type: DataFrame | list[pandas.core.frame.DataFrame]