Preprocessing Functions for Time-Based Windowing, Differencing, and Data Transformation. Xgboost for GridSearch #54
These functions were implemented in response to the requests in issue #45.
time_based_windowing()
This function applies a time-based rolling window to the dataset based on the timestamp index, allowing you to aggregate data over defined time intervals.
Parameters:
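As a rough illustration of the behaviour described for `time_based_windowing()`, here is a minimal pandas sketch. The function name `time_based_windowing_sketch` and the arguments `window` and `agg` are hypothetical and may not match the signature in this PR; the sketch only assumes the data has a `DatetimeIndex`.

```python
import numpy as np
import pandas as pd


def time_based_windowing_sketch(df: pd.DataFrame, window: str = "3h",
                                agg: str = "mean") -> pd.DataFrame:
    """Hypothetical stand-in: aggregate each column over a trailing time window."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("time-based windows need a DatetimeIndex")
    # Offset-based rolling windows require a monotonic index, so sort first.
    return df.sort_index().rolling(window).agg(agg)


# Six hourly readings; each output row averages the readings from the last 3 hours.
idx = pd.date_range("2024-01-01", periods=6, freq="h")
readings = pd.DataFrame({"load": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)
windowed = time_based_windowing_sketch(readings, window="3h", agg="mean")
print(windowed)
```

With an offset window such as `"3h"`, pandas sizes the window by elapsed time rather than by row count, so irregularly spaced timestamps are handled without resampling first.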
remove_constant_difference()
This function computes the cumulative sum of the input DataFrame and removes columns where the differences between consecutive rows are constant. This can be useful for identifying and dropping features that do not provide significant variability.
Parameters:
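The description of `remove_constant_difference()` can be read as the following sketch. The name `remove_constant_difference_sketch`, the tolerance argument `tol`, and the choice to return the original columns (rather than the cumulative ones) are assumptions made for illustration, not the code in this PR.

```python
import pandas as pd


def remove_constant_difference_sketch(df: pd.DataFrame, tol: float = 1e-12) -> pd.DataFrame:
    """Hypothetical stand-in: drop columns whose cumulative sum grows by a constant step."""
    cumulative = df.cumsum()
    # Row-to-row differences of the cumulative sum; the first row has no predecessor.
    steps = cumulative.diff().iloc[1:]
    # A column is "constant" when its largest and smallest step coincide (within tol).
    spread = steps.max() - steps.min()
    constant_cols = spread[spread <= tol].index
    return df.drop(columns=constant_cols)


frame = pd.DataFrame({
    "flat": [2.0, 2.0, 2.0, 2.0],    # cumulative sum rises by exactly 2 each row
    "signal": [1.0, 3.0, 2.0, 5.0],  # cumulative sum rises by varying amounts
})
cleaned = remove_constant_difference_sketch(frame)
print(list(cleaned.columns))
```

Because the difference of a cumulative sum recovers the original values, this check effectively flags columns that are constant after their first row.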
differencing()
This function applies differencing to the dataset, a technique used to remove trends or seasonal patterns by subtracting the previous observation from the current one.
Parameters:
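First-order differencing computes y'_t = y_t - y_{t-1}. A minimal sketch with pandas follows; the name `differencing_sketch` and the `lag` argument are hypothetical, and the real function may handle the leading missing rows differently.

```python
import pandas as pd


def differencing_sketch(df: pd.DataFrame, lag: int = 1) -> pd.DataFrame:
    """Hypothetical stand-in: subtract the observation `lag` rows earlier from each row."""
    # The first `lag` rows have no earlier observation, so they are dropped here.
    return df.diff(periods=lag).iloc[lag:]


series = pd.DataFrame({"y": [10.0, 12.0, 15.0, 19.0]})
diffed = differencing_sketch(series)
print(diffed["y"].tolist())
```

Setting `lag` to the seasonal period (for example 24 for hourly data with a daily cycle) gives seasonal differencing instead of trend removal.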
log_transform()
This function applies a logarithmic transformation to the dataset to compress large values, reduce variability, and help normalize the data. The transformation used is log(1 + x), which is defined at x = 0 (where plain log(x) is not) and stays numerically stable for values close to zero.
Parameters:
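The log(1 + x) transform maps directly onto NumPy's `np.log1p`, with `np.expm1` as its inverse. The sketch below uses a hypothetical function name, `log_transform_sketch`, and assumes all inputs are greater than -1; the function in this PR may validate or clip inputs differently.

```python
import numpy as np
import pandas as pd


def log_transform_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: element-wise log(1 + x)."""
    if (df <= -1).any().any():
        raise ValueError("log(1 + x) is undefined for x <= -1")
    return np.log1p(df)


raw = pd.DataFrame({"count": [0.0, 1.0, 9.0, 99.0]})
logged = log_transform_sketch(raw)
# np.expm1 undoes the transform, which is handy for mapping predictions back.
restored = np.expm1(logged)
print(logged)
```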
More methods for _gridsearch.py
A new function has also been added for GridSearch #52. It uses the XGBoost library so that the hyperparameter search can run on the GPU and finish more quickly.
xgboost_best()
This function performs a grid search to identify the best hyperparameters for the XGBRegressor model using a predefined parameter grid. It splits the dataset into training and testing sets, then applies GridSearchCV to tune the model for optimal performance.
Parameters:
Notes:
You can view the new functions in Colab