Sklearn-genetic-opt

Hyperparameter tuning for scikit-learn models, using evolutionary algorithms.

This is meant to be an alternative to popular methods inside scikit-learn such as Grid Search and Randomized Grid Search.

Sklearn-genetic-opt uses evolutionary algorithms from the DEAP package to choose the set of hyperparameters that optimizes (maximizes or minimizes) the cross-validation score. It can be used for both regression and classification problems.
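To make the idea concrete, here is a minimal, standard-library sketch of an evolutionary search over a small hyperparameter space. It is not the package's implementation (which relies on DEAP): the search space, the `toy_cv_score` function standing in for a real cross-validation score, and the simple truncation selection are all assumptions chosen for illustration.

```python
import random

# Hypothetical search space: hyperparameter name -> (low, high) integer bounds.
SPACE = {"max_depth": (2, 30), "n_estimators": (100, 300)}


def toy_cv_score(params):
    """Stand-in for a cross-validation score; peaks at max_depth=12, n_estimators=200."""
    return -((params["max_depth"] - 12) ** 2) - ((params["n_estimators"] - 200) / 10) ** 2


def random_individual(rng):
    return {name: rng.randint(low, high) for name, (low, high) in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return {name: rng.choice((a[name], b[name])) for name in SPACE}


def mutate(individual, rng, rate=0.2):
    # With probability `rate`, resample a gene uniformly within its bounds.
    return {
        name: rng.randint(*SPACE[name]) if rng.random() < rate else value
        for name, value in individual.items()
    }


def evolve(population_size=10, generations=35, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=toy_cv_score, reverse=True)
        history.append(toy_cv_score(ranked[0]))
        parents = ranked[: population_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - 1)
        ]
        population = [ranked[0]] + children  # elitism: keep the best unchanged
    best = max(population, key=toy_cv_score)
    return best, history


best_params, history = evolve()
print(best_params, toy_cv_score(best_params))
```

Because the best individual is always carried over to the next generation, the best score recorded in `history` never decreases. In a real search, each call to the scoring function would be a full cross-validation run.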

Documentation is available at https://sklearn-genetic-opt.readthedocs.io/en/stable/

Main Features:

GASearchCV: Main class of the package; it holds the evolutionary cross-validation optimization routine.
Algorithms: Set of different evolutionary algorithms to use as the optimization procedure.
Callbacks: Custom evaluation strategies to generate early stopping rules, logging (into TensorBoard, .pkl files, etc.) or your custom logic.
Plots: Generate pre-defined plots to understand the optimization process.
MLflow: Built-in integration with MLflow to log all the hyperparameters, cv-scores and the fitted models.
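As an illustration of what an early stopping rule does, the sketch below stops a search once the best fitness has not improved for a fixed number of consecutive generations. The class name `ConsecutiveNoImprovement`, its `patience` parameter and the fitness values are hypothetical and only show the logic; the package ships its own callback classes, described in the documentation.

```python
class ConsecutiveNoImprovement:
    """Hypothetical stopping rule: stop after `patience` generations without improvement."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = None
        self.stale = 0  # consecutive generations without a new best fitness

    def __call__(self, fitness):
        """Record one generation's best fitness; return True when the search should stop."""
        if self.best is None or fitness > self.best:
            self.best = fitness
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


# Simulated best-fitness values per generation (made-up numbers for illustration).
fitness_per_generation = [0.80, 0.85, 0.88, 0.88, 0.87, 0.88, 0.88, 0.90]

should_stop = ConsecutiveNoImprovement(patience=3)
stopped_at = None
for generation, fitness in enumerate(fitness_per_generation):
    if should_stop(fitness):
        stopped_at = generation
        break

print(stopped_at)
```

A rule like this trades a possible late improvement (the final 0.90 in the simulated data is never reached) for a shorter search.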

Some demos of the package's capabilities:

Visualize the progress of your training:

Real time metrics visualization and comparison across runs:

Sampled distribution of hyperparameters:

Artifacts logging:

Usage:

Install sklearn-genetic-opt

It's advised to install sklearn-genetic-opt in a virtual environment. Inside the environment, run:

pip install sklearn-genetic-opt

If you want to get all the features, including plotting and mlflow logging capabilities, install all the extra packages:

pip install sklearn-genetic-opt[all]

The only optional dependency that the last command does not install is TensorFlow; it is usually advised to check which distribution works best for you.

Example

from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Continuous, Categorical, Integer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

data = load_digits()
n_samples = len(data.images)
X = data.images.reshape((n_samples, -1))
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

clf = RandomForestClassifier()

param_grid = {'min_weight_fraction_leaf': Continuous(0.01, 0.5, distribution='log-uniform'),
              'bootstrap': Categorical([True, False]),
              'max_depth': Integer(2, 30),
              'max_leaf_nodes': Integer(2, 35),
              'n_estimators': Integer(100, 300)}

cv = StratifiedKFold(n_splits=3, shuffle=True)

evolved_estimator = GASearchCV(estimator=clf,
                               cv=cv,
                               scoring='accuracy',
                               population_size=10,
                               generations=35,
                               param_grid=param_grid,
                               n_jobs=-1,
                               verbose=True,
                               keep_top_k=4)

# Train and optimize the estimator
evolved_estimator.fit(X_train, y_train)
# Best parameters found
print(evolved_estimator.best_params_)
# Use the model fitted with the best parameters
y_predict_ga = evolved_estimator.predict(X_test)
print(accuracy_score(y_test, y_predict_ga))

# Saved metadata for further analysis
print("Stats achieved in each generation: ", evolved_estimator.history)
print("Best k solutions: ", evolved_estimator.hof)
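The example above defines `min_weight_fraction_leaf` with `distribution='log-uniform'`. As a rough sketch of what that means, the snippet below draws values whose logarithm is uniformly distributed between the bounds, so small values are sampled as densely as large ones. The helper `sample_log_uniform` is hypothetical and uses only the standard library; it illustrates the distribution, not the package's own sampler.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value in [low, high] whose logarithm is uniformly distributed."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(0.01, 0.5, rng) for _ in range(10_000)]

# Under a log-uniform distribution, half of the mass lies below the geometric mean
# of the bounds, rather than below their arithmetic midpoint.
geometric_mean = math.sqrt(0.01 * 0.5)
share_below = sum(s < geometric_mean for s in samples) / len(samples)
print(round(geometric_mean, 4), round(share_below, 2))
```

This is why a log-uniform space suits parameters that span more than one order of magnitude: a plain uniform draw over the same bounds would rarely produce values near the lower end.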

Changelog

See the changelog for notes on the changes to Sklearn-genetic-opt.

Important links

Official source code repo: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/
Download releases: https://pypi.org/project/sklearn-genetic-opt/
Issue tracker: https://github.com/rodrigo-arenas/Sklearn-genetic-opt/issues
Stable documentation: https://sklearn-genetic-opt.readthedocs.io/en/stable/

Source code

You can check the latest development version with the command:

git clone https://github.com/rodrigo-arenas/Sklearn-genetic-opt.git

Contributing

Contributions are more than welcome! There are lots of opportunities in this ongoing project, so please get in touch if you would like to help out. Also check the Contribution guide.

Big thanks to the people who are helping this project!

Testing

After installation, you can launch the test suite from outside the source directory:

pytest sklearn_genetic
