Release Sparse matrix support, early stopping, and checkpointing · EpistasisLab/tpot

TPOT now supports sparse matrices with a new built-in TPOT configuration, "TPOT sparse". We are using a custom OneHotEncoder implementation that supports missing values and continuous features.
We have added an "early stopping" option for stopping the optimization process if no improvement is made within a set number of generations. Look up the early_stop parameter to access this functionality.
TPOT now reduces the number of duplicated pipelines between generations, which saves you time during the optimization process.
TPOT now supports custom scoring functions via the command-line mode.
We have added a new optional argument, periodic_checkpoint_folder, that allows TPOT to periodically save the best pipeline so far to a local folder during the optimization process.
TPOT no longer uses sklearn.externals.joblib when n_jobs=1 to avoid the potential freezing issue that scikit-learn suffers from.
We have added pandas as a dependency to read input datasets instead of numpy.recfromcsv. NumPy's recfromcsv function is unable to parse datasets with complex data types.
Fixed a bug where a DEFAULT value in the parameters of a nested estimator raised a KeyError when exporting pipelines.
Fixed a bug related to setting random_state in nested estimators. The issue occurred in pipelines containing SelectFromModel (with ExtraTreesClassifier as the nested estimator) or StackingEstimator when the nested estimator has a random_state parameter.
Fixed a bug in the missing value imputation function in TPOT so that it imputes along columns instead of rows.
Refined input checking for sparse matrices in TPOT.
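
As a rough illustration of how the new options above might be combined, here is a minimal sketch that passes the "TPOT sparse" configuration, early_stop, and periodic_checkpoint_folder to TPOTClassifier. The helper names (tpot_sparse_settings, run_example), the digits dataset, and all numeric values are assumptions chosen for the example, not recommendations from this release.

```python
import tempfile


def tpot_sparse_settings(checkpoint_dir):
    """Collect keyword arguments for a TPOTClassifier run on sparse input.

    Hypothetical helper for this sketch; the numeric values are illustrative.
    """
    return {
        "config_dict": "TPOT sparse",  # built-in configuration named in these notes
        "early_stop": 5,  # stop after 5 generations without improvement
        "periodic_checkpoint_folder": checkpoint_dir,  # best pipeline saved here
        "generations": 20,
        "population_size": 20,
        "n_jobs": 1,
        "random_state": 42,
        "verbosity": 2,
    }


def run_example():
    """Fit TPOT on a sparse copy of the digits data (needs tpot, scipy, sklearn)."""
    # Imported lazily so the settings can be inspected without TPOT installed.
    from scipy.sparse import csr_matrix
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        csr_matrix(X), y, random_state=42
    )

    # mkdtemp creates a folder that outlives the run, so checkpoints stay on disk.
    checkpoint_dir = tempfile.mkdtemp(prefix="tpot_checkpoints_")
    tpot = TPOTClassifier(**tpot_sparse_settings(checkpoint_dir))
    tpot.fit(X_train, y_train)
    return checkpoint_dir, tpot.score(X_test, y_test)
```

Calling run_example() starts the optimization and returns the checkpoint folder together with the test score. Run time depends heavily on the generations and population_size values, so smaller numbers are reasonable for a first try.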
