Sparse matrix support, early stopping, and checkpointing
- TPOT now supports sparse matrices with a new built-in TPOT configuration, "TPOT sparse". We use a custom OneHotEncoder implementation that supports missing values and continuous features.
- We have added an "early stopping" option that stops the optimization process if no improvement is made within a set number of generations. See the `early_stop` parameter to access this functionality.
- TPOT now reduces the number of duplicated pipelines between generations, which saves you time during the optimization process.
- TPOT now supports custom scoring functions via the command-line mode.
- We have added a new optional argument, `periodic_checkpoint_folder`, that allows TPOT to periodically save the best pipeline so far to a local folder during the optimization process.
- TPOT no longer uses `sklearn.externals.joblib` when `n_jobs=1`, to avoid the potential freezing issue that scikit-learn suffers from.
- We have added `pandas` as a dependency to read input datasets instead of `numpy.recfromcsv`. NumPy's `recfromcsv` function is unable to parse datasets with complex data types.
- Fixed a bug where `DEFAULT` in the parameter(s) of a nested estimator raised a `KeyError` when exporting pipelines.
- Fixed a bug related to setting `random_state` in nested estimators. The issue occurred in pipelines with `SelectFromModel` (with `ExtraTreesClassifier` as the nested estimator) or `StackingEstimator` when the nested estimator has a `random_state` parameter.
- Fixed a bug in the missing value imputation function in TPOT so that it imputes along columns instead of rows.
- Refined input checking for sparse matrices in TPOT.
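
As a rough illustration of how the new options fit together, the sketch below collects them as keyword arguments for `TPOTClassifier`. The folder name and the numeric values are hypothetical, chosen only for the example. The arguments are built as a plain dict so the snippet runs even where TPOT is not installed.

```python
# Keyword arguments for tpot.TPOTClassifier. All values are illustrative.
tpot_kwargs = {
    "generations": 50,
    "population_size": 20,
    "config_dict": "TPOT sparse",  # built-in configuration for sparse input
    "early_stop": 5,  # stop after 5 generations without improvement
    "periodic_checkpoint_folder": "tpot_checkpoints",  # hypothetical folder name
    "n_jobs": 1,
    "random_state": 42,
}

# With TPOT installed, the dict would be used roughly like this:
#   from tpot import TPOTClassifier
#   model = TPOTClassifier(**tpot_kwargs)
#   model.fit(X_sparse, y)  # X_sparse: a scipy.sparse feature matrix
```
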
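
The early stopping rule described above can be pictured with a small standalone sketch. This is not TPOT's implementation: the function name and the list of per-generation scores are assumptions made for illustration, and higher scores are assumed to be better.

```python
def run_with_early_stop(scores_per_generation, early_stop):
    """Return (generations_run, best_score).

    Stops once `early_stop` consecutive generations fail to improve
    on the best score seen so far.
    """
    best_score = float("-inf")
    stale = 0  # consecutive generations without improvement
    generations_run = 0
    for score in scores_per_generation:
        generations_run += 1
        if score > best_score:
            best_score = score
            stale = 0
        else:
            stale += 1
            if stale >= early_stop:
                break
    return generations_run, best_score


scores = [0.80, 0.85, 0.85, 0.84, 0.85, 0.90]
result = run_with_early_stop(scores, early_stop=3)
print(result)  # (5, 0.85)
```

In this run the sixth generation would have scored higher, but the search stops after the fifth. That is the trade-off a small `early_stop` value makes: less optimization time, at the risk of missing a later improvement.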
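
To see why the imputation direction matters, here is a minimal sketch of column-wise imputation in plain Python. It assumes a median fill and uses `None` for missing values. Both choices are assumptions for the example, since the note above does not say which statistic TPOT uses.

```python
from statistics import median


def impute_columns(rows):
    """Replace None entries with the median of the observed values in their column."""
    n_cols = len(rows[0])
    fill = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        fill.append(median(observed))
    return [
        [fill[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Each row is a sample and each column is a feature.
data = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
    [5.0, 60.0],
]
imputed = impute_columns(data)
print(imputed)  # [[1.0, 10.0], [3.0, 20.0], [3.0, 20.0], [5.0, 60.0]]
```

Filling along rows instead would replace the missing first feature in the second sample with 20.0, a value taken from an unrelated feature.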