-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace dask_ml.wrappers.Incremental
with custom Incremental
class
#855
Conversation
dask_ml.wrappers.Incremental
with custom Incremental
classdask_ml.wrappers.Incremental
with custom Incremental
class
Codecov Report
@@ Coverage Diff @@
## main #855 +/- ##
==========================================
+ Coverage 75.20% 75.52% +0.32%
==========================================
Files 72 73 +1
Lines 3779 3972 +193
Branches 674 710 +36
==========================================
+ Hits 2842 3000 +158
- Misses 804 810 +6
- Partials 133 162 +29
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
_assert_eq(l, r, name=attr, **kwargs) | ||
|
||
|
||
def test_parallelpostfit_basic(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the unit tests, I added test_parallelpostfit_basic
(originally test_it_works
), test_predict
, and test_transform
from https://github.com/dask/dask-ml/blob/main/tests/test_parallel_post_fit.py, and test_incremental_basic
from https://github.com/dask/dask-ml/blob/main/tests/test_incremental.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sarahyurick! Code changes look good, just a couple remaining mentions of dask-ml in the affected files that could be removed (assuming the remaining uses will be handled in #886)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have requested small clarifications, other wise implementation looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sarahyurick!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
model with a :class:`dask_sql.physical.rel.custom.wrappers.ParallelPostFit`. | ||
Have a look into the | ||
[dask-ml docu](https://ml.dask.org/meta-estimators.html#parallel-prediction-and-transformation) | ||
to learn more about it. Defaults to false. Typically you set | ||
it to true for sklearn models if predicting on big data. | ||
Defaults to false. Typically you set it to true for | ||
sklearn models if predicting on big data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR , can you file an issue to clean up the wrap_predict
and wrap_fit
arguments. I think we can get rid of this or do a better default based on the class name of the model.
For sklearn
and single gpu
cuML models, switch this to true else switch this to False.
@@ -174,7 +166,11 @@ def convert(self, rel: "LogicalPlan", context: "dask_sql.Context") -> DataContai | |||
|
|||
search = ExperimentClass(model, {**parameters}, **experiment_kwargs) | |||
logger.info(tune_fit_kwargs) | |||
search.fit(X, y, **tune_fit_kwargs) | |||
search.fit( | |||
X.to_dask_array(lengths=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could experimentClass be a gpu based model or is it limited to cpu based ones only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, since I think every example I've seen with experiment_class
has been with a CPU dask_ml
model... I can try to get a better idea of the scope when the pytests in #886 are updated with other non-dask_ml
models. We can see about adding GPU tests there too.
Closes #839.
In addition to #832, we want to create a custom implementation for Dask-ML's
Incremental
class as well.So as not to create any merge conflicts, I've only added a single file relating to the scorers used in Dask-ML's implementation. After #832 I will add the remaining functionality and changes needed in
dask_sql/physical/rel/custom/create_model.py
,dask_sql/physical/rel/custom/wrappers.py
(created in #832), anddocs/source/sql/ml.rst
.@VibhuJawa