Replace `dask_ml.wrappers.Incremental` with custom `Incremental` class #855

sarahyurick · 2022-10-11T21:46:11Z

Closes #839.

In addition to #832, we want to create a custom implementation for Dask-ML's Incremental class as well.

So as not to create any merge conflicts, I've only added a single file relating to the scorers used in Dask-ML's implementation. After #832 I will add the remaining functionality and changes needed in dask_sql/physical/rel/custom/create_model.py, dask_sql/physical/rel/custom/wrappers.py (created in #832), and docs/source/sql/ml.rst.
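For readers unfamiliar with what an `Incremental`-style wrapper does, here is a minimal sketch of the general idea, not the code added in this PR: the wrapper passes data to the underlying estimator one chunk at a time through `partial_fit`. The names `RunningMeanRegressor` and `IncrementalSketch` are hypothetical and exist only for illustration. Plain Python lists stand in for Dask partitions so the example runs without Dask installed.

```python
from typing import Iterable, List, Sequence, Tuple

Chunk = Tuple[Sequence[Sequence[float]], Sequence[float]]


class RunningMeanRegressor:
    """Toy estimator with a scikit-learn-style partial_fit (hypothetical)."""

    def __init__(self) -> None:
        self.n_seen_ = 0
        self.mean_ = 0.0

    def partial_fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "RunningMeanRegressor":
        # Update the running mean of the targets using one chunk of data.
        for target in y:
            self.n_seen_ += 1
            self.mean_ += (target - self.mean_) / self.n_seen_
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        # Predict the learned mean for every input row.
        return [self.mean_ for _ in X]


class IncrementalSketch:
    """Hypothetical stand-in for an Incremental wrapper (assumption, not the PR's class)."""

    def __init__(self, estimator) -> None:
        self.estimator = estimator

    def fit(self, chunks: Iterable[Chunk]) -> "IncrementalSketch":
        # Feed the chunks to the estimator sequentially, one partial_fit call each.
        for X_chunk, y_chunk in chunks:
            self.estimator.partial_fit(X_chunk, y_chunk)
        self.estimator_ = self.estimator
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return self.estimator_.predict(X)


# Two chunks stand in for two partitions of a larger dataset.
chunks = [([[0.0], [1.0]], [1.0, 3.0]), ([[2.0]], [5.0])]
model = IncrementalSketch(RunningMeanRegressor()).fit(chunks)
```

The design point is that the estimator never sees the whole dataset at once, which is why only estimators exposing `partial_fit` are suitable for this kind of wrapper.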

@VibhuJawa

codecov-commenter · 2022-10-26T21:12:41Z

Codecov Report

Merging #855 (aadc04d) into main (feecf41) will increase coverage by 0.32%.
The diff coverage is 52.91%.

@@            Coverage Diff             @@
##             main     #855      +/-   ##
==========================================
+ Coverage   75.20%   75.52%   +0.32%     
==========================================
  Files          72       73       +1     
  Lines        3779     3972     +193     
  Branches      674      710      +36     
==========================================
+ Hits         2842     3000     +158     
- Misses        804      810       +6     
- Partials      133      162      +29

Impacted Files	Coverage Δ
dask_sql/physical/rel/custom/metrics.py	`25.00% <25.00%> (ø)`
dask_sql/physical/rel/custom/wrappers.py	`64.07% <76.76%> (+36.00%)`	⬆️
dask_sql/physical/rel/custom/create_experiment.py	`96.15% <100.00%> (ø)`
dask_sql/physical/rel/custom/create_model.py	`88.52% <100.00%> (-0.19%)`	⬇️
dask_sql/physical/rex/core/call.py	`81.33% <0.00%> (+0.29%)`	⬆️
dask_sql/_version.py	`35.31% <0.00%> (+1.41%)`	⬆️

sarahyurick · 2022-10-26T22:37:42Z

tests/unit/test_ml_wrappers.py

+        _assert_eq(l, r, name=attr, **kwargs)
+
+
+def test_parallelpostfit_basic():


For the unit tests, I added test_parallelpostfit_basic (originally test_it_works), test_predict, and test_transform from https://github.com/dask/dask-ml/blob/main/tests/test_parallel_post_fit.py, and test_incremental_basic from https://github.com/dask/dask-ml/blob/main/tests/test_incremental.py

charlesbluca

Thanks @sarahyurick! Code changes look good; there are just a couple of remaining mentions of dask-ml in the affected files that could be removed (assuming the remaining uses will be handled in #886).

dask_sql/physical/rel/custom/create_experiment.py

docs/source/sql/ml.rst

VibhuJawa

I have requested small clarifications; otherwise the implementation looks good.

dask_sql/physical/rel/custom/create_experiment.py

docs/source/sql/ml.rst

charlesbluca

Thanks @sarahyurick!

VibhuJawa

LGTM

VibhuJawa · 2022-11-07T20:28:32Z

dask_sql/physical/rel/custom/create_model.py

      model with a :class:`dask_sql.physical.rel.custom.wrappers.ParallelPostFit`.
-      Have a look into the
-      [dask-ml docu](https://ml.dask.org/meta-estimators.html#parallel-prediction-and-transformation)
-      to learn more about it. Defaults to false. Typically you set
-      it to true for sklearn models if predicting on big data.
+      Defaults to false. Typically you set it to true for
+      sklearn models if predicting on big data.


Unrelated to this PR: can you file an issue to clean up the wrap_predict and wrap_fit arguments? I think we can get rid of these or choose a better default based on the class name of the model.

For sklearn and single-GPU cuML models, default this to true; otherwise default it to false.
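A rough sketch of what such a default could look like, assuming the decision is made from the module path of the model's class. The helper name `default_wrap_predict` is hypothetical, and treating anything under `cuml.dask` as a multi-GPU model is an assumption made for illustration, not something settled in this PR.

```python
def default_wrap_predict(model: object) -> bool:
    """Guess whether predict should be wrapped (hypothetical helper)."""
    module = type(model).__module__
    top_level = module.split(".")[0]

    if top_level == "sklearn":
        # Plain scikit-learn models run on a single machine, so wrap them.
        return True
    if top_level == "cuml":
        # Assumption: distributed cuML estimators live under cuml.dask
        # and already handle Dask collections, so only wrap the rest.
        return not module.startswith("cuml.dask")
    # Anything else is left unwrapped by default.
    return False


# Fake classes standing in for real estimators, so no ML library is needed.
FakeSklearnModel = type("LinearRegression", (), {"__module__": "sklearn.linear_model._base"})
FakeCumlModel = type("KMeans", (), {"__module__": "cuml.cluster.kmeans"})
FakeCumlDaskModel = type("KMeans", (), {"__module__": "cuml.dask.cluster.kmeans"})
FakeOtherModel = type("Booster", (), {"__module__": "somepackage.core"})
```

With a helper like this, an explicit `wrap_predict` value passed by the user would still take precedence, and the heuristic would apply only when the argument is omitted.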

ayushdg · 2022-11-14T21:37:43Z

dask_sql/physical/rel/custom/create_experiment.py

@@ -174,7 +166,11 @@ def convert(self, rel: "LogicalPlan", context: "dask_sql.Context") -> DataContai

            search = ExperimentClass(model, {**parameters}, **experiment_kwargs)
            logger.info(tune_fit_kwargs)
-            search.fit(X, y, **tune_fit_kwargs)
+            search.fit(
+                X.to_dask_array(lengths=True),


Could `ExperimentClass` be a GPU-based model, or is it limited to CPU-based ones only?

sarahyurick · 2022-11-14

I'm not sure, since every example I've seen with experiment_class has used a CPU dask_ml model. I can get a better idea of the scope once the pytests in #886 are updated with other non-dask_ml models. We can look at adding GPU tests there too.

Create metrics.py

28771dc

sarahyurick requested review from ayushdg, charlesbluca and galipremsagar as code owners October 11, 2022 21:46

sarahyurick marked this pull request as draft October 11, 2022 21:46

VibhuJawa mentioned this pull request Oct 11, 2022

Replace dask_ml.wrappers.ParallelPostFit with custom ParallelPostFit class #832

Merged

sarahyurick mentioned this pull request Oct 24, 2022

Remove all Dask-ML uses #886

Merged

Merge branch 'main' into incremental

4734d8a

sarahyurick changed the title ~~[BLOCKED] Replace dask_ml.wrappers.Incremental with custom Incremental class~~ Replace dask_ml.wrappers.Incremental with custom Incremental class Oct 24, 2022

sarahyurick and others added 6 commits October 24, 2022 13:28

add incremental functionality

dc49752

lint and some comments

8d593b0

update more comments

8268096

add dask-ml fit function

7c60bd3

style fix

95d6ec6

DASK_2022_01_0

0f91006

sarahyurick added 3 commits October 26, 2022 14:24

add unit tests

09fc4a6

style fix

f0dc935

remove scheduler

8356dd4

sarahyurick marked this pull request as ready for review October 26, 2022 22:21

sarahyurick commented Oct 26, 2022

charlesbluca reviewed Nov 2, 2022

dask_sql/physical/rel/custom/create_experiment.py

docs/source/sql/ml.rst

experiment_class comment

8fcb67c

VibhuJawa suggested changes Nov 2, 2022

dask_sql/physical/rel/custom/create_experiment.py

docs/source/sql/ml.rst

sarahyurick added 2 commits November 2, 2022 11:51

apply Vibhu's suggestions

ebe2348

style fix

aadc04d

charlesbluca approved these changes Nov 3, 2022

VibhuJawa approved these changes Nov 7, 2022

sarahyurick mentioned this pull request Nov 8, 2022

Clean up wrap_predict and wrap_fit flags #909

Closed

ayushdg approved these changes Nov 14, 2022

ayushdg merged commit 5440eff into dask-contrib:main Nov 14, 2022

charlesbluca mentioned this pull request Dec 5, 2022

Resolve ML cluster failures #957

Merged

sarahyurick mentioned this pull request Jan 31, 2023

[DOC] Update ML docs to reflect recent changes #1022

Closed

sarahyurick deleted the incremental branch May 26, 2023 22:16
