Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dask_ml.wrappers.ParallelPostFit with custom ParallelPostFit class #832

Merged
merged 17 commits into from
Oct 24, 2022

Conversation

sarahyurick
Copy link
Collaborator

As part of our initiative to move away from Dask-ML, I've migrated some code from Dask-ML into Dask-SQL to support ParallelPostFit.

@codecov-commenter
Copy link

codecov-commenter commented Sep 30, 2022

Codecov Report

Merging #832 (ef087db) into main (d9fc2c1) will decrease coverage by 2.10%.
The diff coverage is 30.16%.

@@            Coverage Diff             @@
##             main     #832      +/-   ##
==========================================
- Coverage   77.44%   75.33%   -2.11%     
==========================================
  Files          71       72       +1     
  Lines        3600     3779     +179     
  Branches      634      674      +40     
==========================================
+ Hits         2788     2847      +59     
- Misses        685      795     +110     
- Partials      127      137      +10     
Impacted Files Coverage Δ
dask_sql/physical/rel/custom/wrappers.py 28.07% <28.07%> (ø)
dask_sql/physical/rel/custom/create_model.py 88.70% <66.66%> (-2.82%) ⬇️
dask_sql/physical/rel/custom/create_experiment.py 96.15% <100.00%> (-0.10%) ⬇️
dask_sql/_version.py 35.31% <0.00%> (+2.74%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ayushdg ayushdg requested a review from VibhuJawa October 4, 2022 01:55
@sarahyurick
Copy link
Collaborator Author

Let me know if you think any tests from Dask-ML's tests/test_parallel_post_fit.py or from dask/dask-ml#912 should be added.

dask_sql/physical/rel/custom/wrappers.py Outdated Show resolved Hide resolved
@@ -43,7 +43,7 @@ class CreateModelPlugin(BaseRelPlugin):
unsupervised algorithms). This means, you typically
want to set this parameter.
* wrap_predict: Boolean flag, whether to wrap the selected
model with a :class:`dask_ml.wrappers.ParallelPostFit`.
model with a :class:`dask_sql.physical.rel.custom.wrappers.ParallelPostFit`.
Have a look into the
[dask-ml docu](https://ml.dask.org/meta-estimators.html#parallel-prediction-and-transformation)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this documentation link too and your own documentation in dask_sql. I would like us to move away from wrap_fit functionality to auto-detecting if its a sklearn/single-GPU cuML/xgboost model vs a dask model.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be resolved by #855

tests/integration/test_model.py Show resolved Hide resolved
Copy link
Collaborator

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small Change, looks good other-wise

Copy link
Collaborator

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesed Small change, Looks good other-wise.

dask_sql/physical/rel/custom/wrappers.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@ayushdg ayushdg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @sarahyurick. Changes generally look good to me.

In addition to some of the integration tests added here which check the interaction of sql with our wrappers, it might make sense (in a followup) to also add unit tests that independently test the wrappers functionality outside of sql and make sure things are working as expected.

We can probably adapt the tests from https://github.com/dask/dask-ml/blob/94e52613fc87abd4ef8c97510539ded1303ae6f4/tests/test_parallel_post_fit.py and add it to the tests/unit section

dask_sql/physical/rel/custom/wrappers.py Outdated Show resolved Hide resolved
@ayushdg ayushdg merged commit feecf41 into dask-contrib:main Oct 24, 2022
@sarahyurick sarahyurick deleted the parallelpostfit branch May 26, 2023 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants