Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dask_ml.wrappers.Incremental with custom Incremental class #855

Merged
merged 14 commits into from
Nov 14, 2022

Conversation

sarahyurick
Copy link
Collaborator

Closes #839.

In addition to #832, we want to create a custom implementation for Dask-ML's Incremental class as well.

So as not to create any merge conflicts, I've only added a single file relating to the scorers used in Dask-ML's implementation. After #832 I will add the remaining functionality and changes needed in dask_sql/physical/rel/custom/create_model.py, dask_sql/physical/rel/custom/wrappers.py (created in #832), and docs/source/sql/ml.rst.

@VibhuJawa

@sarahyurick sarahyurick changed the title [BLOCKED] Replace dask_ml.wrappers.Incremental with custom Incremental class Replace dask_ml.wrappers.Incremental with custom Incremental class Oct 24, 2022
@codecov-commenter
Copy link

codecov-commenter commented Oct 26, 2022

Codecov Report

Merging #855 (aadc04d) into main (feecf41) will increase coverage by 0.32%.
The diff coverage is 52.91%.

@@            Coverage Diff             @@
##             main     #855      +/-   ##
==========================================
+ Coverage   75.20%   75.52%   +0.32%     
==========================================
  Files          72       73       +1     
  Lines        3779     3972     +193     
  Branches      674      710      +36     
==========================================
+ Hits         2842     3000     +158     
- Misses        804      810       +6     
- Partials      133      162      +29     
Impacted Files Coverage Δ
dask_sql/physical/rel/custom/metrics.py 25.00% <25.00%> (ø)
dask_sql/physical/rel/custom/wrappers.py 64.07% <76.76%> (+36.00%) ⬆️
dask_sql/physical/rel/custom/create_experiment.py 96.15% <100.00%> (ø)
dask_sql/physical/rel/custom/create_model.py 88.52% <100.00%> (-0.19%) ⬇️
dask_sql/physical/rex/core/call.py 81.33% <0.00%> (+0.29%) ⬆️
dask_sql/_version.py 35.31% <0.00%> (+1.41%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@sarahyurick sarahyurick marked this pull request as ready for review October 26, 2022 22:21
_assert_eq(l, r, name=attr, **kwargs)


def test_parallelpostfit_basic():
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the unit tests, I added test_parallelpostfit_basic (originally test_it_works), test_predict, and test_transform from https://github.com/dask/dask-ml/blob/main/tests/test_parallel_post_fit.py, and test_incremental_basic from https://github.com/dask/dask-ml/blob/main/tests/test_incremental.py

Copy link
Collaborator

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sarahyurick! Code changes look good, just a couple remaining mentions of dask-ml in the affected files that could be removed (assuming the remaining uses will be handled in #886)

docs/source/sql/ml.rst Show resolved Hide resolved
Copy link
Collaborator

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have requested small clarifications, other wise implementation looks good.

docs/source/sql/ml.rst Show resolved Hide resolved
Copy link
Collaborator

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sarahyurick!

Copy link
Collaborator

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 44 to +46
model with a :class:`dask_sql.physical.rel.custom.wrappers.ParallelPostFit`.
Have a look into the
[dask-ml docu](https://ml.dask.org/meta-estimators.html#parallel-prediction-and-transformation)
to learn more about it. Defaults to false. Typically you set
it to true for sklearn models if predicting on big data.
Defaults to false. Typically you set it to true for
sklearn models if predicting on big data.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR , can you file an issue to clean up the wrap_predict and wrap_fit arguments. I think we can get rid of this or do a better default based on the class name of the model.

For sklearn and single gpu cuML models, switch this to true else switch this to False.

@@ -174,7 +166,11 @@ def convert(self, rel: "LogicalPlan", context: "dask_sql.Context") -> DataContai

search = ExperimentClass(model, {**parameters}, **experiment_kwargs)
logger.info(tune_fit_kwargs)
search.fit(X, y, **tune_fit_kwargs)
search.fit(
X.to_dask_array(lengths=True),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could experimentClass be a gpu based model or is it limited to cpu based ones only?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, since I think every example I've seen with experiment_class has been with a CPU dask_ml model... I can try to get a better idea of the scope when the pytests in #886 are updated with other non-dask_ml models. We can see about adding GPU tests there too.

@ayushdg ayushdg merged commit 5440eff into dask-contrib:main Nov 14, 2022
@sarahyurick sarahyurick deleted the incremental branch May 26, 2023 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DF] Replace dask_ml.wrappers.Incremental with custom Incremental class
5 participants