[WIP] Moving closer to Dask #87

Open: wants to merge 7 commits into base: master

caspervdw
Collaborator

I am trying to get some ideas written down to move this project closer to dask. The current state of the document is a comparison of relevant parts of dask and dask-geomodeling.

@caspervdw caspervdw requested a review from daanvaningen June 8, 2021 08:54
rfc/0001-dask.md (outdated, resolved)
@daanvaningen daanvaningen left a comment

Very interesting! I don't think I have a good grasp of the implications of implementing this, but I think you pointed out some good reasons to move closer to Dask.
I think your concern is also mine: what is the effect of these optimizations and of parallel performance when our chunks are small and data request sizes are limited by our web application?

@caspervdw
Collaborator Author

> Very interesting! I don't think I have a good grasp of the implications of implementing this, but I think you pointed out some good reasons to move closer to Dask.

I find this very hard to wrap my head around too. I am not sure if it is worth the effort, or if we should wait for the dask improvements to be released. In the meantime, we could focus on the infrastructure part rather than rewriting dask-geomodeling.

> I think your concern is also mine: what is the effect of these optimizations and of parallel performance when our chunks are small and data request sizes are limited by our web application?

For the web application, we effectively do not parallelize across multiple chunks. This is unnecessary because the chunks are so small. We may parallelize operations within a single request if the graph allows it (e.g. reading the same chunk from multiple sources concurrently instead of one after the other).
I am worried that the overhead of the graph optimization itself will be large relative to the work done per request.
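
To make the "same chunk, multiple sources" case concrete, here is a minimal sketch using only the Python standard library. It is not dask's scheduler and not dask-geomodeling code: `read_chunk`, the source names, and the bounding box are all hypothetical placeholders, and the `time.sleep` call merely stands in for I/O latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_chunk(source, bbox, delay=0.05):
    """Hypothetical reader: pretend to fetch one small chunk from `source`."""
    time.sleep(delay)  # stands in for I/O latency (assumption, not measured)
    return {"source": source, "bbox": bbox}


def read_sequential(sources, bbox):
    """Read the same chunk from each source, one after the other."""
    return [read_chunk(source, bbox) for source in sources]


def read_concurrent(sources, bbox):
    """Read the same chunk from all sources at once using threads."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # pool.map preserves the input order of `sources`
        return list(pool.map(lambda source: read_chunk(source, bbox), sources))


sources = ["elevation", "landuse", "rainfall"]  # hypothetical source names
bbox = (0, 0, 256, 256)  # one small chunk, as in a web request

start = time.perf_counter()
sequential_result = read_sequential(sources, bbox)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
concurrent_result = read_concurrent(sources, bbox)
concurrent_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.3f}s, concurrent: {concurrent_seconds:.3f}s")
```

Both variants return the same data in the same order; only the waiting overlaps. Whether this gain survives once graph construction and optimization are added on top is exactly the open question, and it would need to be measured on real request sizes.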
