[feature] Access to ParallelFor values and set_parallelism per Op #7454
Labels
area/backend
area/sdk
kind/feature
lifecycle/stale
Feature Area
/area backend
/area sdk
What feature would you like to see?
I stumbled upon a case where two features would be useful when building a pipeline dynamically from an external config: access to ParallelFor values, and setting parallelism per op.
What is the use case or pain point?
Let's say we have a number of datasets and a number of models we want to train. Not all models should be run on all datasets, so we use a config to specify the pipeline content (see the examples in the code). Moreover, we would like the runs for different datasets to execute in parallel with each other, and since some tasks may be CPU/RAM heavy, we would like to limit parallelism per op type.
This is the case when we want to run a pipeline from GitHub CI/CD to test newly pushed code on a set of datasets and models, to ensure its validity and performance.
Is there a workaround currently?
A semi-workaround, shown in the example, is to pass the config to a function that returns the pipeline function, and to iterate over the config in plain Python instead of using ParallelFor. However, this makes it hard to limit parallelism per op. For the parallelism problem I found a related issue, #4089, and I also know that Argo allows setting parallelism per task/step, so having the same in Kubeflow would be nice.
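To make the requested behaviour concrete, here is a local simulation in plain Python (not Kubeflow or Argo code): each op type gets its own cap on how many instances run at the same time, enforced with one threading.BoundedSemaphore per type. The op-type names, the limits in OP_LIMITS, the OpTypeLimiter class and the fake_task helper are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-op-type limits; this is NOT a Kubeflow or Argo API.
OP_LIMITS = {"preprocess": 4, "train": 2}


class OpTypeLimiter:
    """Caps how many tasks of each op type may run concurrently."""

    def __init__(self, limits):
        # One semaphore per op type, sized to that type's limit.
        self._semaphores = {op: threading.BoundedSemaphore(n) for op, n in limits.items()}
        self._lock = threading.Lock()
        self._running = {op: 0 for op in limits}
        self.peak = {op: 0 for op in limits}  # highest observed concurrency per type

    def run(self, op_type, fn, *args):
        # Block until a slot for this op type is free, then run the task.
        with self._semaphores[op_type]:
            with self._lock:
                self._running[op_type] += 1
                self.peak[op_type] = max(self.peak[op_type], self._running[op_type])
            try:
                return fn(*args)
            finally:
                with self._lock:
                    self._running[op_type] -= 1


def fake_task(name):
    time.sleep(0.05)  # stand-in for CPU/RAM-heavy work
    return name


limiter = OpTypeLimiter(OP_LIMITS)
jobs = [("train", f"train-{i}") for i in range(6)] + [("preprocess", f"prep-{i}") for i in range(6)]

# Plenty of workers overall; only the per-type semaphores restrict concurrency.
with ThreadPoolExecutor(max_workers=12) as pool:
    results = list(pool.map(lambda job: limiter.run(job[0], fake_task, job[1]), jobs))

print(limiter.peak)
```

In an actual pipeline the workflow engine would have to enforce such a limit; the sketch only illustrates the intended semantics: heavy "train" tasks are throttled independently of lighter "preprocess" tasks.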
Using a separate pipeline per dataset is also a partially working option.
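A minimal sketch of the config-driven semi-workaround described above, in plain Python without importing kfp: a factory closes over the config and returns a pipeline function, so the loop over datasets and models runs when the pipeline is built rather than through ParallelFor. CONFIG, make_pipeline and the tuple-based task records are hypothetical stand-ins for the issue's actual code and for real KFP component calls.

```python
# Hypothetical config: which models should be trained on which dataset.
CONFIG = {
    "datasets": {
        "dataset_a": ["model_x", "model_y"],
        "dataset_b": ["model_x"],  # not every model runs on every dataset
    }
}


def make_pipeline(config):
    """Return a pipeline function whose loop over the config is unrolled in Python.

    In a real KFP pipeline the body would call component functions; here each
    "task" is only recorded as a (step, dataset, model) tuple.
    """

    def pipeline():
        tasks = []
        for dataset, models in config["datasets"].items():
            tasks.append(("preprocess", dataset, None))
            for model in models:
                tasks.append(("train", dataset, model))
        return tasks

    return pipeline


pipeline = make_pipeline(CONFIG)
for task in pipeline():
    print(task)
```

Because every (dataset, model) pair becomes its own static task, no single loop construct groups all "train" tasks together, which is presumably why a per-op parallelism limit is hard to express with this approach.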
Semi-working solution
Example of my thought process in solving the problem – changing ParallelFor