Feature Request/Enhancement: Ability to limit the number of tasks that can run concurrently in the 'local' batch_system (or per batch system) #170

Open
MarcelHoh opened this issue May 28, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@MarcelHoh
Contributor

Hi,

I often find myself running event generation and ntuple production tasks in the hundreds to thousands on LSF at KEKCC, followed by a few brief tasks that must run locally rather than on the batch system due to memory issues. For these tasks I specify batch_system='local'.

So that the tasks processed on the batch system all get submitted, I set workers=1000. However, once it is time for the local jobs to run, this means that b2luigi tries to start lots of tasks simultaneously and runs into many Resource Unavailable errors. I would therefore like to add a feature to specify a separate number of workers for the 'local' batch_system. If you agree this would be useful, I can start to work on this.

@FelixMetzner
Contributor

Hi Marcel,
you can use the resources feature already provided by luigi for this purpose.

Cheers,

Felix

@MarcelHoh
Contributor Author

Hi Felix,
thank you very much! I was not aware of this. Excuse my ignorance here, but do you know the correct way to specify the resource limit for just the 'local' batch system? As far as I can tell, the batch_system-specific settings are all handled by the b2luigi settings manager, which at least does not explicitly check this configuration file.
Cheers,
Marcel

@FelixMetzner
Contributor

FelixMetzner commented May 29, 2022

I think a luigi.cfg will still be considered if the file is located in the directory from which you start the local process. On KEKCC, the environment from which the job is submitted should be sent along, and so should this config file. I am not 100% sure, though, and it will of course depend on your specific setup.

Independent of this, the tasks you were referring to are running locally, so the config file should be picked up correctly. Also keep in mind that a resource is only considered if the task defines it, and that you can change this at runtime based on luigi parameters or other information such as the host name.
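As a rough illustration of choosing resources at runtime from the host name, here is a minimal sketch. The function name `resources_for_host`, the resource name `local_tasks`, and the "login" host prefix are all assumptions made up for this example. The logic is kept in a plain function so it can be tried without luigi installed.

```python
import socket


def resources_for_host(hostname=None):
    """Return the resources a task would claim on the given host.

    Hypothetical helper: the name, the "login" prefix and the
    "local_tasks" resource are illustrative assumptions.
    """
    if hostname is None:
        hostname = socket.gethostname()
    # Assumed rule: throttle only on hosts whose name starts with "login".
    if hostname.startswith("login"):
        return {"local_tasks": 1}
    # An empty dict claims nothing, so no resource limit applies.
    return {}
```

In a task, the returned dict would be what a `resources` property hands back, so the limit only takes effect on the hosts where the task actually claims the resource.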

I would say, you just have to give it a try and see what works for you.

Cheers,

Felix

@meliache added the wontfix and enhancement labels and removed the wontfix label on May 29, 2022
@meliache
Collaborator

Thanks @FelixMetzner for explaining how to achieve this with luigi. I have never used resources before either, but googling a bit turns up some examples of how to use them, e.g. in the Luigi Patterns documentation. Taking inspiration from them, I think using a property function for the resources could make for a dynamic solution that changes the max jobs automatically based on the batch system of the task:

import b2luigi


class A(b2luigi.Task):
    ...
    @property
    def resources(self):
        # If the batch system is local, use up one local_tasks resource,
        # otherwise use up one batch_tasks resource.
        if b2luigi.get_setting("batch_system", task=self) == "local":
            return {"local_tasks": 1}
        return {"batch_tasks": 1}

Then also all tasks inheriting from A will have this dynamic property.
And in the luigi.cfg then just specify available resources for local_tasks and batch_tasks as described in the documentation that Felix linked to.

Not sure whether the above code works, as I said I never tested it, but if it works maybe it would be nice to document this somewhere, as it's a useful feature.
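For the luigi.cfg side, a minimal sketch of what the file content might look like is given below as a string and parsed with Python's `configparser` to check that it is well-formed. The `[resources]` section is where luigi looks up available resource counts. The limits 4 and 1000 are made-up values, and `local_tasks` and `batch_tasks` simply have to match whatever names the tasks return from `resources`.

```python
import configparser

# Assumed content of a luigi.cfg in the directory the process is started from.
# The numbers are illustrative, not recommendations.
LUIGI_CFG = """
[resources]
local_tasks = 4
batch_tasks = 1000
"""

parser = configparser.ConfigParser()
parser.read_string(LUIGI_CFG)

# Each key is a resource name, each value the total amount available.
limits = {name: int(value) for name, value in parser["resources"].items()}
print(limits)
```

With limits like these, at most 4 tasks that each claim one `local_tasks` unit would be scheduled at once, regardless of how high `workers` is set.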

I'm myself guilty of once accidentally starting 800 local tasks, because I wanted to process on htcondor and increased the workers after a test run but forgot to change the batch system. I admit it would be convenient to have a setting somewhere which sets this for all batch systems by default without having to give resource properties to each task, possibly with a sensible default maximum number of local tasks. But I don't want to add much code and complexity for that when there is something users can do themselves.

@MarcelHoh
Contributor Author

MarcelHoh commented May 30, 2022

Thank you both, I have now tested the resources feature and this works nicely for what I need. Perhaps this could be added to the documentation on setting the 'local' batch system.
I agree also that it would be nice to have this as a global setting across all tasks but it is simple to add the resources property to a task.

3 participants