-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define threads for Nextclade rules #459
Comments
Good idea but I don't think we need to set these threads. All they do, in my experience, is limit parallelism. Have you checked that there's a need for this? I'm pretty sure things have always been running in parallel just fine. Nextclade uses all cores by default, unless you tell it through |
Hmm, I'm checking with a small example below. Note that rule all:
input:
b = "b.txt",
c = "c.txt"
rule a:
output: touch("a.txt")
shell:
"""
echo rule a nproc "$(nproc)"
"""
rule b:
input: "a.txt"
output: touch("b.txt")
threads: workflow.cores * 0.5
shell:
"""
echo rule b nproc "$(nproc)"
"""
rule c:
input: "a.txt"
output: touch("c.txt")
threads: 2
shell:
"""
echo rule c nproc "$(nproc)"
""" Running with
|
I still don't think this is necessary. I've never used nproc and it seems to report something that nextclade doesn't care about - at least on macOS. See how these three rules run all in parallel, even though one doesn't mention threads at all. They all run the same time, using as many jobs as there are cores (unless limited by Though it does indeed seem to set some env variable that makes Snakefile:
Output:
Note that the slowest is the one that gets 4 threads but limits nextclade jobs via This is slower than not defining threads and not passing jobs for nextclade. So the addition of jobs doesn't make anything faster, just adds lines of codes. |
I outputted
So it looks like nproc reports what one of these variables says, but these variables are ignored by Nextclade and hence |
Hmm, I'm getting confused. Isn't there two levels of parallelism going on?
As you stated, [1] is limited by threads. However I thought [2] relies on Snakemake reserving the number of cores defined by threads for the single rule? Or am I completely misunderstanding how Snakemake works under the hood? |
Chatted with @tsibley about this in our 1:1 today and I am misunderstanding. Snakemake's threads are just "guidelines" where we use it, so it does need to be passed to I am still curious whether limiting the threads will be faster than letting the two nextclade jobs compete for resources, so I'll update the workflow to pass in threads to Nextclade tomorrow and see how that affects the runtime. |
I took a quick look at run times for this week after adding threads. The runtime when using the Nextclade cache is fast enough that there's no noticeable difference. However, there is a difference in a full Nextclade run. Comparing just the
|
With plans to ignore cache with new versions, I think we will do full runs more often and so it makes sense to keep the defined threads. |
Prompted by #456 (comment)
From my reading of Snakemake docs on threads:
Makes me think we need to define threads of the Nextclade Snakemake rules in order for Nextclade to parallelize.
Proposing that we split the workflow cores between the two Nextclade rules:
The text was updated successfully, but these errors were encountered: