Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default_thread_allocations should not use std::thread::hardware_concurrency() #1223

Open
jbd opened this issue Jan 20, 2025 · 1 comment
Open
Labels
enhancement New feature or request

Comments

@jbd
Copy link

jbd commented Jan 20, 2025

Hello,

In the default_thread_allocations function, C++ hardware_concurrency is used to determine the maximum number of threads.

const int max_threads = std::thread::hardware_concurrency();

std::thread::hardware_concurrency() returns, when possible, the underlying hardware capability to run threads, which might not corresponds to the actual number of cores available to the process (through the use of taskset, batch system like slurm, etc...). The consequence is that the program might run in a non optimal way. For example, if I run taskset -c 1 on my 20 cores machines, it will run 20 workers on only one core.

Consider this code run through the slurm scheduler using srun, requesting 4 cores:

#include <iostream>
#include <sched.h>
#include <thread>  
 
int main (void) {
    cpu_set_t cpu_set;
    int cores = std::thread::hardware_concurrency();
    std::cout << "using std::thread::hardware_concurrency: " << cores << "\n";
    sched_getaffinity(0, sizeof(cpu_set), &cpu_set);
    cores = CPU_COUNT(&cpu_set);
    std::cout << "using sched_getaffinity: " << cores << "\n";
    return 0;
}
$ srun -c 4 ./get_cores
srun: job 13694693 queued and waiting for resources
srun: job 13694693 has been allocated resources
using std::thread::hardware_concurrency: 96
using sched_getaffinity: 4

I think a more sane behavior would be to use sched_getaffinity as in the folly library for example (folly/folly/system/HardwareConcurrency.cpp)

#include <folly/system/HardwareConcurrency.h>

#include <thread>

#include <folly/portability/Sched.h>

namespace folly {

unsigned int hardware_concurrency() noexcept {
#if defined(__linux__) && !defined(__ANDROID__)
  cpu_set_t cpuset;
  if (!sched_getaffinity(0, sizeof(cpuset), &cpuset)) {
    auto count = CPU_COUNT(&cpuset);
    if (count != 0) {
      return count;
    }
  }
#endif

  return std::thread::hardware_concurrency();
}

} // namespace folly

What do you think ?

Thank you for considering this change !

@malton-ont
Copy link
Collaborator

Hi @jbd,

Thanks for the info - this certainly looks like a worthwhile change for users who take more active control over their CPU resources. We'll look into it for a future release.

@malton-ont malton-ont added the enhancement New feature or request label Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants