
Default threads number value should use sched_getaffinity #925

Open
jbd opened this issue Jan 6, 2025 · 3 comments


jbd commented Jan 6, 2025

Hello,

The default thread count is computed in Parameters::setDefaults:

void Parameters::setDefaults() {

[...]
    threads = 1;
    compressed = WRITER_ASCII_MODE;
#ifdef OPENMP
    char * threadEnv = getenv("MMSEQS_NUM_THREADS");
    if (threadEnv != NULL) {
        threads = (int) Util::fast_atoi<unsigned int>(threadEnv);
    } else {
        #ifdef _SC_NPROCESSORS_ONLN
            threads = sysconf(_SC_NPROCESSORS_ONLN);
        #endif
        if(threads <= 1){
            threads = Util::omp_thread_count();
        }
    }

#endif

If you run mmseqs on a 96 cores machines with taskset -c 1 or a slurm allocation with a single core requested, mmseqs will use 96 threads on a single core which is unfortunate.

Here is a reproducer:

$ cat ccores.c 
#define _GNU_SOURCE  // required to use sched.h
#include <sched.h>
#include <stdio.h>
#include <sys/sysinfo.h>
#include <unistd.h>  // required to use sysconf
 
int main(void) {
    cpu_set_t cs;
    CPU_ZERO(&cs);
    sched_getaffinity(0, sizeof(cs), &cs);
 
    printf("using get_nprocs: %d\n", get_nprocs());
    printf("using sysconf(_SC_NPROCESSORS_ONLN): %ld\n", sysconf(_SC_NPROCESSORS_ONLN));  // sysconf returns long

    int n = 0;
    #pragma omp parallel reduction(+:n)
    n += 1;
    printf("using openmp: %d\n", n);
 
    int count = 0;
    for (int i = 0; i < CPU_SETSIZE; i++) {  // CPU_SETSIZE bits, not sizeof(cs) bytes
        if (CPU_ISSET(i, &cs)) { count++; }
    }
    printf("using sched_getaffinity: %d\n", count);


    return 0;
}
$ CFLAGS=-fopenmp make ccores
cc -fopenmp    ccores.c   -o ccores

From a machine with a single 64-core CPU with HT enabled:

$ ./ccores 
using get_nprocs: 128
using sysconf(_SC_NPROCESSORS_ONLN): 128
using openmp: 128
using sched_getaffinity: 128

Using taskset:

$ taskset -c 1 ./ccores 
using get_nprocs: 128
using sysconf(_SC_NPROCESSORS_ONLN): 128
using openmp: 1
using sched_getaffinity: 1

The same program submitted through Slurm on a 96-core server (HT off) with a single core requested (-c 1):

$ srun -c 1 ./ccores
srun: job 55388620 queued and waiting for resources
srun: job 55388620 has been allocated resources
using get_nprocs: 96
using sysconf(_SC_NPROCESSORS_ONLN): 96
using openmp: 1
using sched_getaffinity: 1

The problem with this code is that sysconf(_SC_NPROCESSORS_ONLN) reports the hardware core count, not the number of cores available to the running process. The OpenMP pragma behaves correctly.

I think a saner behavior for mmseqs would be to use sched_getaffinity, as the folly library from Facebook does, for example (folly/folly/system/HardwareConcurrency.cpp):

#include <folly/system/HardwareConcurrency.h>

#include <thread>

#include <folly/portability/Sched.h>

namespace folly {

unsigned int hardware_concurrency() noexcept {
#if defined(__linux__) && !defined(__ANDROID__)
  cpu_set_t cpuset;
  if (!sched_getaffinity(0, sizeof(cpuset), &cpuset)) {
    auto count = CPU_COUNT(&cpuset);
    if (count != 0) {
      return count;
    }
  }
#endif

  return std::thread::hardware_concurrency();
}

} // namespace folly

This is similar to bshoshany/thread-pool#161 for example.

Thank you for considering this change!

Jean-Baptiste

@milot-mirdita
Member

You can't currently use affinity with MMseqs2. The issue, AFAIK, is that we use multi-stage execution that doesn't play well with affinity.

When you call an MMseqs2 workflow (e.g. mmseqs search), it internally calls a shell script that makes multiple calls back to the binary to start individual modules (e.g. mmseqs prefilter). I think the issue is that the parent process has an affinity mask applied, which results in child processes not working correctly.


jbd commented Jan 7, 2025

Hello, thank you for your answer.

I'm sorry if I was unclear, but the suggestion was not to use affinity.

It is about using the sched_getaffinity primitive to get the number of cores accessible to the process and its children, not to pin which ones to use. This number would only be used as the default thread count, and nothing else.

To illustrate further, here is the search help output through Slurm, requesting only one core on a 96-core machine:

$ srun -c 1 mmseqs search -h | grep threads
srun: job 55434337 queued and waiting for resources
srun: job 55434337 has been allocated resources
 --threads INT                    Number of CPU-cores used (all by default) [96]

It really should be [1] instead of [96] here (since I requested one core in the srun command); otherwise mmseqs would run 96 threads on a single core.

sysconf(_SC_NPROCESSORS_ONLN) returns the hardware core count.
Util::omp_thread_count() returns the number of cores accessible to the process.

Inverting the test in the Parameters::setDefaults() function will do the trick, as long as MMseqs2 has been compiled with OpenMP support. In my view, the best solution would be to explicitly use the CPU_ISSET loop or CPU_COUNT(&cs) from my initial example.

I hope it makes more sense now.

Thank you !

@milot-mirdita
Member

Ah, sorry for the misunderstanding. That is a good idea.

jbd added a commit to jbd/MMseqs2 that referenced this issue Jan 7, 2025
Use only Util::omp_thread_count() to get the number of accessible cores to the process and its children.

sysconf(_SC_NPROCESSORS_ONLN) gets the hardware core count, which can differ if MMseqs2 is running within a constrained environment (taskset, cgroups, Slurm, ...).

Resolve soedinglab#925