(3.0.0-3.6.0) Ptrace_scope not disabled for Ubuntu compute nodes

Ryan Anderson edited this page Jun 16, 2023 · 3 revisions

Libfabric and EFA reach maximum performance only when classic ptrace permissions are set on the AMI that uses them. On Ubuntu, the ptrace_scope kernel parameter defaults to restricted ptrace. With that setting the cross memory attach feature is not available, so EFA operates with reduced bandwidth. Only ParallelCluster versions 3.0.0 through 3.6.0 running on Ubuntu are affected. In v3.0 we added a clause to the cookbook that restores classic ptrace permissions by setting ptrace_scope to 0, but the CLI and the cookbook were misaligned and the clause was never activated. As a result, ptrace_scope is not set to 0, classic ptrace permissions are not restored, and EFA bandwidth is much lower.
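
For reference, the values that ptrace_scope can take are defined by the Linux kernel Yama documentation. The sketch below maps each value to its mode. The function name describe_ptrace_scope is made up for illustration and is not part of ParallelCluster; it takes the value as an argument so it can be fed the contents of /proc/sys/kernel/yama/ptrace_scope on a node.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name, not part of ParallelCluster):
# print the Yama ptrace mode that corresponds to a ptrace_scope value.
describe_ptrace_scope() {
  case "$1" in
    0) echo "classic ptrace permissions" ;;
    1) echo "restricted ptrace" ;;
    2) echo "admin-only attach" ;;
    3) echo "no attach" ;;
    *) echo "unknown value: $1" ;;
  esac
}
```

On a node, it could be called as: describe_ptrace_scope "$(cat /proc/sys/kernel/yama/ptrace_scope)". Full EFA bandwidth corresponds to the value 0.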

Steps to reproduce the bug, including an example configuration file.

  1. Launch a cluster with EFA enabled
Region: us-west-2
Image:
  Os: ubuntu2004
HeadNode:
  InstanceType: c5.18xlarge
  Networking:
    SubnetId: subnet-0ede71e3f22931611
  Ssh:
    KeyName: ndry-hpc
Scheduling:
  Scheduler: slurm
  SlurmQueues:
  - Name: queue-c5n18xlarge
    ComputeResources:
    - Name: c5n18xlarge
      InstanceType: c5n.18xlarge
      MinCount: 2
      MaxCount: 2
      Efa:
        Enabled: true
      DisableSimultaneousMultithreading: true
    Networking:
      SubnetIds:
      - subnet-0ede71e3f22931611
      PlacementGroup:
        Enabled: true
  2. SSH into the head node: pcluster ssh -i <key-pair> -n <name>
  3. Get the compute node hostname from sinfo. The hostname is in the NODELIST column; the range [1-2] expands to one hostname per node, for example queue-c5n18xlarge-c5n18xlarge-1.
ubuntu@ip-10-0-2-218:~$ sinfo
PARTITION          AVAIL  TIMELIMIT  NODES  STATE NODELIST
queue-c5n18xlarge*    up   infinite      2   idle queue-c5n18xlarge-c5n18xlarge-[1-2]
  4. SSH into the compute node: ssh <node-hostname>
  5. Verify the value of the kernel parameter.
    a. cat /proc/sys/kernel/yama/ptrace_scope
    b. It should be 1, which means ptrace has restricted permissions.
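
As an alternative to logging in to each compute node, the check can probably be run on all nodes at once from the head node through Slurm. The following is a minimal sketch under that assumption. The helper name list_restricted_nodes is hypothetical; it reads lines of the form "<hostname> <ptrace_scope>" on stdin and prints the hostnames whose value is not 0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<hostname> <ptrace_scope>" lines on stdin and
# print the hostnames whose value is not 0 (classic ptrace not in effect).
list_restricted_nodes() {
  awk 'NF >= 2 && $2 != "0" { print $1 }'
}

# Example invocation from the head node (assumes the two-node queue from the
# example configuration; shown as a comment, not executed here):
#   srun -N 2 --ntasks-per-node=1 sh -c \
#     'echo "$(hostname) $(cat /proc/sys/kernel/yama/ptrace_scope)"' \
#     | list_restricted_nodes
```

On an affected cluster this would be expected to list every compute node; after the mitigation it should print nothing.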

Affected versions (OSes, schedulers)

3.0.0-3.6.0, Ubuntu only

Mitigation

To restore full EFA bandwidth on an existing cluster, perform the following steps:

  1. Stop the compute fleet

pcluster update-compute-fleet --status STOP_REQUESTED --cluster-name <name>

  2. Add the following dev setting at the end of the cluster config file. It overrides the enable_efa property so that it matches the cookbook resource.
DevSettings:
  Cookbook:
    ExtraChefAttributes: '{"cluster": {"enable_efa": "compute"}}'
  3. Update the cluster. WARNING: DO NOT UPDATE THE CLUSTER WITHOUT STOPPING THE COMPUTE FLEET FIRST. Back up your data before updating: if you are not using external shared storage, data on the head node or compute nodes can be lost.

pcluster update-cluster --cluster-name <name> --cluster-configuration <config>

  4. Restart the compute fleet

pcluster update-compute-fleet --status START_REQUESTED --cluster-name <name>

  5. SSH into the head node: pcluster ssh -i <key-pair> -n <name>
  6. Get the compute node hostname from sinfo. The hostname is in the NODELIST column.
ubuntu@ip-10-0-2-218:~$ sinfo
PARTITION          AVAIL  TIMELIMIT  NODES  STATE NODELIST
queue-c5n18xlarge*    up   infinite      2   idle queue-c5n18xlarge-c5n18xlarge-[1-2]
  7. SSH into the compute node: ssh <node-hostname>
  8. Verify the value of the kernel parameter.
    a. cat /proc/sys/kernel/yama/ptrace_scope
    b. It should be 0, which means ptrace has classic permissions.
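
Because the fix depends entirely on the DevSettings override being present, it may help to check the config file before running the update. The sketch below is one way to do that. The function name config_has_efa_override is hypothetical, and the check is a plain text match on the enable_efa override shown in the steps above, not a YAML or JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if the given cluster config file
# contains the "enable_efa": "compute" override from the mitigation steps.
# This is a simple text match; it does not validate the rest of the file.
config_has_efa_override() {
  grep -Eq '"enable_efa"[[:space:]]*:[[:space:]]*"compute"' "$1"
}
```

For example, config_has_efa_override <config> && pcluster update-cluster --cluster-name <name> --cluster-configuration <config> runs the update only when the override is found.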

To get full EFA bandwidth with a new cluster, perform the following steps:

  1. Add the following dev setting at the end of the cluster config file. It overrides the enable_efa property so that it matches the cookbook resource.
DevSettings:
  Cookbook:
    ExtraChefAttributes: '{"cluster": {"enable_efa": "compute"}}'
  2. Create the cluster

pcluster create-cluster --cluster-name <name> --cluster-configuration <config>

  3. SSH into the head node: pcluster ssh -i <key-pair> -n <name>
  4. Get the compute node hostname from sinfo. The hostname is in the NODELIST column.
ubuntu@ip-10-0-2-218:~$ sinfo
PARTITION          AVAIL  TIMELIMIT  NODES  STATE NODELIST
queue-c5n18xlarge*    up   infinite      2   idle queue-c5n18xlarge-c5n18xlarge-[1-2]
  5. SSH into the compute node: ssh <node-hostname>
  6. Verify the value of the kernel parameter.
    a. cat /proc/sys/kernel/yama/ptrace_scope
    b. It should be 0, which means ptrace has classic permissions.