Ticket 14313

Summary: '--gres gpu:<N>' is not handled upon separate SSH Session
Product: Slurm Reporter: Prabhjyot Saluja <prabhjyot_saluja>
Component: GPU    Assignee: Oriol Vilarrubi <jvilarru>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: felip.moll
Version: 20.02.6   
Hardware: Linux   
OS: Linux   
Site: Brown Univ Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---
Attachments: /etc/pam.d/password-auth
cgroup.conf
/etc/pam.d/sshd
slurm.conf

Description Prabhjyot Saluja 2022-06-14 07:13:54 MDT
Created attachment 25496 [details]
/etc/pam.d/password-auth

Hi,

We are having an issue where a user can SSH into a node with an active job and access all GPUs on the node, i.e. 'nvidia-smi' returns every GPU on the node. CPU fencing works correctly: after SSH, 'nproc' shows the same number of allocated cores, and 'numactl --show' returns the correct 'physcpubind'. I looked at bug 6411 but couldn't figure it out, so I'm reaching out.


Step 1: Start an interactive session requesting 1 GPU
salloc -J interact -N 1-1 -n 4 --time=30:00 --gres=gpu:1 --mem=20g -p gpu -C ampere srun --pty bash

Step 2: Check the GPUs visible inside the job step
[ccvdemo@gpu2108 ~]$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-582f326c-9927-e90a-7e87-e33dcbec2fc9)

If I ssh into the node instead, all eight GPUs are visible:
[ccvdemo@login006 ~]$ ssh gpu2108
[ccvdemo@gpu2108 ~]$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-6da44192-c454-700f-279f-2b1a7a94f302)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-a38a23eb-5d64-1960-771b-6a07d2b0e706)
GPU 2: NVIDIA GeForce RTX 3090 (UUID: GPU-f39da23f-48a4-c9bf-31b0-d101a7f45adb)
GPU 3: NVIDIA GeForce RTX 3090 (UUID: GPU-8cfe039b-9b9d-8304-1d9a-3e15f3545f8c)
GPU 4: NVIDIA GeForce RTX 3090 (UUID: GPU-582f326c-9927-e90a-7e87-e33dcbec2fc9)
GPU 5: NVIDIA GeForce RTX 3090 (UUID: GPU-f2c42d9c-9773-c022-582d-94765eaebcf7)
GPU 6: NVIDIA GeForce RTX 3090 (UUID: GPU-914d1d97-3563-4af0-d234-508e8c584781)
GPU 7: NVIDIA GeForce RTX 3090 (UUID: GPU-0f190ee8-0829-9871-5568-26c494fdc378)

Attachments: 
/etc/pam.d/sshd
/etc/pam.d/password-auth


Please let me know if you need any details. Thank you very much!
Comment 2 Felip Moll 2022-06-14 11:54:17 MDT
Hi! I think you forgot to attach the sshd file.

I would also like to ask for slurm.conf, cgroup.conf, and the result of:

cat /proc/self/cgroup

after you have logged in to the node with ssh while a job is running.
Comment 3 Prabhjyot Saluja 2022-06-14 12:25:03 MDT
Created attachment 25508 [details]
cgroup.conf
Comment 4 Prabhjyot Saluja 2022-06-14 12:26:02 MDT
Created attachment 25509 [details]
/etc/pam.d/sshd
Comment 5 Prabhjyot Saluja 2022-06-14 12:26:25 MDT
Created attachment 25510 [details]
slurm.conf
Comment 6 Prabhjyot Saluja 2022-06-14 12:32:53 MDT
Hi, Here is the output (/proc/self/cgroup):

From the interactive allocation:
[ccvdemo@gpu717 ~]$ cat /proc/self/cgroup
11:cpuset:/slurm/uid_140447539/job_5367608/step_0
10:perf_event:/
9:memory:/slurm/uid_140447539/job_5367608/step_0
8:blkio:/system.slice/slurmd.service
7:net_prio,net_cls:/
6:pids:/system.slice/slurmd.service
5:hugetlb:/
4:cpuacct,cpu:/system.slice/slurmd.service
3:devices:/slurm/uid_140447539/job_5367608/step_0
2:freezer:/slurm/uid_140447539/job_5367608/step_0
1:name=systemd:/system.slice/slurmd.service

From the SSH session:
cat /proc/self/cgroup
11:cpuset:/slurm/uid_140447539/job_5367608/step_extern
10:perf_event:/
9:memory:/user.slice
8:blkio:/user.slice
7:net_prio,net_cls:/
6:pids:/user.slice
5:hugetlb:/
4:cpuacct,cpu:/user.slice
3:devices:/user.slice
2:freezer:/slurm/uid_140447539/job_5367608/step_extern
1:name=systemd:/user.slice/user-140447539.slice/session-15153.scope
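
The two listings above show the problem: in the salloc step the devices controller sits under /slurm/uid_.../job_.../step_0, while in the SSH session it has been moved to /user.slice, so the job's GPU restriction no longer applies. As a quick diagnostic sketch (assuming cgroup v1 output formatted like the above, and Slurm cgroup paths beginning with /slurm/), one could check this programmatically:

```python
# Sketch: decide whether the 'devices' cgroup controller is Slurm-confined.
# Assumes cgroup v1 lines of the form "<hierarchy-id>:<controllers>:<path>".
def devices_confined_by_slurm(cgroup_text: str) -> bool:
    """Return True if the 'devices' controller sits under a Slurm cgroup."""
    for line in cgroup_text.splitlines():
        _, controllers, path = line.split(":", 2)
        if "devices" in controllers.split(","):
            return path.startswith("/slurm/")
    return False  # no devices controller found (e.g. cgroup v2)

print(devices_confined_by_slurm("3:devices:/slurm/uid_140447539/job_5367608/step_0"))  # True
print(devices_confined_by_slurm("3:devices:/user.slice"))  # False
```

Running this against the two outputs above returns True for the salloc step and False for the SSH session, confirming that the SSH processes escaped the device restriction.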
Comment 8 Oriol Vilarrubi 2022-06-16 10:40:11 MDT
Hello Prabhjyot,

I see that in your password-auth file you have the following line:
-session     optional      pam_systemd.so

This loads pam_systemd.so, which "steals" the session processes out of the Slurm cgroups and into systemd ones; those cgroups have no devices limitation, so the GPU restriction is not enforced.

What is needed is to completely remove (or comment out) all the pam_systemd lines in the PAM files. I guess that was your intention when including the '-' character in front of the line, but all that does is keep PAM from failing if the module is not found.
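
For reference, a sketch of how the relevant fragment of /etc/pam.d/sshd might look after the fix (the pam_slurm_adopt line is shown as it commonly appears in such setups; adapt to your actual file). The key point is that the pam_systemd line is fully commented out, not merely prefixed with '-':

```
# /etc/pam.d/sshd (fragment) -- illustrative sketch, adapt to your local file
account    required      pam_slurm_adopt.so

# The '-' prefix only suppresses the error if the module is missing;
# the module still runs when present. Comment the line out instead:
#session   optional      pam_systemd.so
```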

Greetings.
Comment 9 Prabhjyot Saluja 2022-06-16 11:30:58 MDT
Thank you so much! Exactly, that was the intention behind the '-', but I didn't realize the line needed to be commented out. That did the trick. Appreciate your help.

Regards,
Singh
Comment 10 Oriol Vilarrubi 2022-06-16 11:46:20 MDT
Hello,

I'm happy to help. Closing this bug as infogiven. Do not hesitate to contact us if you encounter more issues.

Regards.