Ticket 14313 - '--gres gpu:<N>' is not handled upon separate SSH Session
Summary: '--gres gpu:<N>' is not handled upon separate SSH Session
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: GPU
Version: 20.02.6
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Oriol Vilarrubi
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-06-14 07:13 MDT by Prabhjyot Saluja
Modified: 2022-06-16 11:46 MDT
CC List: 1 user

See Also:
Site: Brown Univ
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
/etc/pam.d/password-auth (760 bytes, text/plain)
2022-06-14 07:13 MDT, Prabhjyot Saluja
Details
cgroup.conf (347 bytes, text/x-matlab)
2022-06-14 12:25 MDT, Prabhjyot Saluja
Details
/etc/pam.d/sshd (662 bytes, text/plain)
2022-06-14 12:26 MDT, Prabhjyot Saluja
Details
slurm.conf (10.95 KB, text/plain)
2022-06-14 12:26 MDT, Prabhjyot Saluja
Details

Description Prabhjyot Saluja 2022-06-14 07:13:54 MDT
Created attachment 25496 [details]
/etc/pam.d/password-auth

Hi,

We are having an issue where a user can SSH into a node with an active job and access all GPUs on the node, i.e. 'nvidia-smi' returns every GPU on the node. CPU fencing works correctly: after ssh, 'nproc' shows the same number of allocated cores and 'numactl --show' returns the correct 'physcpubind'. I looked at bug 6411 but couldn't figure it out, so I'm reaching out.


Step 1: Start an interactive session requesting 1 GPU
salloc -J interact -N 1-1 -n 4 --time=30:00 --gres=gpu:1 --mem=20g -p gpu -C ampere srun --pty bash

Step 2: 
[ccvdemo@gpu2108 ~]$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-582f326c-9927-e90a-7e87-e33dcbec2fc9)

If I ssh into the node, then 
[ccvdemo@login006 ~]$ ssh gpu2108
[ccvdemo@gpu2108 ~]$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-6da44192-c454-700f-279f-2b1a7a94f302)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-a38a23eb-5d64-1960-771b-6a07d2b0e706)
GPU 2: NVIDIA GeForce RTX 3090 (UUID: GPU-f39da23f-48a4-c9bf-31b0-d101a7f45adb)
GPU 3: NVIDIA GeForce RTX 3090 (UUID: GPU-8cfe039b-9b9d-8304-1d9a-3e15f3545f8c)
GPU 4: NVIDIA GeForce RTX 3090 (UUID: GPU-582f326c-9927-e90a-7e87-e33dcbec2fc9)
GPU 5: NVIDIA GeForce RTX 3090 (UUID: GPU-f2c42d9c-9773-c022-582d-94765eaebcf7)
GPU 6: NVIDIA GeForce RTX 3090 (UUID: GPU-914d1d97-3563-4af0-d234-508e8c584781)
GPU 7: NVIDIA GeForce RTX 3090 (UUID: GPU-0f190ee8-0829-9871-5568-26c494fdc378)

Attachments: 
/etc/pam.d/sshd
/etc/pam.d/password-auth


Please let me know if you need any details. Thank you very much!
Comment 2 Felip Moll 2022-06-14 11:54:17 MDT
Hi! I think you forgot to attach the sshd file.

I'm also going to ask for slurm.conf, cgroup.conf, and the result of:

cat /proc/self/cgroup

after you have logged in via ssh to the node with an active job.
Comment 3 Prabhjyot Saluja 2022-06-14 12:25:03 MDT
Created attachment 25508 [details]
cgroup.conf
Comment 4 Prabhjyot Saluja 2022-06-14 12:26:02 MDT
Created attachment 25509 [details]
/etc/pam.d/sshd
Comment 5 Prabhjyot Saluja 2022-06-14 12:26:25 MDT
Created attachment 25510 [details]
slurm.conf
Comment 6 Prabhjyot Saluja 2022-06-14 12:32:53 MDT
Hi, Here is the output (/proc/self/cgroup):

From interact allocation:
[ccvdemo@gpu717 ~]$ cat /proc/self/cgroup
11:cpuset:/slurm/uid_140447539/job_5367608/step_0
10:perf_event:/
9:memory:/slurm/uid_140447539/job_5367608/step_0
8:blkio:/system.slice/slurmd.service
7:net_prio,net_cls:/
6:pids:/system.slice/slurmd.service
5:hugetlb:/
4:cpuacct,cpu:/system.slice/slurmd.service
3:devices:/slurm/uid_140447539/job_5367608/step_0
2:freezer:/slurm/uid_140447539/job_5367608/step_0
1:name=systemd:/system.slice/slurmd.service

From SSH session:
cat /proc/self/cgroup
11:cpuset:/slurm/uid_140447539/job_5367608/step_extern
10:perf_event:/
9:memory:/user.slice
8:blkio:/user.slice
7:net_prio,net_cls:/
6:pids:/user.slice
5:hugetlb:/
4:cpuacct,cpu:/user.slice
3:devices:/user.slice
2:freezer:/slurm/uid_140447539/job_5367608/step_extern
1:name=systemd:/user.slice/user-140447539.slice/session-15153.scope
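
The two listings show the mismatch directly: in the ssh session the cpuset and freezer controllers still sit under the Slurm step_extern cgroup, but the devices controller, the one that actually enforces GPU access, sits under systemd's user.slice. A small sketch of how to pull out just that controller's path from a /proc/self/cgroup listing (cgroup v1 format, as in the output above; fed the sample lines from this ticket rather than a live node):

```shell
# Extract the cgroup path of the "devices" controller (the controller
# that enforces GPU access) from cgroup-v1 /proc/self/cgroup lines.
extract_devices_cgroup() {
    awk -F: '$2 == "devices" { print $3 }'
}

# Line from the interactive allocation: a /slurm/... path means limits apply.
printf '3:devices:/slurm/uid_140447539/job_5367608/step_0\n' | extract_devices_cgroup
# -> /slurm/uid_140447539/job_5367608/step_0

# Line from the ssh session: /user.slice means systemd took over the process.
printf '3:devices:/user.slice\n' | extract_devices_cgroup
# -> /user.slice
```

On a live node the same one-liner works directly: `awk -F: '$2 == "devices" { print $3 }' /proc/self/cgroup`.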
Comment 8 Oriol Vilarrubi 2022-06-16 10:40:11 MDT
Hello Prabhjyot,

I see that in your password-auth file you have the following line:
-session     optional      pam_systemd.so

This loads pam_systemd.so, which "steals" the processes from the Slurm cgroup into systemd ones; those cgroups have no devices limitation and therefore do not enforce the GPU restriction.

What is needed is to completely remove (or comment out) all the pam_systemd lines in the PAM files. I guess that was your intention in putting the '-' character in front of the line, but what that prefix actually does is tell PAM not to fail if the module is not found.
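
A minimal sketch of the change in /etc/pam.d/password-auth (the session line is the one quoted above from the attachment; the leading '#' comment marker is the fix, since a '-' prefix only tolerates the module being absent and does not disable it):

```
# Commented out: pam_systemd.so would move the session into a systemd
# user slice whose cgroups carry no devices restrictions, exposing all GPUs.
#-session     optional      pam_systemd.so
```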

Greetings.
Comment 9 Prabhjyot Saluja 2022-06-16 11:30:58 MDT
Thank you so much! Exactly, that was the intention with the '-', but I didn't realize the line needed to be commented out. That did the trick. Appreciate your help.

Regards,
Singh
Comment 10 Oriol Vilarrubi 2022-06-16 11:46:20 MDT
Hello,

I'm happy to help, closing this bug as infogiven. Do not hesitate to contact us if you encounter more issues.

Regards.