Ticket 14313

Summary: '--gres gpu:<N>' is not handled upon separate SSH Session
Product: Slurm Reporter: Prabhjyot Saluja <prabhjyot_saluja>
Component: GPU    Assignee: Oriol Vilarrubi <jvilarru>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: felip.moll
Version: 20.02.6   
Hardware: Linux   
OS: Linux   
Site: Brown Univ Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---
Attachments: /etc/pam.d/password-auth
cgroup.conf
/etc/pam.d/sshd
slurm.conf

Description Prabhjyot Saluja 2022-06-14 07:13:54 MDT
Created attachment 25496 [details]
/etc/pam.d/password-auth

Hi,

We are having an issue where a user can SSH into a node with an active job and access all GPUs on the node, i.e. 'nvidia-smi' returns every GPU on the node. CPU fencing works correctly: after SSH, 'nproc' shows the same number of allocated cores, and 'numactl --show' returns the correct 'physcpubind'. I looked at bug 6411 but couldn't figure it out, so I'm reaching out.


Step 1: Start an interactive session requesting 1 GPU
salloc -J interact -N 1-1 -n 4 --time=30:00 --gres=gpu:1 --mem=20g -p gpu -C ampere srun --pty bash

Step 2: Check the GPUs visible inside the job step
[ccvdemo@gpu2108 ~]$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-582f326c-9927-e90a-7e87-e33dcbec2fc9)

If I ssh into the node instead, all eight GPUs are visible:
[ccvdemo@login006 ~]$ ssh gpu2108
[ccvdemo@gpu2108 ~]$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-6da44192-c454-700f-279f-2b1a7a94f302)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-a38a23eb-5d64-1960-771b-6a07d2b0e706)
GPU 2: NVIDIA GeForce RTX 3090 (UUID: GPU-f39da23f-48a4-c9bf-31b0-d101a7f45adb)
GPU 3: NVIDIA GeForce RTX 3090 (UUID: GPU-8cfe039b-9b9d-8304-1d9a-3e15f3545f8c)
GPU 4: NVIDIA GeForce RTX 3090 (UUID: GPU-582f326c-9927-e90a-7e87-e33dcbec2fc9)
GPU 5: NVIDIA GeForce RTX 3090 (UUID: GPU-f2c42d9c-9773-c022-582d-94765eaebcf7)
GPU 6: NVIDIA GeForce RTX 3090 (UUID: GPU-914d1d97-3563-4af0-d234-508e8c584781)
GPU 7: NVIDIA GeForce RTX 3090 (UUID: GPU-0f190ee8-0829-9871-5568-26c494fdc378)

Attachments: 
/etc/pam.d/sshd
/etc/pam.d/password-auth


Please let me know if you need any details. Thank you very much!
Comment 2 Felip Moll 2022-06-14 11:54:17 MDT
Hi! I think you forgot to attach the sshd file.

I would also like to ask for slurm.conf, cgroup.conf, and the result of:

cat /proc/self/cgroup

after you have logged in to the node with ssh while a job is running.
Comment 3 Prabhjyot Saluja 2022-06-14 12:25:03 MDT
Created attachment 25508 [details]
cgroup.conf
Comment 4 Prabhjyot Saluja 2022-06-14 12:26:02 MDT
Created attachment 25509 [details]
/etc/pam.d/sshd
Comment 5 Prabhjyot Saluja 2022-06-14 12:26:25 MDT
Created attachment 25510 [details]
slurm.conf
Comment 6 Prabhjyot Saluja 2022-06-14 12:32:53 MDT
Hi, Here is the output (/proc/self/cgroup):

From the interactive allocation:
[ccvdemo@gpu717 ~]$ cat /proc/self/cgroup
11:cpuset:/slurm/uid_140447539/job_5367608/step_0
10:perf_event:/
9:memory:/slurm/uid_140447539/job_5367608/step_0
8:blkio:/system.slice/slurmd.service
7:net_prio,net_cls:/
6:pids:/system.slice/slurmd.service
5:hugetlb:/
4:cpuacct,cpu:/system.slice/slurmd.service
3:devices:/slurm/uid_140447539/job_5367608/step_0
2:freezer:/slurm/uid_140447539/job_5367608/step_0
1:name=systemd:/system.slice/slurmd.service

From the SSH session:
cat /proc/self/cgroup
11:cpuset:/slurm/uid_140447539/job_5367608/step_extern
10:perf_event:/
9:memory:/user.slice
8:blkio:/user.slice
7:net_prio,net_cls:/
6:pids:/user.slice
5:hugetlb:/
4:cpuacct,cpu:/user.slice
3:devices:/user.slice
2:freezer:/slurm/uid_140447539/job_5367608/step_extern
1:name=systemd:/user.slice/user-140447539.slice/session-15153.scope
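
The two listings above show the problem: in the salloc step the devices controller sits under /slurm/uid_.../job_.../step_0, while in the SSH session it has been moved to /user.slice, so the job's GPU restriction no longer applies. As a quick diagnostic sketch (assuming cgroup v1 output formatted like the above, and Slurm cgroup paths beginning with /slurm/), one could check this programmatically:

```python
# Sketch: decide whether the 'devices' cgroup controller is Slurm-confined.
# Assumes cgroup v1 lines of the form "<hierarchy-id>:<controllers>:<path>".
def devices_confined_by_slurm(cgroup_text: str) -> bool:
    """Return True if the 'devices' controller sits under a Slurm cgroup."""
    for line in cgroup_text.splitlines():
        _, controllers, path = line.split(":", 2)
        if "devices" in controllers.split(","):
            return path.startswith("/slurm/")
    return False  # no devices controller found (e.g. cgroup v2)

print(devices_confined_by_slurm("3:devices:/slurm/uid_140447539/job_5367608/step_0"))  # True
print(devices_confined_by_slurm("3:devices:/user.slice"))  # False
```

Running this against the two outputs above returns True for the salloc step and False for the SSH session, confirming that the SSH processes escaped the device restriction.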
Comment 8 Oriol Vilarrubi 2022-06-16 10:40:11 MDT
Hello Prabhjyot,

I see that in your password-auth file you have the following line:
-session     optional      pam_systemd.so

This loads pam_systemd.so, which "steals" the session processes out of the Slurm cgroups and into systemd ones; those cgroups have no devices limitation, so the GPU restriction is not enforced.

What is needed is to completely remove (or comment out) all the pam_systemd lines in the PAM files. I guess that was your intention when including the '-' character in front of the line, but all that does is keep PAM from failing if the module is not found.
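
For reference, a sketch of how the relevant fragment of /etc/pam.d/sshd might look after the fix (the pam_slurm_adopt line is shown as it commonly appears in such setups; adapt to your actual file). The key point is that the pam_systemd line is fully commented out, not merely prefixed with '-':

```
# /etc/pam.d/sshd (fragment) -- illustrative sketch, adapt to your local file
account    required      pam_slurm_adopt.so

# The '-' prefix only suppresses the error if the module is missing;
# the module still runs when present. Comment the line out instead:
#session   optional      pam_systemd.so
```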

Greetings.
Comment 9 Prabhjyot Saluja 2022-06-16 11:30:58 MDT
Thank you so much! Exactly, that was the intention behind the '-', but I didn't realize the line needed to be commented out. That did the trick. Appreciate your help.

Regards,
Singh
Comment 10 Oriol Vilarrubi 2022-06-16 11:46:20 MDT
Hello,

I'm happy to help. Closing this bug as infogiven. Do not hesitate to contact us if you encounter more issues.

Regards.