Ticket 4518

Summary: devices cgroups regression
Product: Slurm
Reporter: Kilian Cavalotti <kilian>
Component: slurmd
Assignee: Brian Christiansen <brian>
Status: RESOLVED DUPLICATE
Severity: 3 - Medium Impact
Version: 17.11.0
Hardware: Linux
OS: Linux
Site: Stanford

Description Kilian Cavalotti 2017-12-13 16:41:49 MST
Sorry for the barrage of submissions today, but as you probably guessed, we've upgraded to 17.11, and the transition is not as smooth as we expected, unfortunately. :\

It looks like ConstrainDevices doesn't work anymore in 17.11. Using the exact same (previously functional) config as we used in 17.02, we can't seem to make device access restrictions work.
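For context, the relevant settings look roughly like this (a sketch of what such a cgroup.conf typically contains; the exact values here are assumptions, not our actual file):

# /etc/slurm/cgroup.conf (sketch; values assumed for illustration)
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf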

For instance, when requesting a single GPU, I can see all the GPUs on the node:

$ srun --pty --gres gpu:1 -w sh-112-07 bash

[kilian@sh-112-07 ~]$ nvidia-smi -L
GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-27db8534-9b2b-8b1a-5889-9c77c0c7be4e)
GPU 1: Tesla P100-PCIE-16GB (UUID: GPU-b84691c7-e6e4-33c1-f367-1831e42cf4c6)
GPU 2: Tesla P100-PCIE-16GB (UUID: GPU-0d9a859c-ce19-78f3-2f87-aade11d14bae)
GPU 3: Tesla P100-PCIE-16GB (UUID: GPU-59405c9d-6554-d857-1cbd-8b9487464684)


The devices cgroups seem to have been created properly:

[kilian@sh-112-07 ~]$ echo $SLURM_JOBID
4542325
[kilian@sh-112-07 ~]$ lscgroup | grep devices
devices:/
devices:/slurm
devices:/slurm/uid_215845/job_4542325
devices:/slurm/uid_215845/job_4542325/step_0
devices:/slurm/uid_215845/job_4542325/step_extern
[kilian@sh-112-07 ~]$ tree /sys/fs/cgroup/devices/slurm/uid_215845/job_4542065/
/sys/fs/cgroup/devices/slurm/uid_215845/job_4542065/
├── cgroup.clone_children
├── cgroup.event_control
├── cgroup.procs
├── devices.allow
├── devices.deny
├── devices.list
├── notify_on_release
├── step_extern
│   ├── cgroup.clone_children
│   ├── cgroup.event_control
│   ├── cgroup.procs
│   ├── devices.allow
│   ├── devices.deny
│   ├── devices.list
│   ├── notify_on_release
│   └── tasks
└── tasks
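The devices.list file in each of those directories shows the rules currently in effect: a single "a *:* rwm" entry means all devices are allowed, i.e. no constraint was applied, while a working constraint shows one whitelist line per permitted device. A small sketch of that interpretation (helper names are mine, not Slurm's):

```python
# Interpret entries from a cgroup-v1 devices.list file.
# "a *:* rwm" = all devices allowed (no restriction in effect);
# otherwise each line whitelists one device, e.g. "c 195:0 rwm"
# (char device, major 195, minor 0, read/write/mknod).

def is_unrestricted(devices_list: str) -> bool:
    """True if the cgroup allows every device (constraint not applied)."""
    entries = [l.strip() for l in devices_list.splitlines() if l.strip()]
    return entries == ["a *:* rwm"]

def allowed_devices(devices_list: str):
    """Return (type, major:minor) pairs explicitly whitelisted."""
    out = []
    for line in devices_list.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] in ("b", "c"):
            out.append((parts[0], parts[1]))
    return out
```

On this node, `cat .../job_4542325/devices.list` presumably shows the unrestricted form, which would match all four GPUs being visible.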


The GPUs are correctly detected on the node:

slurmd[109644]: Gres Name=gpu Type=(null) Count=2
slurmd[109644]: Gres Name=gpu Type=(null) Count=2
slurmd[109644]: gpu device number 0(/dev/nvidia0):c 195:0 rwm
slurmd[109644]: gpu device number 1(/dev/nvidia1):c 195:1 rwm
slurmd[109644]: gpu device number 2(/dev/nvidia2):c 195:2 rwm
slurmd[109644]: gpu device number 3(/dev/nvidia3):c 195:3 rwm


The cgroup config is loaded, including the devices part:

slurmd[109644]: debug:  Reading cgroup.conf file /etc/slurm/cgroup.conf
slurmd[109644]: debug:  CPUs:20 Boards:1 Sockets:2 CoresPerSocket:10 ThreadsPerCore:1
slurmd[109644]: debug:  Reading cgroup.conf file /etc/slurm/cgroup.conf
slurmd[109644]: debug:  task/cgroup: now constraining jobs allocated cores
slurmd[109644]: debug:  task/cgroup/memory: total:515705M allowed:100%(enforced), swap:0%(enforced), max:100%(515705M) max+swap:100%(1031410M) min:30M kmem:100%(515705M permissive) min:30M swappiness:18446744073709551614(set)
slurmd[109644]: debug:  task/cgroup: now constraining jobs allocated memory
slurmd[109644]: debug:  task/cgroup: now constraining jobs allocated devices
slurmd[109644]: debug:  task/cgroup: loaded



Yet, there's no mention of GRES devices when the job starts:


slurmd[109653]: debug:  Checking credential with 448 bytes of sig data
slurmd[109653]: debug:  Reading cgroup.conf file /etc/slurm/cgroup.conf
slurmd[109653]: debug:  Calling /usr/sbin/slurmstepd spank prolog
slurmd[109653]: debug:  [job 4542325] attempting to run prolog [/etc/slurm/scripts/prolog.sh]
slurmd[109653]: _run_prolog: run job script took usec=38237
slurmd[109653]: _run_prolog: prolog with lock for job 4542325 ran for 0 seconds
slurmstepd[109874]: task/cgroup: /slurm/uid_215845/job_4542325: alloc=12800MB mem.limit=12800MB memsw.limit=12800MB
slurmstepd[109874]: task/cgroup: /slurm/uid_215845/job_4542325/step_extern: alloc=12800MB mem.limit=12800MB memsw.limit=12800MB
slurmd[109653]: launch task 4542325.0 request from 215845.32264@10.10.0.61 (port 54987)
slurmd[109653]: debug:  Checking credential with 492 bytes of sig data
slurmd[109653]: debug:  Waiting for job 4542325's prolog to complete
slurmd[109653]: debug:  Finished wait for job 4542325's prolog to complete
slurmstepd[109881]: task/cgroup: /slurm/uid_215845/job_4542325: alloc=12800MB mem.limit=12800MB memsw.limit=12800MB
slurmstepd[109881]: task/cgroup: /slurm/uid_215845/job_4542325/step_0: alloc=12800MB mem.limit=12800MB memsw.limit=12800MB
slurmstepd[109881]: in _window_manager


In previous versions, the logs included lines like:

slurmstepd[6248]: Allowing access to device c 195:0 rwm
slurmstepd[6248]: Allowing access to device c 195:1 rwm
slurmstepd[6248]: Not allowing access to device c 195:2 rwm
slurmstepd[6248]: Not allowing access to device c 195:3 rwm

which doesn't appear in the logs anymore.
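Those messages reflect the expected behavior: each GRES device on the node is either allowed (allocated to the job) or denied, and the step logs one line per device. A rough sketch of that partition, in the slurmd log format (names are illustrative, not Slurm's actual code):

```python
# Partition a node's GPU device specs into allowed/denied sets for a
# job, mirroring the "Allowing/Not allowing access to device" messages.
# Specs follow the slurmd log format: "c <major>:<minor> rwm".

def partition_devices(node_gpus, allocated_indices):
    """node_gpus: list of device specs indexed by GPU number."""
    lines = []
    for idx, spec in enumerate(node_gpus):
        if idx in allocated_indices:
            lines.append(f"Allowing access to device {spec}")
        else:
            lines.append(f"Not allowing access to device {spec}")
    return lines

# Four P100s with major 195, as in the slurmd log above; GPUs 0 and 1
# allocated to the job.
gpus = [f"c 195:{i} rwm" for i in range(4)]
messages = partition_devices(gpus, {0, 1})
```

With two GPUs allocated out of four, this reproduces the 17.02-style log excerpt quoted above.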


Is there any additional configuration required in 17.11 to use the devices cgroup constraints?

Thanks!
-- 
Kilian
Comment 1 Brian Christiansen 2017-12-13 16:54:35 MST
Hey Kilian,

This was found and fixed in Bug 4455.

Specifically in commits:
ee68721350dc46d62bebc64e86378b06fd95f4a5
0ed03cda5bcf4e0bd5ef8117d4d5ce7fa84a71e3
434acb17c8526bc209626084587303cd5c5b79fa

I'm going to mark this closed as a duplicate of Bug 4455. If these patches don't fix it for you, please reopen the bug.

Thanks,
Brian

*** This ticket has been marked as a duplicate of ticket 4455 ***
Comment 2 Kilian Cavalotti 2017-12-13 18:00:31 MST
Hi Brian, 

Thanks for pointing this out, I'll give the patches a try.

Cheers,
-- 
Kilian
Comment 3 Kilian Cavalotti 2017-12-13 19:16:41 MST
I confirm that the mentioned commits indeed fix the issue.

Thanks!
-- 
Kilian