Ticket 6062

Summary: Follow up to GPU cgroup discussion at SC
Product: Slurm Reporter: Steve Ford <fordste5>
Component: slurmd Assignee: Marshall Garey <marshall>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 18.08.1   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=6253
Site: MSU Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---

Description Steve Ford 2018-11-20 12:09:43 MST
Hello,

This is a follow up to a discussion I had at SC with Tim about constraining GPU devices using cgroups. I was mistaken about this behavior not being supported; I was able to constrain GPUs after setting "ConstrainDevices=yes" in my cgroup.conf file.

There is, however, some odd behavior surrounding the task cgroups that led me to think GPUs were not being constrained. When I request a GPU and examine the cgroup for the running task, the devices.list file shows that the cgroup has access to all devices:

$ cat /sys/fs/cgroup/devices/slurm/uid_885046/job_2065332/step_5/devices.list 
a *:* rwm

The devices.list file is the same for all cgroups in the devices hierarchy:

$ find /sys/fs/cgroup -name devices.list -exec cat {} \; | uniq -c
     93 a *:* rwm

I know this isn't a SLURM issue, but I'm curious if this is expected given the way SLURM constrains GPU devices. Any insight you have is greatly appreciated.

Thanks,
Steve
Comment 2 Marshall Garey 2018-11-26 15:55:05 MST
Yes, this is expected. At some point, the way device constraints are implemented in cgroups changed. You'll find some helpful background here:

https://bugs.schedmd.com/show_bug.cgi?id=5361
https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt

The second paragraph:

"The root device cgroup starts with rwm to 'all'.  A child device
cgroup gets a copy of the parent.  Administrators can then remove
devices from the whitelist or add new entries.  A child cgroup can
never receive a device access which is denied by its parent."

The last paragraph:

"device cgroups is implemented internally using a behavior (ALLOW, DENY) and a
list of exceptions.  The internal state is controlled using the same user
interface to preserve compatibility with the previous whitelist-only
implementation.  Removal or addition of exceptions that will reduce the access
to devices will be propagated down the hierarchy.
For every propagated exception, the effective rules will be re-evaluated based
on current parent's access rules."
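That internal state (a default behavior plus an exception list) is what explains the devices.list output you saw. A minimal Python sketch of the model described in devices.txt (class and method names are hypothetical, not kernel code) shows why devices.list reads "a *:* rwm" even when deny exceptions are in effect:

```python
# Hypothetical model of the cgroup-v1 devices controller's internal state:
# a default behavior (ALLOW or DENY) plus a list of exceptions, per
# Documentation/cgroup-v1/devices.txt.

class DevCgroup:
    def __init__(self):
        self.behavior = "ALLOW"   # the root cgroup starts with "a *:* rwm"
        self.exceptions = []      # entries like "c 195:1 rwm"

    def deny(self, entry):
        # Writing to devices.deny under default-ALLOW records a deny exception.
        if self.behavior == "ALLOW" and entry not in self.exceptions:
            self.exceptions.append(entry)

    def list(self):
        # Reading devices.list under default-ALLOW prints only "a *:* rwm";
        # the deny exceptions are enforced internally but never displayed.
        if self.behavior == "ALLOW":
            return "a *:* rwm"
        return "\n".join(self.exceptions)

step = DevCgroup()
step.deny("c 195:1 rwm")   # e.g. blacklist a GPU outside the allocation
print(step.list())         # prints: a *:* rwm
```

So a step cgroup whose behavior is still default-ALLOW with deny exceptions will report "a *:* rwm" from devices.list while the denies are nonetheless enforced.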

It sounds like there was a previous behavior ("previous whitelist-only implementation"), but with the new behavior the cgroup_allowed_devices file no longer does anything. We haven't yet included the contribution in bug 5361; there is additional cleanup work that needs to be done in the task/cgroup plugin. Currently the plugin whitelists everything in the cgroup_allowed_devices file (which isn't needed), then whitelists any GRES in the job's allocation (also not needed), and finally blacklists every GRES the job does not have in its allocation (this last step is what actually constrains devices).
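As a rough illustration of those three steps (device numbers hypothetical, and access modeled as a simple allow-set rather than the kernel's behavior/exception state), assume a node with two GPUs, c 195:0 and c 195:1, where the job is allocated only c 195:0:

```python
# Hypothetical sketch of the task/cgroup plugin's three steps for a job
# allocated GPU c 195:0 but not c 195:1.

all_gpus = {"c 195:0 rwm", "c 195:1 rwm"}
allocated = {"c 195:0 rwm"}

allowed = set()
# Step 1: whitelist everything in cgroup_allowed_devices (not needed).
allowed |= all_gpus
# Step 2: whitelist the GRES in the job's allocation (also not needed).
allowed |= allocated
# Step 3: blacklist every GRES outside the allocation -- the actual constraint.
allowed -= (all_gpus - allocated)

print(sorted(allowed))   # only the allocated GPU remains accessible
```

Only step 3 changes the effective access; steps 1 and 2 are redundant with the default-allow state the cgroup already inherits.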

Does that answer your question?
Comment 4 Marshall Garey 2018-12-03 09:20:45 MST
Closing as info given.