Ticket 6253 - cgroup constrain devices not enforced
Summary: cgroup constrain devices not enforced
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmd
Version: 17.11.8
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Marshall Garey
Reported: 2018-12-17 12:52 MST by Dylan Simon
Modified: 2018-12-18 15:19 MST (History)
Site: Simons Foundation & Flatiron Institute


Attachments
slurm.conf (8.61 KB, text/plain)
2018-12-17 12:52 MST, Dylan Simon
cgroup.conf (582 bytes, text/plain)
2018-12-17 12:53 MST, Dylan Simon

Description Dylan Simon 2018-12-17 12:52:56 MST
Created attachment 8672 [details]
slurm.conf

Pretty much the same as bug #3015.  We just upgraded to 17.11.8 (Bright 8.1) and CentOS 7.6 (3.10.0-957.1.3.el7) and turned on ConstrainDevices.  (We were already using CPU and memory cgroups fine.)  I see that it is correctly parsing the default device list and gres devices and adding/removing them:

...
[2018-12-17T14:31:31.521] [167647.extern] debug2: Default access allowed to device c 202:0 rwm for job
[2018-12-17T14:31:31.521] [167647.extern] debug3: xcgroup_set_param: parameter 'devices.allow' set to 'c 202:0 rwm' for '/sys/fs/cgroup/devices/slurm/uid_1135/job_167647'
[2018-12-17T14:31:31.521] [167647.extern] debug2: Default access allowed to device c 5:2 rwm for job
[2018-12-17T14:31:31.521] [167647.extern] debug3: xcgroup_set_param: parameter 'devices.allow' set to 'c 5:2 rwm' for '/sys/fs/cgroup/devices/slurm/uid_1135/job_167647'
[2018-12-17T14:31:31.521] [167647.extern] debug:  Allowing access to device c 195:0 rwm(/dev/nvidia0) for job
[2018-12-17T14:31:31.521] [167647.extern] debug3: xcgroup_set_param: parameter 'devices.allow' set to 'c 195:0 rwm' for '/sys/fs/cgroup/devices/slurm/uid_1135/job_167647'
[2018-12-17T14:31:31.521] [167647.extern] debug:  Not allowing access to device c 195:1 rwm(/dev/nvidia1) for job
[2018-12-17T14:31:31.521] [167647.extern] debug3: xcgroup_set_param: parameter 'devices.deny' set to 'c 195:1 rwm' for '/sys/fs/cgroup/devices/slurm/uid_1135/job_167647'
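
The debug3 lines above carry enough information to reconstruct which rules were written into the job's devices cgroup. A minimal Python sketch, assuming only the log format quoted in this ticket (the regular expression and the helper name parse_device_writes are illustrative and not part of Slurm):

```python
import re

# Matches the xcgroup_set_param debug3 lines quoted in this ticket.
# The pattern is an assumption derived from those lines only.
PARAM_RE = re.compile(
    r"parameter 'devices\.(allow|deny)' set to '([^']+)' for '([^']+)'"
)

def parse_device_writes(log_text):
    """Return (action, rule, cgroup_path) tuples in log order."""
    return [m.groups() for m in PARAM_RE.finditer(log_text)]

sample = """\
[2018-12-17T14:31:31.521] [167647.extern] debug3: xcgroup_set_param: parameter 'devices.allow' set to 'c 195:0 rwm' for '/sys/fs/cgroup/devices/slurm/uid_1135/job_167647'
[2018-12-17T14:31:31.521] [167647.extern] debug3: xcgroup_set_param: parameter 'devices.deny' set to 'c 195:1 rwm' for '/sys/fs/cgroup/devices/slurm/uid_1135/job_167647'
"""

for action, rule, path in parse_device_writes(sample):
    print(action, rule, path)
```

Comparing this list against the contents of devices.list for the same cgroup shows whether the writes had any effect.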

However, devices.list still shows "a *:* rwm", which is the setting it inherits from the root devices cgroup, so all devices are still whitelisted.  Manually adding "a" to devices.deny to clear the whitelist seems to work.
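
The check and the manual workaround described here can be sketched in Python as follows. This is only an illustration under assumptions: the helper names are hypothetical, the cgroup directory is passed in by the caller, and on a real cgroup v1 devices hierarchy the write needs root and may be rejected by the kernel depending on the state of the hierarchy.

```python
from pathlib import Path

def allows_everything(cgroup_dir):
    """True if devices.list in cgroup_dir contains the allow-all entry 'a *:* rwm'."""
    lines = (Path(cgroup_dir) / "devices.list").read_text().splitlines()
    return any(line.split() == ["a", "*:*", "rwm"] for line in lines)

def clear_whitelist(cgroup_dir):
    """Write 'a' to devices.deny, the same as: echo a > <cgroup_dir>/devices.deny"""
    (Path(cgroup_dir) / "devices.deny").write_text("a")

# Example call (path taken from the log in this ticket, not executed here):
# job = "/sys/fs/cgroup/devices/slurm/uid_1135/job_167647"
# if allows_everything(job):
#     clear_whitelist(job)
```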

Is there somewhere this default deny-all setting needs to be put either in slurm or the system (systemd?) to make this work?  Does this need bug #5361 to be fixed first?  I feel like I'm just missing some obvious setting.
Comment 1 Dylan Simon 2018-12-17 12:53:23 MST
Created attachment 8673 [details]
cgroup.conf
Comment 2 Dylan Simon 2018-12-18 12:19:43 MST
On further testing, the gres deny rules for unallocated /dev/nvidia devices may be working, but the default whitelist is definitely not -- unlisted devices can still be used.
Comment 3 Marshall Garey 2018-12-18 13:41:59 MST
At some point, the way cgroups constrain devices changed. You'll find some helpful background here:

https://bugs.schedmd.com/show_bug.cgi?id=5361
https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt

The second paragraph:

"The root device cgroup starts with rwm to 'all'.  A child device
cgroup gets a copy of the parent.  Administrators can then remove
devices from the whitelist or add new entries.  A child cgroup can
never receive a device access which is denied by its parent."

The last paragraph:

"device cgroups is implemented internally using a behavior (ALLOW, DENY) and a
list of exceptions.  The internal state is controlled using the same user
interface to preserve compatibility with the previous whitelist-only
implementation.  Removal or addition of exceptions that will reduce the access
to devices will be propagated down the hierarchy.
For every propagated exception, the effective rules will be re-evaluated based
on current parent's access rules."

It sounds like there was a previous behavior ("previous whitelist-only implementation"). But with the new behavior, ***the cgroup_allowed_devices file doesn't do anything anymore.*** We haven't yet included the contribution in bug 5361. There is additional work that needs to be done to clean up the task/cgroup plugin. Currently the plugin whitelists everything in the cgroup_allowed_devices file (which isn't needed, since everything is already whitelisted). Then it whitelists any GRES the job has in its allocation (also not needed), and blacklists every GRES the job does not have in its allocation (this is how devices are constrained).

So, unless you specify a device in your gres.conf, it will be available to be used.
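
As a rough illustration of the behavior described above, here is a toy Python model (not Slurm code; the function names and the "c 10:229" device are hypothetical). It assumes the job cgroup inherits allow-all from its parent, so the only effective rules are the deny entries for GRES devices that are not in the job's allocation.

```python
def denied_devices(gres_devices, allocated):
    """GRES devices listed in gres.conf but not allocated to the job get a deny entry."""
    return set(gres_devices) - set(allocated)

def device_allowed(device, denied):
    """Everything is allowed by inheritance unless it was explicitly denied."""
    return device not in denied

# Devices from the log in this ticket: nvidia0 is allocated, nvidia1 is not.
gres = ["c 195:0", "c 195:1"]
denied = denied_devices(gres, allocated=["c 195:0"])

print(device_allowed("c 195:0", denied))   # True: allocated GPU
print(device_allowed("c 195:1", denied))   # False: unallocated GPU
print(device_allowed("c 10:229", denied))  # True: hypothetical device not in gres.conf
```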

Does that answer your question?
Comment 4 Dylan Simon 2018-12-18 15:13:45 MST
That makes sense and sounds like what we're seeing, yes -- only explicit gres devices are removed from the whitelist.  We can explicitly set the whitelist in some parent cgroup, or this may be improved in some future release.  Thanks.
Comment 5 Marshall Garey 2018-12-18 15:19:05 MST
You're welcome. Closing.