Ticket 15440 - modifying cgroup.conf with running jobs
Summary: modifying cgroup.conf with running jobs
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 21.08.6
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Jason Booth
Reported: 2022-11-16 17:23 MST by Jeff Haferman
Modified: 2022-11-17 12:48 MST
Site: NPS HPC


Description Jeff Haferman 2022-11-16 17:23:13 MST
I want to modify cgroup.conf, but I'm not sure how this will affect running jobs. I have always made changes to cgroup.conf during maintenance periods. 

For example, now I have in cgroup.conf:

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes

and am considering going to:

ConstrainKmemSpace=yes
ConstrainCores=yes
CgroupAutomount=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
TaskAffinity=no
MaxSwapPercent=10


Can I do this with running jobs? If so, can I then just execute an "scontrol reconfig" or would I need to restart slurmd / slurmctld?
Comment 1 Jason Booth 2022-11-17 11:09:51 MST


TaskAffinity was removed starting in 21.08.2 and has been obsolete for the last several releases.

The recommended approach is to control this through TaskPlugin in slurm.conf and to stack both the cgroup and affinity plugins.

>TaskPlugin=task/cgroup,task/affinity

If you want to disable affinity you would remove that plugin from the TaskPlugin line, although I am curious why you want to do this.

>TaskPlugin=task/cgroup


> Can I do this with running jobs? If so, can I then just execute an "scontrol 
> reconfig" or would I need to restart slurmd / slurmctld?

Modifying these settings should work with a reconfigure since you already use the cgroup plugin. You would run into problems if you tried to enable or disable the plugin as a whole without a restart. I am assuming here, but it sounds like you may also have task/affinity enabled.

In that case, you can remove task/affinity, but you will need to restart the slurmd daemons.
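
The rule of thumb in this comment (changing values within an already-loaded plugin can be picked up by a reconfigure, while changing which task plugins are loaded needs a slurmd restart) can be sketched as a small check. This is only an illustration of the advice given here, not Slurm code: the function names `parse_task_plugins` and `needs_slurmd_restart` are hypothetical, and treating the plugin list as an unordered set is a simplifying assumption.

```python
def parse_task_plugins(line):
    """Split a 'TaskPlugin=a,b' line (as found in slurm.conf) into a set of names."""
    key, _, value = line.partition("=")
    if key.strip().lower() != "taskplugin":
        raise ValueError(f"not a TaskPlugin line: {line!r}")
    return {name.strip() for name in value.split(",") if name.strip()}


def needs_slurmd_restart(old_line, new_line):
    """Encode the advice from this ticket: if the set of loaded task plugins
    changes, plan on restarting slurmd rather than relying on a reconfigure."""
    return parse_task_plugins(old_line) != parse_task_plugins(new_line)


# Dropping task/affinity changes the loaded plugin set.
print(needs_slurmd_restart("TaskPlugin=task/cgroup,task/affinity",
                           "TaskPlugin=task/cgroup"))
# Same plugin set: nothing to restart on this account.
print(needs_slurmd_restart("TaskPlugin=task/cgroup",
                           "TaskPlugin=task/cgroup"))
```

The check only looks at the TaskPlugin line; it says nothing about other slurm.conf changes that may have their own restart requirements.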


[1] https://github.com/SchedMD/slurm/blob/master/NEWS#L1008
[2] https://slurm.schedmd.com/cgroups.html#task
Comment 2 Jeff Haferman 2022-11-17 11:27:55 MST
Sorry, that was a cut-and-paste error; I was going the other direction. From:

ConstrainKmemSpace=yes
ConstrainCores=yes
CgroupAutomount=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
TaskAffinity=no
MaxSwapPercent=10


to:

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes


I really wanted to get rid of the "Swap" stuff and was worried running jobs might die during a reconfig. But I just did it and everything appears to have gone fine.
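
Before applying a change like this, it can help to see exactly which keys differ between the two files. The following is a minimal sketch that diffs the two cgroup.conf fragments quoted in this comment. The helper names `parse_conf` and `diff_conf` are hypothetical, and the parser assumes simple `Key=Value` lines with case-sensitive keys, which is a simplification.

```python
def parse_conf(text):
    """Parse simple Key=Value lines into a dict, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_conf(old, new):
    """Return (removed, added, changed) between two parsed configs."""
    removed = {k: old[k] for k in old if k not in new}
    added = {k: new[k] for k in new if k not in old}
    changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
    return removed, added, changed


OLD = """
ConstrainKmemSpace=yes
ConstrainCores=yes
CgroupAutomount=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
TaskAffinity=no
MaxSwapPercent=10
"""

NEW = """
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
"""

removed, added, changed = diff_conf(parse_conf(OLD), parse_conf(NEW))
print("removed:", sorted(removed))
print("added:  ", sorted(added))
print("changed:", sorted(changed))
```

For these two fragments the diff shows four keys removed (ConstrainKmemSpace, ConstrainSwapSpace, MaxSwapPercent, TaskAffinity), one added (ConstrainDevices), and no values changed on the keys common to both.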
Comment 3 Jason Booth 2022-11-17 12:48:44 MST
> I really wanted to get rid of the "Swap" stuff and was worried running jobs might die during a reconfig. But I just did it and everything appears to have gone fine.

Yes, that is fine as well. The limits/cgroups are instantiated during step creation. Running jobs will clean these settings up once they finish.
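
As a rough mental model of that statement, the toy sketch below captures the limits when a step is created and leaves them alone on a reconfigure. It is not Slurm code and does not touch real cgroups: the `Limits` and `Node` classes, their methods, and the step IDs are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical snapshot of the cgroup.conf settings relevant to a step."""
    constrain_swap: bool


class Node:
    """Toy stand-in for a compute node; only models when limits are captured."""

    def __init__(self, limits):
        self.current = limits   # settings used for steps created from now on
        self.steps = {}         # step id -> limits captured at creation

    def start_step(self, step_id):
        # Limits are instantiated once, at step creation.
        self.steps[step_id] = self.current

    def reconfigure(self, limits):
        # Only changes what future steps receive; existing steps are untouched.
        self.current = limits

    def finish_step(self, step_id):
        # The old settings go away together with the step that carried them.
        return self.steps.pop(step_id)


node = Node(Limits(constrain_swap=True))
node.start_step("100.0")                       # created under the old settings
node.reconfigure(Limits(constrain_swap=False))
node.start_step("101.0")                       # created under the new settings

print(node.steps["100.0"].constrain_swap)      # still the old value
print(node.steps["101.0"].constrain_swap)      # the new value
node.finish_step("100.0")                      # old settings are cleaned up here
```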