| Summary: | PrologFlags=contain not setting up cgroups properly | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Ryan Cox <ryan_cox> |
| Component: | Other | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | brian, da |
| Version: | 15.08.0 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | BYU - Brigham Young University | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | 15.08.2 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
With respect to the missing cgroup containers, I do not see what you describe. Here are the relevant configuration options that I have set:

TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/cgroup
Prolog=/home/jette/SLURM/install_smd/sbin/prolog
PrologFlags=alloc,contain

I do see the memory limit issue that you describe. The proctrack plugin builds the job container when the prolog runs, but that plugin does not set up the cpusets or memory limits (the task plugin does that). So either the logic to set up those limits needs to be added to the proctrack plugin, or the task plugin needs to be called to set them up (that currently happens only when tasks are actually spawned, not when the prolog runs). Unfortunately, it's not looking like a simple thing to fix.

After I developed a better understanding of the various cgroup plugin interactions, it turned out to not be so bad. This commit should fix the problem, at least it does for me: https://github.com/SchedMD/slurm/commit/80dcbf7ec4ffb72530b76ee61277feecce778aca

A few notes for you:
* The change will be in version 15.08.2 when released, likely late October.
* When the prolog runs, it creates a "step_extern" directory rather than "step_4294967295".
* "step_extern" will have a subdirectory of the form "task_#" into which a "sleep" process is placed, just to occupy the cgroup and keep it from being purged.
* The "task_#" directories will not have memory limits set, but the step's limit will still be enforced because "memory.use_hierarchy" is set to 1 (each level in the hierarchy enforces its limits, so the job limit remains in force).
* Various minimum and maximum limits can be configured in cgroup.conf (e.g. "MinRAMSpace"; see "man cgroup.conf" for details).

Let me know if this does the trick for you.

Excellent. I'm at a conference this week so I won't be able to test it for a while, but that sounds like it should work perfectly. Thanks!
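For reference, a minimal cgroup.conf sketch showing the kinds of limits mentioned above. The values are illustrative only, not recommendations; see "man cgroup.conf" for the full parameter list and defaults:

```
###
# cgroup.conf - illustrative values only
###
CgroupAutomount=yes
ConstrainCores=yes       # enforce cpuset limits via task/cgroup
ConstrainRAMSpace=yes    # enforce memory.limit_in_bytes
ConstrainSwapSpace=yes   # enforce memory.memsw.limit_in_bytes
AllowedSwapSpace=5       # percent of allocated RAM also allowed as swap
MinRAMSpace=30           # lower bound, in MB, on any memory limit Slurm sets
```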
Did you have a chance to check out the patch?

I just barely tested it and it appears to all be working. The testing wasn't extensive, but all the right cgroups get created on node 0 and the other nodes, and the limits in the memory and cpuset cgroups look correct. This should allow me to finish the rest of the code for pam_slurm_adopt. Thanks!

Excellent! Let me know if you have other problems (email or a new ticket would probably be best).
I have been testing PrologFlags=contain and it doesn't quite do everything I was expecting with regard to cgroups. I am testing on two nodes: m6-31-[3-4]. m6-31-3 is node 0 for the job.

Note the difference in the cgroups which are created on each node. m6-31-3 has cpuacct, cpuset, freezer, and memory; m6-31-4 has just freezer and memory. The memory cgroup on m6-31-3 has limits in place, but m6-31-4 does not have any limits. I would expect both nodes to have /cgroup/memory/slurm/uid_$UID/job_$SLURM_JOB_ID/memory{,.memsw}.limit_in_bytes set to the appropriate values for the nodes. I would also expect the other cgroups (cpuset and cpuacct) to be set up on m6-31-4.

```
[root@m6-31-3 ~]# ls -d /cgroup/*/slurm/uid_5627/job_8309455
/cgroup/cpuacct/slurm/uid_5627/job_8309455
/cgroup/cpuset/slurm/uid_5627/job_8309455
/cgroup/freezer/slurm/uid_5627/job_8309455
/cgroup/memory/slurm/uid_5627/job_8309455
[root@m6-31-3 ~]# find /cgroup/memory/slurm/uid_5627/job_8309453/ -name memory.limit_in_bytes -exec cat {} \;
9223372036854775807
9223372036854775807
1073741824
1073741824
[root@m6-31-3 ~]# find /cgroup/memory/slurm/uid_5627/job_8309453/ -name memory.memsw.limit_in_bytes -exec cat {} \;
9223372036854775807
9223372036854775807
1127432192
1127432192
[root@m6-31-3 ~]# find /cgroup/memory/slurm/uid_5627/job_8309453/ -name memory.limit_in_bytes
/cgroup/memory/slurm/uid_5627/job_8309453/step_4294967295/task_0/memory.limit_in_bytes
/cgroup/memory/slurm/uid_5627/job_8309453/step_4294967295/memory.limit_in_bytes
/cgroup/memory/slurm/uid_5627/job_8309453/step_batch/memory.limit_in_bytes
/cgroup/memory/slurm/uid_5627/job_8309453/memory.limit_in_bytes
[root@m6-31-3 ~]# find /cgroup/memory/slurm/uid_5627/job_8309453 -type d
/cgroup/memory/slurm/uid_5627/job_8309453
/cgroup/memory/slurm/uid_5627/job_8309453/step_4294967295
/cgroup/memory/slurm/uid_5627/job_8309453/step_4294967295/task_0
/cgroup/memory/slurm/uid_5627/job_8309453/step_batch
[root@m6-31-3 ~]# cat /cgroup/memory/slurm/uid_5627/job_8309453/memory.limit_in_bytes
1073741824
[root@m6-31-4 ~]# ls -d /cgroup/*/slurm/uid_5627/job_8309455
/cgroup/freezer/slurm/uid_5627/job_8309455
/cgroup/memory/slurm/uid_5627/job_8309455
[root@m6-31-4 ~]# find /cgroup/memory/slurm/uid_5627/job_8309453/ -name memory.limit_in_bytes -exec cat {} \;
9223372036854775807
9223372036854775807
9223372036854775807
[root@m6-31-4 ~]# find /cgroup/memory/slurm/uid_5627/job_8309453/ -name memory.memsw.limit_in_bytes -exec cat {} \;
9223372036854775807
9223372036854775807
9223372036854775807
[root@m6-31-4 ~]# find /cgroup/memory/slurm/uid_5627/job_8309453/ -name memory.limit_in_bytes
/cgroup/memory/slurm/uid_5627/job_8309453/step_4294967295/task_1/memory.limit_in_bytes
/cgroup/memory/slurm/uid_5627/job_8309453/step_4294967295/memory.limit_in_bytes
/cgroup/memory/slurm/uid_5627/job_8309453/memory.limit_in_bytes
[root@m6-31-4 ~]# find /cgroup/memory/slurm/uid_5627/job_8309453 -type d
/cgroup/memory/slurm/uid_5627/job_8309453
/cgroup/memory/slurm/uid_5627/job_8309453/step_4294967295
/cgroup/memory/slurm/uid_5627/job_8309453/step_4294967295/task_1
[root@m6-31-4 ~]# cat /cgroup/memory/slurm/uid_5627/job_8309453/memory.limit_in_bytes
9223372036854775807
```
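The expectation in this report can be captured in a small per-node check. This is a sketch only: `check_job_mem_limit` is a hypothetical helper name, and the path layout assumed is the /cgroup mount point shown in the transcripts above. It treats the kernel's sentinel value 9223372036854775807 as "no limit set":

```shell
# Hypothetical helper: succeed only if the job's memory cgroup carries a
# real limit rather than the kernel's "unlimited" sentinel value.
check_job_mem_limit() {
    job_dir="$1"                    # e.g. /cgroup/memory/slurm/uid_5627/job_8309453
    unlimited=9223372036854775807   # value reported when no limit is set
    limit=$(cat "$job_dir/memory.limit_in_bytes") || return 2
    # String comparison avoids any shell integer-range concerns.
    if [ "$limit" = "$unlimited" ]; then
        echo "FAIL: no memory limit enforced in $job_dir"
        return 1
    fi
    echo "OK: limit is $limit bytes"
}
```

Run against the job's memory cgroup directory on each allocated node; on unpatched 15.08.0 it would report OK on node 0 (m6-31-3) but FAIL on the other nodes (m6-31-4).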