Hi. We are being hit hard by what we believe is the memory cgroup issue:

[2018-07-27T15:50:07.660] task_p_slurmd_batch_request: 67020
[2018-07-27T15:50:07.660] task/affinity: job 67020 CPU input mask for node: 0x000000000002
[2018-07-27T15:50:07.660] task/affinity: job 67020 CPU final HW mask for node: 0x000001000000
[2018-07-27T15:50:07.661] _run_prolog: run job script took usec=9
[2018-07-27T15:50:07.663] _run_prolog: prolog with lock for job 67020 ran for 0 seconds
[2018-07-27T15:50:07.663] Launching batch job 67020 for UID 62356
[2018-07-27T15:50:07.845] [67020.batch] task/cgroup: /slurm/uid_62356/job_67020: alloc=5000MB mem.limit=5000MB memsw.limit=5000MB
[2018-07-27T15:50:07.845] [67020.batch] error: xcgroup_instantiate: unable to create cgroup '/sys/fs/cgroup/memory/slurm/uid_62356/job_67020/step_batch' : No space left on device
[2018-07-27T15:50:07.855] [67020.batch] error: task/cgroup: unable to add task[pid=13971] to memory cg '(null)'
[2018-07-27T15:50:07.858] [67020.batch] task_p_pre_launch: Using sched_affinity for tasks

When this happens, application jobs (slurmstepd processes) take 50 minutes to complete instead of 50 seconds. Previous attempts to resolve this have always ended up needing a reboot; we then run OK for a period before it blows up again.

In an attempt to prove that this memory cgroup issue really is the cause of the application slowdown, I thought we might recover (without a reboot) if we stopped slurmd from using memory cgroups but continued to use cgroups for cores. So I did the following:

1. Changed SelectTypeParameters in slurm.conf from CR_CPU_memory to CR_CPU
2. Propagated slurm.conf
3. Restarted the backup slurmctld
4. Restarted the primary slurmctld
5. Ran scontrol reconfigure

That had no effect. Jobs still took many minutes to complete, and the slurmd log file still contained these messages:

error: xcgroup_instantiate: unable to create cgroup '/sys/fs/cgroup/memory/slurm/uid_62356/job_67020/step_batch' : No space left on device
error: task/cgroup: unable to add task[pid=13971] to memory cg '(null)'

I tried restarting the slurmd daemon, but that had no effect either.

Next I edited cgroup.conf on the compute node, changed ConstrainRAMSpace=yes to ConstrainRAMSpace=no, and restarted the slurmd daemon. This seems to have had some success: our application jobs are once again completing in under 60 seconds. However, the slurmd log file still shows these messages:

[2018-07-30T12:45:04.797] [76245.batch] error: xcgroup_instantiate: unable to create cgroup '/sys/fs/cgroup/memory/slurm/uid_62356/job_76245' : No space left on device
[2018-07-30T12:45:04.812] [76246.batch] error: xcgroup_instantiate: unable to create cgroup '/sys/fs/cgroup/memory/slurm/uid_62356/job_76246' : No space left on device
[2018-07-30T12:45:04.818] [76245.batch] error: task/cgroup: unable to add task[pid=30506] to memory cg '(null)'
[2018-07-30T12:45:04.820] [76245.batch] task_p_pre_launch: Using sched_affinity for tasks
[2018-07-30T12:45:04.823] [76246.batch] error: task/cgroup: unable to add task[pid=30507] to memory cg '(null)'
[2018-07-30T12:45:04.826] [76246.batch] task_p_pre_launch: Using sched_affinity for tasks

So my question is: how do we properly configure our compute node(s) to use cgroups for CPUs (cores) but *NOT* for memory?

Thanks.
Mark.
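(For reference, a cgroup.conf that expresses "constrain cores but not RAM" would look roughly like the sketch below. This is not a quote of the actual file on the node, just the relevant parameters; the log lines above show both task/affinity and task/cgroup loaded, so TaskPlugin in slurm.conf is assumed to stay as it is.)

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=no
ConstrainSwapSpace=no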
See bug 5082. All you have to do is set ConstrainKmemSpace=No in cgroup.conf, and you won't hit the issue anymore. But to recover the memory you have to restart the node.
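For anyone hitting the same thing, the change is a single line in cgroup.conf on the compute node(s). The surrounding parameters in this sketch are only illustrative and should stay at whatever the site already uses:

ConstrainCores=yes
ConstrainRAMSpace=yes
# the fix: stop setting a kernel-memory limit on the job cgroups
ConstrainKmemSpace=no

A slurmd restart picks up the cgroup.conf change, but as noted above a full node reboot is still needed to get back the memory that has already leaked.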
So to confirm: you are saying that I do not need to change SelectTypeParameters from CR_CPU_memory to CR_CPU or set ConstrainRAMSpace=no; I just need to set ConstrainKmemSpace=No. Is that correct?

Thanks.
Mark.
Yes, that's correct. ConstrainKmemSpace=No is the only change needed; the other changes you made should be reverted. But you do have to restart the node in order to reclaim the memory that has already leaked. Can you confirm that this solves the problems you're experiencing?
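One way to roll that out per node (the node name below is just a placeholder) is to drain it, change cgroup.conf, reboot, then resume:

scontrol update NodeName=node01 State=DRAIN Reason="kmem cgroup leak (bug 5082)"
# wait for running jobs to finish, edit cgroup.conf, reboot the node, then:
scontrol update NodeName=node01 State=RESUME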
OK, I will unwind the other changes, put the Kmem fix in place, and get the server rebooted. I'll come back to you in the next day or so once we've had a chance to test this.

Thanks.
Mark.
Sounds good. Just for your information, ConstrainKmemSpace defaults to No in 18.08. We made that change because of this very bug, which you are experiencing and several others have already hit.
Have you had any more problems?
Closing as resolved/duplicate of 5082. *** This ticket has been marked as a duplicate of ticket 5082 ***