I'm testing 20.11.0.0pre1 from ~Friday and ran into a problem setting up cgroups. The uid_# subdirectories don't get 'cpuset.mems' set, which prevents tasks from being added to the cpuset.

[root@lyra17 ~]# srun -n4 --ntasks-per-gpu=1 /bin/bash -c "env|grep ROC"
slurmstepd: error: Failed to invoke task plugins: task_p_pre_launch error
slurmstepd: error: Failed to invoke task plugins: task_p_pre_launch error
slurmstepd: error: Failed to invoke task plugins: task_p_pre_launch error
slurmstepd: error: Failed to invoke task plugins: task_p_pre_launch error
srun: error: lyra17: tasks 0-3: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=3.0

[2020-10-28T20:27:18.974] [3.extern] Considering each NUMA node as a socket
[2020-10-28T20:27:18.983] [3.extern] error: _file_write_uint32s: write pid 29589 to /sys/fs/cgroup/cpuset/slurm/uid_0/job_3/step_extern/cgroup.procs failed: No space left on device
[2020-10-28T20:27:18.983] [3.extern] error: task_cgroup_cpuset_create: unable to add slurmstepd to cpuset cg '/sys/fs/cgroup/cpuset/slurm/uid_0/job_3/step_extern'
...
[2020-10-28T20:27:41.744] [3.0] error: _file_write_uint32s: write pid 29661 to /sys/fs/cgroup/cpuset/slurm/uid_0/job_3/step_0/cgroup.procs failed: No space left on device
[2020-10-28T20:27:41.744] [3.0] error: task_cgroup_cpuset_create: unable to add slurmstepd to cpuset cg '/sys/fs/cgroup/cpuset/slurm/uid_0/job_3/step_0'
[2020-10-28T20:27:41.745] [3.0] task/cgroup: _memcg_initialize: /slurm/uid_0/job_3: alloc=0MB mem.limit=257740MB memsw.limit=unlimited
[2020-10-28T20:27:41.745] [3.0] task/cgroup: _memcg_initialize: /slurm/uid_0/job_3/step_0: alloc=0MB mem.limit=257740MB memsw.limit=unlimited
[2020-10-28T20:27:41.831] [3.0] error: Failed to invoke task plugins: task_p_pre_launch error
[2020-10-28T20:27:41.831] [3.0] error: Failed to invoke task plugins: task_p_pre_launch error
[2020-10-28T20:27:41.831] [3.0] error: Failed to invoke task plugins: task_p_pre_launch error
[2020-10-28T20:27:41.831] [3.0] error: Failed to invoke task plugins: task_p_pre_launch error
[2020-10-28T20:27:44.000] [3.0] done with job

[root@lyra17 slurm]# cat /sys/fs/cgroup/cpuset/slurm/cpuset.mems
0-7
[root@lyra17 slurm]# cat /sys/fs/cgroup/cpuset/slurm/uid_0/cpuset.mems

[root@lyra17 slurm]# echo $$ > /sys/fs/cgroup/cpuset/slurm/uid_0/tasks
bash: echo: write error: No space left on device
[root@lyra17 slurm]# echo $$ > /sys/fs/cgroup/cpuset/slurm/tasks
[root@lyra17 slurm]# echo '0-7' > /sys/fs/cgroup/cpuset/slurm/uid_0/cpuset.mems
[root@lyra17 slurm]# echo $$ > /sys/fs/cgroup/cpuset/slurm/uid_0/tasks

After I set cpuset.mems for uid_0, subsequent jobs for UID 0 work. Is this expected behavior?

# cat /etc/slurm/cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
TaskAffinity=no
AllowedRAMSpace=95

# grep -i cgroup /etc/slurm/slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
Matt,

This looks like a dup of bug#9244. The current workaround is to copy the value of cpuset.mems from the parent cgroup into each child recursively. It appears to be caused by a race condition during startup.

--Nate
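The workaround described above can be sketched as a small script (this is an illustrative sketch, not the official workaround script; the default path is an assumption, so pass your actual cpuset cgroup directory as the first argument):

```shell
#!/bin/sh
# Sketch of the workaround: walk a cpuset cgroup hierarchy and copy
# cpuset.mems (and cpuset.cpus) from each parent into any child whose
# value is still empty. The default root path is an assumption; pass
# your real cpuset cgroup directory as $1.
propagate() {
    dir="$1"
    parent=$(dirname "$dir")
    for f in cpuset.mems cpuset.cpus; do
        if [ -f "$dir/$f" ] && [ -f "$parent/$f" ]; then
            child_val=$(cat "$dir/$f")
            parent_val=$(cat "$parent/$f")
            # only fill in values that are still empty
            if [ -z "$child_val" ] && [ -n "$parent_val" ]; then
                echo "$parent_val" > "$dir/$f"
            fi
        fi
    done
    # recurse into child cgroups
    for sub in "$dir"/*/; do
        if [ -d "$sub" ]; then
            propagate "${sub%/}"
        fi
    done
}

root="${1:-/sys/fs/cgroup/cpuset/slurm}"
for sub in "$root"/*/; do
    if [ -d "$sub" ]; then
        propagate "${sub%/}"
    fi
done
```

Note that the values are compared by reading the files rather than by file size, since cgroupfs files report a zero size regardless of content.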
(In reply to Nate Rini from comment #1)
> Matt,
>
> This looks like a dup of bug#9244. The current work around is to place
> value of cpuset.mems from the parent cgroup into the child recursively.
> It appears to be caused by a race condition during startup.
>
> --Nate

I'm not authorized to see that bug. I'm not sure it's a race condition; it may just be how cgroups work in this kernel. There's a parameter called cgroup.clone_children that can affect how cgroups are created. That parameter seems somewhat controversial, as there have been patches to the kernel to remove it. Anyway:

[root@lyra16 slurm]# pwd
/sys/fs/cgroup/cpuset/slurm
[root@lyra16 slurm]# cat cpuset.mems
0-1
[root@lyra16 slurm]# cat cgroup.clone_children
0
[root@lyra16 slurm]# mkdir matt
[root@lyra16 slurm]# cat matt/cpuset.mems

[root@lyra16 slurm]# echo 1 > cgroup.clone_children
[root@lyra16 slurm]# mkdir matt2
[root@lyra16 slurm]# cat matt2/cpuset.mems
0-1

So I think the fix is either to have Slurm write 1 into $CPUSETDIR/slurm/cgroup.clone_children (if it exists) at startup, or to make sure to set cpuset.mems for every subdirectory it creates.
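The first of those two fixes can be sketched as a helper like the following (the function name and fallback convention are hypothetical, not Slurm's actual patch): enable clone_children when the kernel provides the knob, and report failure so the caller can fall back to copying cpuset.mems into each new subdirectory itself.

```shell
#!/bin/sh
# Hypothetical helper sketching the clone_children option: write 1 into
# cgroup.clone_children on the given cpuset cgroup (if the kernel provides
# it) so that subsequently created children inherit cpuset.mems and
# cpuset.cpus from the parent. Returns nonzero when the knob is absent,
# so the caller knows to set cpuset.mems for each new child itself.
enable_clone_children() {
    cg="$1"
    if [ -f "$cg/cgroup.clone_children" ]; then
        echo 1 > "$cg/cgroup.clone_children"
    else
        return 1
    fi
}
```

On cgroups v1 this only affects children created after the write; existing empty children would still need their cpuset.mems filled in.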
(In reply to Matt Ezell from comment #2)
> (In reply to Nate Rini from comment #1)
> > Matt,
> >
> > This looks like a dup of bug#9244. The current work around is to place
> > value of cpuset.mems from the parent cgroup into the child recursively.
> > It appears to be caused by a race condition during startup.
> >
> > --Nate
>
> I'm not authorized to see that bug. I'm not sure it's a race condition,
> just how cgroups works in this kernel. There's a parameter called
> cgroup.clone_children can impact how cgroups are created. That parameter
> seems somewhat controversial, as there have been patches to the kernel to
> remove it. Anyway:
>
> [root@lyra16 slurm]# pwd
> /sys/fs/cgroup/cpuset/slurm
> [root@lyra16 slurm]# cat cpuset.mems
> 0-1
> [root@lyra16 slurm]# cat cgroup.clone_children
> 0
> [root@lyra16 slurm]# mkdir matt
> [root@lyra16 slurm]# cat matt/cpuset.mems
>
> [root@lyra16 slurm]# echo 1 > cgroup.clone_children
> [root@lyra16 slurm]# mkdir matt2
> [root@lyra16 slurm]# cat matt2/cpuset.mems
> 0-1
>
> So I think the fix is either have Slurm write 1 into
> $CPUSETDIR/slurm/cgroup.clone_children (if it exists) at startup, or to
> make sure to set cpuset.mems for every subdirectory it creates.

That's exactly what the patch I am working on does. I will let you know when it is reviewed and done.

Can you point me to some reference? I am interested in this information:

> That parameter seems somewhat controversial, as there have been patches
> to the kernel to remove it.
(In reply to Felip Moll from comment #3)
> Can you point me to some reference? I am interested in this information:
>
> > That parameter seems somewhat controversial, as there have been patches
> > to the kernel to remove it.

https://lists.linuxfoundation.org/pipermail/containers/2012-November/030813.html
https://lwn.net/Articles/547332/

In cgroups v2 the file does not exist:

https://man7.org/linux/man-pages/man7/cgroups.7.html

> In addition, the cgroup.clone_children file that is employed by the
> cpuset controller has been removed.
(In reply to Matt Ezell from comment #4)
> (In reply to Felip Moll from comment #3)
> > Can you point me to some reference? I am interested in this information:
> >
> > > That parameter seems somewhat controversial, as there have been
> > > patches to the kernel to remove it.
>
> https://lists.linuxfoundation.org/pipermail/containers/2012-November/030813.html
> https://lwn.net/Articles/547332/
>
> In cgroupsV2 the file does not exist:
>
> https://man7.org/linux/man-pages/man7/cgroups.7.html
>
> > In addition, the cgroup.clone_children file that is employed by the
> > cpuset controller has been removed.

This is a very old discussion, going back to 2012/13, and in the end they decided to keep clone_children for cpuset. cgroups v2 doesn't have this option, but it is an entirely new system that Slurm doesn't support yet, so it doesn't matter here.

Thanks for your comments.
Matt,

I have a patch pending review. I will let you know when it is ready.
Matt,

A fix has been applied in:

- 20.02.6, commit 666d2eedebac
- 20.11.0pre1 (master), commit cd20c16b169a

Please open a new bug, or reopen this one, if you still see issues after these patches.

Thanks