For the output below, job 107 has finished (and should have no cgroups) and job 108 is running. While a job is running we see cgroups for memory, cpuset, devices, and freezer, but after a job finishes only the memory and freezer cgroups are cleaned up. It is unclear to me how to debug this, or how to configure it to clean up all cgroups after a job finishes.

[root@c13n08.farnam ~]# lscgroup
cpu,cpuacct:/
net_cls:/
memory:/
memory:/slurm
memory:/slurm/uid_10374
memory:/slurm/uid_10374/job_108
memory:/slurm/uid_10374/job_108/step_batch
cpuset:/
cpuset:/slurm
cpuset:/slurm/uid_10374
cpuset:/slurm/uid_10374/job_108
cpuset:/slurm/uid_10374/job_108/step_batch
cpuset:/slurm/uid_10374/job_107
cpuset:/slurm/uid_10374/job_107/step_batch
perf_event:/
blkio:/
hugetlb:/
devices:/
devices:/slurm
devices:/slurm/uid_10374
devices:/slurm/uid_10374/job_108
devices:/slurm/uid_10374/job_108/step_batch
devices:/slurm/uid_10374/job_107
devices:/slurm/uid_10374/job_107/step_batch
freezer:/
freezer:/slurm
freezer:/slurm/uid_10374
freezer:/slurm/uid_10374/job_108
freezer:/slurm/uid_10374/job_108/step_batch

[root@c13n08.farnam cgroup]# ls -l /etc/slurm/cgroup
total 4
-rwxr-xr-x 1 slurm slurm 3307 Sep 20 09:35 release_common
lrwxrwxrwx 1 slurm slurm   14 Sep 20 10:07 release_cpuset -> release_common
lrwxrwxrwx 1 slurm slurm   14 Sep 20 10:08 release_devices -> release_common
lrwxrwxrwx 1 slurm slurm   14 Sep 20 10:08 release_freezer -> release_common
lrwxrwxrwx 1 slurm slurm   14 Sep 20 10:08 release_memory -> release_common

[root@c13n08.farnam cgroup]# cat /etc/slurm/cgroup.conf
# update this to where your release agents are installed:
CgroupReleaseAgentDir="/etc/slurm/cgroup"
CgroupAutomount=yes
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
TaskAffinity=yes
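(A minimal manual-cleanup sketch, assuming standard cgroup v1 behaviour and the libcgroup tools already used above; the job_107 paths are taken from the lscgroup output. This only removes the leftover directories by hand, it does not address why they are left behind.)

# Orphaned (empty) cgroup directories can be removed with rmdir; the deepest
# directory has to go first, since rmdir only works on a cgroup with no
# children and no tasks.
rmdir /sys/fs/cgroup/cpuset/slurm/uid_10374/job_107/step_batch
rmdir /sys/fs/cgroup/cpuset/slurm/uid_10374/job_107
# Or, equivalently, recurse with cgdelete from libcgroup-tools:
cgdelete -r devices:/slurm/uid_10374/job_107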
What OS is installed on the node? I'm guessing RHEL7 or some other systemd-based distribution? systemd tends to remove the release_agent setting on the cgroup mounts, which then leads to these stray cgroup directories. They generally shouldn't cause any problems (up until the node has processed thousands of jobs), and the 17.02 release has some additional work that should remove the need for the release_agent setting entirely, to avoid this conflict. (The 16.05 release already has similar work done to remove the release_agent requirement for memory and freezer.)

In the meantime, can you check the output from 'cat /proc/mounts' and see what options are set for the cgroup mounts? If the release_agent option isn't there, that would explain the problem.
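For example, a quick way to check (assuming the usual /proc/mounts format):

# Show the mount options for every cgroup hierarchy
grep '^cgroup ' /proc/mounts
# or narrow it down to just the hierarchies that still carry a release agent
grep release_agent /proc/mounts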
[root@c13n08.farnam cgroup]# cat /etc/redhat-release ; mount -t cgroup
Red Hat Enterprise Linux Server release 7.2 (Maipo)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
[root@c13n08.farnam cgroup]#
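(A further check that may help, assuming the cgroup v1 layout shown above: each mounted hierarchy also exposes a release_agent file at its root, so the configured agent can be read directly for every controller Slurm manages, independent of what /proc/mounts reports.)

# Print the release agent configured on each Slurm-managed hierarchy (blank if unset)
for c in memory freezer cpuset devices; do
    printf '%s: %s\n' "$c" "$(cat /sys/fs/cgroup/$c/release_agent)"
done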
The mount options on memory and cpuset are the same. One cleans up and the other doesn't. I'm not sure how to change the default cgroup mount options, or what to change them to. Can you test RHEL7 and provide instructions?

Thanks.
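(For reference, a sketch of the underlying cgroup v1 mechanism rather than the recommended fix: the release agent is a per-hierarchy property that can be rewritten at runtime, and cleanup only fires for cgroups whose notify_on_release flag is set. The release_cpuset path is taken from the /etc/slurm/cgroup listing above.)

# Point the cpuset hierarchy back at Slurm's release agent...
echo /etc/slurm/cgroup/release_cpuset > /sys/fs/cgroup/cpuset/release_agent
# ...and enable release notification for the slurm tree (cgroups created afterwards
# inherit the flag from their parent)
echo 1 > /sys/fs/cgroup/cpuset/slurm/notify_on_release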
(In reply to Charles Wright from comment #3)
> The mount options on memory and cpuset are the same. One cleans up and the
> other doesn't. I'm not sure how to change the default cgroup mount options,
> or what to change them to. Can you test RHEL7 and provide instructions?
>
> Thanks.

Your mount command showed that Slurm's release_agent option had been removed in favor of systemd's:

> cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)

This has been a recurring problem on RHEL7, but we haven't been able to isolate the exact cause - it appears something within the systemd code eventually re-mounts the cgroups and removes Slurm's release_agent option.

The reason you only see this on some of the hierarchies is that the memory and freezer hierarchies already had code in place to handle cleanup within slurmd, and do not rely on the release_agent. The cpuset and devices hierarchies were both missing this code.

We'd previously committed a patch to the master branch that handles this, but it's become apparent that RHEL7 could really use that code now, and that we shouldn't wait until the next release to get this fix out there. Commit 66beca68217 pulls that extended cleanup logic in from the master branch, and it will be included in the 16.05.5 release, which we expect to publish shortly. (You can apply this patch in the meantime if you'd like, although as I've mentioned the orphaned directories shouldn't cause any issues until there are a significant number of them.)

With 16.05.5 and later, you should no longer need the CgroupReleaseAgentDir setting in cgroup.conf at all; slurmd should then handle cleaning up all of the cgroup directories internally. I'll be revising the documentation to note that the setting is no longer required.

To summarize: once 16.05.5 is released, please update and remove the CgroupReleaseAgentDir setting from cgroup.conf.
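For example, the cgroup.conf from the original report could then be trimmed to something like the following (a sketch: only the CgroupReleaseAgentDir line is dropped, the constraint settings are left unchanged):

CgroupAutomount=yes
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
TaskAffinity=yes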