Hi Experts,

We observed that the cgroup devices directories are generally removed for jobs that run longer than one day, although not for every such long-running job:

/sys/fs/cgroup/devices/slurm/uid_*/job_*

Our cgroup.conf is as follows:

$ cat /opt/slurm/etc/cgroup.conf
CgroupAutomount=yes
CgroupReleaseAgentDir="/opt/slurm/etc/cgroup"
ConstrainCores=yes
ConstrainKmemSpace=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
MaxSwapPercent=0
ConstrainDevices=yes
TaskAffinity=yes

The CgroupReleaseAgentDir directory is empty:

$ ls -l /opt/slurm/etc/cgroup
total 0

AFAIK, we don't have any other process that cleans up /sys/fs/cgroup/devices/*. Do you have any idea where the deletion might be initiated?

Thanks!
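To pin down which running jobs have lost their devices cgroup directory, something like the following could be run on an affected compute node. This is only a rough sketch: missing_job_cgroups is a hypothetical helper (not part of Slurm), it assumes the uid_*/job_* layout shown above, and the squeue format string in the usage comment is an assumption that should be checked against the installed version.

```shell
#!/bin/sh
# Sketch: report jobs whose devices cgroup directory is missing.
# Reads "UID JOBID" pairs on stdin; the first argument is the hierarchy root.
# missing_job_cgroups is a hypothetical helper, not part of Slurm.
missing_job_cgroups() {
    base=$1
    while read -r uid jobid; do
        # Skip blank or incomplete lines.
        [ -n "$uid" ] && [ -n "$jobid" ] || continue
        dir="$base/uid_$uid/job_$jobid"
        # Print only the directories that no longer exist.
        [ -d "$dir" ] || printf 'MISSING %s\n' "$dir"
    done
}

# Assumed usage on a compute node (verify the squeue format specifiers
# against your Slurm version before relying on them):
#   squeue -h -w "$(hostname -s)" -t RUNNING -o "%U %A" |
#       missing_job_cgroups /sys/fs/cgroup/devices/slurm
```

Running this periodically (from cron, for example) and logging the output with a timestamp would show roughly when a directory disappears, which can then be compared against the slurmd log.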
Starting in 17.02.0, CgroupReleaseAgentDir is no longer needed in cgroup.conf and should be removed. Slurm should clean up the cgroups automatically as needed. That said, I know there were issues with cleanup that were fixed in 17.02.3. Would it be possible for you to upgrade to at least that version (if not the latest, 17.02.6) and see whether that fixes your issue? If this doesn't help, Alex will help you further.
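A quick way to check whether a node is already at or past 17.02.3 might look like the sketch below. version_at_least is a hypothetical helper that relies on GNU sort -V, and the usage comment assumes that "slurmd -V" prints a line of the form "slurm 17.02.1".

```shell
#!/bin/sh
# Sketch: compare two dotted version strings.
# version_at_least is a hypothetical helper; it relies on GNU sort -V.
version_at_least() {
    # Succeeds when $1 >= $2 in version order: after a version sort,
    # the smaller (or equal) of the two must be $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed usage ("slurmd -V" is expected to print e.g. "slurm 17.02.1"):
#   installed=$(slurmd -V | awk '{print $2}')
#   version_at_least "$installed" 17.02.3 ||
#       echo "cgroup cleanup fixes from 17.02.3 are not included"
```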
Hi. Did you manage to upgrade to the latest 17.02 and test the automatic removal of the cgroup hierarchy? Is there anything else that you need from us? Thanks.
We have been running 17.02.1. However, I do see that part of the source code differs from 17.02.6. We need to schedule a downtime to upgrade Slurm, which will be after the summer. Thank you!
All right. The commit that Danny talks about in comment 1 is this: https://github.com/SchedMD/slurm/commit/24e2cb07e8e363f24dda036637be97f90507fcd6 I'm closing this as resolved/infogiven. Please reopen if you encounter further issues. Thanks.