Ticket 2604

Summary: cgroup not cleaned up
Product: Slurm
Reporter: Akmal Madzlan <akmalm>
Component: slurmd
Assignee: Alejandro Sanchez <alex>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: alex
Version: 14.11.10
Hardware: Linux
OS: Linux
Site: DownUnder GeoSolutions

Description Akmal Madzlan 2016-03-31 14:31:06 MDT
Our Slurm doesn't seem to clean up the freezer cgroup.
What did we do wrong?

[akmalm@kud13 ~]$ cat /d/sw/slurm/etc/cgroup.conf 
CgroupAutomount=yes
CgroupMountpoint=/cgroup
CgroupReleaseAgentDir=/.slurm-release-agent/

[akmalm@kud13 ~]$ ll /.slurm-release-agent/
total 4
-rwxr-xr-x 1 root root 3355 Mar 31 17:34 release_common
lrwxrwxrwx 1 root root   14 Jun 11  2015 release_cpuset -> release_common
lrwxrwxrwx 1 root root   14 Jun 11  2015 release_freezer -> release_common

[akmalm@kud13 ~]$ ll /cgroup/freezer/slurm/uid_1419/
total 0
--w--w--w- 1 root root 0 Mar 31 17:02 cgroup.event_control
-rw-r--r-- 1 root root 0 Mar 31 17:02 cgroup.procs
-rw-r--r-- 1 root root 0 Mar 31 17:02 freezer.state
drwxr-xr-x 2 root root 0 Mar 31 17:06 job_533287
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533288
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533289
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533290
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533291
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533292
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533293

We are using the example release_common provided. I tried adding echo something > /tmp/somewhere at the top of release_common, but it doesn't seem to be executed.
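
One way to confirm that the leftover job_* directories no longer hold any processes is sketched below. This is only a sketch, assuming the cgroup v1 layout shown in the listing above; list_stale_job_cgroups is a hypothetical helper name, not something shipped with Slurm.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): print every job_* cgroup under
# the given uid directory that has no PIDs left in any cgroup.procs file,
# including those in nested step cgroups.
list_stale_job_cgroups() {
    uid_dir="$1"
    for d in "$uid_dir"/job_*; do
        # Skip the unexpanded glob when no job_* directory exists.
        [ -d "$d" ] || continue
        # Concatenate all cgroup.procs files beneath this job cgroup;
        # an empty result means no process is attached anymore.
        pids=$(find "$d" -name cgroup.procs -exec cat {} + 2>/dev/null)
        if [ -z "$pids" ]; then
            echo "$d"
        fi
    done
}
```

For example: list_stale_job_cgroups /cgroup/freezer/slurm/uid_1419 would print each job directory that is empty and should have been removed by the release agent.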
Comment 1 Alejandro Sanchez 2016-03-31 21:32:49 MDT
Akmal, could you try as root:

> echo "/path/to/your/release_freezer" > /cgroup/freezer/release_agent

Then submit a job and, when it finishes, check whether the hierarchy under /cgroup/freezer is cleaned up.
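
Before and after writing the path, it can help to check what the kernel currently has configured. The following is a minimal sketch, assuming a cgroup v1 hierarchy where release_agent lives in the hierarchy root and each cgroup has its own notify_on_release flag; check_release_agent and check_notify are hypothetical helper names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report the release agent configured for a cgroup v1
# hierarchy root (for this ticket, that would be /cgroup/freezer).
check_release_agent() {
    root="$1"
    agent_file="$root/release_agent"
    if [ ! -r "$agent_file" ]; then
        echo "no readable release_agent file under $root"
        return 2
    fi
    agent=$(cat "$agent_file")
    if [ -z "$agent" ]; then
        # With an empty release_agent, no cleanup script is ever invoked.
        echo "release_agent is empty"
        return 1
    fi
    echo "release_agent: $agent"
}

# Hypothetical helper: the release agent is only invoked for cgroups whose
# notify_on_release flag is 1, so check that flag for a given cgroup.
check_notify() {
    cg="$1"
    if [ "$(cat "$cg/notify_on_release" 2>/dev/null)" = "1" ]; then
        echo "notify_on_release enabled for $cg"
    else
        echo "notify_on_release NOT enabled for $cg"
        return 1
    fi
}
```

For example, run as root: check_release_agent /cgroup/freezer, then check_notify on one of the job_* directories under /cgroup/freezer/slurm.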
Comment 2 Akmal Madzlan 2016-04-03 17:30:34 MDT
Hi Alejandro,
When I put the path in /cgroup/freezer/release_agent, the cgroup is cleaned up.

Akmal
Comment 3 Akmal Madzlan 2016-04-03 17:50:04 MDT
Hi Alejandro,
Why is this happening? Is it caused by a configuration error?

Akmal
Comment 4 Alejandro Sanchez 2016-04-04 04:43:17 MDT
I'm investigating why the path has to be set manually in /cgroup/freezer/release_agent in 14.11. I just tested 15.08 and there is no need to do that there: the hierarchy is cleaned up with just the Slurm config.
Comment 5 Alejandro Sanchez 2016-04-04 04:56:56 MDT
I think this was fixed in the following commit:

https://github.com/SchedMD/slurm/commit/c2ce30c2c1879da4d3b02622ca82819163c90d30

So if you are not planning to upgrade to 15.08, you can use the workaround I suggested in comment #1. Please let me know if you have any more questions.
Comment 6 Alejandro Sanchez 2016-04-12 21:51:37 MDT
Hi Akmal. Can we close this bug? Do you have any more questions? Thanks.
Comment 7 Akmal Madzlan 2016-04-13 16:11:49 MDT
Hi Alejandro, yes we can close this bug.

Thanks