Ticket 2604 - cgroup not cleaned up
Summary: cgroup not cleaned up
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmd
Version: 14.11.10
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Alejandro Sanchez
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2016-03-31 14:31 MDT by Akmal Madzlan
Modified: 2016-04-13 16:11 MDT

See Also:
Site: DownUnder GeoSolutions


Description Akmal Madzlan 2016-03-31 14:31:06 MDT
Our Slurm doesn't seem to clean up the freezer cgroup.
What did we do wrong?

[akmalm@kud13 ~]$ cat /d/sw/slurm/etc/cgroup.conf 
CgroupAutomount=yes
CgroupMountpoint=/cgroup
CgroupReleaseAgentDir=/.slurm-release-agent/

[akmalm@kud13 ~]$ ll /.slurm-release-agent/
total 4
-rwxr-xr-x 1 root root 3355 Mar 31 17:34 release_common
lrwxrwxrwx 1 root root   14 Jun 11  2015 release_cpuset -> release_common
lrwxrwxrwx 1 root root   14 Jun 11  2015 release_freezer -> release_common

[akmalm@kud13 ~]$ ll /cgroup/freezer/slurm/uid_1419/
total 0
--w--w--w- 1 root root 0 Mar 31 17:02 cgroup.event_control
-rw-r--r-- 1 root root 0 Mar 31 17:02 cgroup.procs
-rw-r--r-- 1 root root 0 Mar 31 17:02 freezer.state
drwxr-xr-x 2 root root 0 Mar 31 17:06 job_533287
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533288
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533289
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533290
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533291
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533292
drwxr-xr-x 2 root root 0 Mar 31 17:03 job_533293

We are using the provided example release_common script. I tried adding "echo something > /tmp/somewhere" at the top of release_common, but it doesn't seem to be executed.
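One way to narrow this down, sketched here as a hypothetical helper (check_release_setup is not part of Slurm): on a cgroup v1 hierarchy the kernel only runs a release agent if the hierarchy root's release_agent file names one and the emptied cgroup has notify_on_release set to 1. An empty release_agent would be consistent with the echo at the top of release_common never running. The slurm/uid_*/job_* path pattern is taken from the listing above; everything else is an assumption.

```shell
# Hypothetical diagnostic helper (not part of Slurm).
# Assumes a cgroup v1 hierarchy mounted at $1 (default /cgroup/freezer)
# with Slurm job cgroups under $1/slurm/uid_*/job_*.
check_release_setup() {
    hier=${1:-/cgroup/freezer}

    # release_agent exists only at the root of the hierarchy;
    # an empty value means the kernel has no agent to call.
    agent=$(cat "$hier/release_agent" 2>/dev/null)
    if [ -z "$agent" ]; then
        echo "release_agent: (empty)"
    else
        echo "release_agent: $agent"
    fi

    # Each leftover job cgroup must have notify_on_release=1
    # for the agent to be invoked when it becomes empty.
    for job in "$hier"/slurm/uid_*/job_*; do
        [ -d "$job" ] || continue   # glob matched nothing
        notify=$(cat "$job/notify_on_release" 2>/dev/null)
        echo "$job notify_on_release=${notify:-?}"
    done
}
```

On the node from the description this would be run as root with "check_release_setup /cgroup/freezer".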
Comment 1 Alejandro Sanchez 2016-03-31 21:32:49 MDT
Akmal, could you try as root:

> echo "/path/to/your/release_freezer" > /cgroup/freezer/release_agent

Then submit a job and, when it finishes, check whether the hierarchy under /cgroup/freezer has been cleaned up.
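The workaround above can be generalized to every subsystem that has a release_<subsystem> link in CgroupReleaseAgentDir (freezer and cpuset in the listing from the description). This is a minimal sketch assuming cgroup v1 hierarchies mounted at <CgroupMountpoint>/<subsystem>; set_release_agents is a hypothetical helper, not part of Slurm, and on a real node it must run as root.

```shell
# Hypothetical helper (not part of Slurm): write each release_<subsystem>
# agent path into the matching hierarchy's release_agent file.
# Assumes cgroup v1 hierarchies at $1/<subsystem> and agents in $2.
set_release_agents() {
    cgroup_root=${1:-/cgroup}
    agent_dir=${2:-/.slurm-release-agent}
    agent_dir=${agent_dir%/}   # drop a trailing slash, as in cgroup.conf

    for agent in "$agent_dir"/release_*; do
        [ -x "$agent" ] || continue          # follows the symlinks
        subsys=${agent##*/release_}
        # release_common is the shared script, not a subsystem
        [ "$subsys" = common ] && continue
        target="$cgroup_root/$subsys/release_agent"
        # Only touch hierarchies that are actually mounted
        if [ -e "$target" ]; then
            echo "$agent" > "$target"
            echo "set $target -> $agent"
        fi
    done
}
```

For the configuration in the description this would be "set_release_agents /cgroup /.slurm-release-agent/". Because release_agent belongs to the mounted hierarchy, the setting presumably has to be reapplied whenever the hierarchy is remounted, for example after a reboot.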
Comment 2 Akmal Madzlan 2016-04-03 17:30:34 MDT
Hi Alejandro,
When I put the path in /cgroup/freezer/release_agent, the cgroup is cleaned.

Akmal
Comment 3 Akmal Madzlan 2016-04-03 17:50:04 MDT
Hi Alejandro,
Why is this happening? Is it caused by a configuration error?

Akmal
Comment 4 Alejandro Sanchez 2016-04-04 04:43:17 MDT
I'm investigating why the path has to be set manually in /cgroup/freezer/release_agent in 14.11. I just tested 15.08 and there is no need to do that there: the hierarchy is cleaned up with just the Slurm configuration.
Comment 5 Alejandro Sanchez 2016-04-04 04:56:56 MDT
I think this was fixed in the following commit:

https://github.com/SchedMD/slurm/commit/c2ce30c2c1879da4d3b02622ca82819163c90d30

So if you are not planning to upgrade to 15.08, you can use the workaround I suggested in comment #1. Please let me know if you have any more questions.
Comment 6 Alejandro Sanchez 2016-04-12 21:51:37 MDT
Hi Akmal. Can we close this bug? Do you have any more questions? Thanks.
Comment 7 Akmal Madzlan 2016-04-13 16:11:49 MDT
Hi Alejandro, yes we can close this bug.

Thanks