Created attachment 11755 [details]
slurm.conf

Hi,

Upon successful completion of an MPI job, slurmstepd fails to clean up the job's cgroups, and consequently (some) nodes get drained.

Log from slurmctld:

> slurmstepd: error: *** JOB 21 STEPD TERMINATED ON node001 AT 2019-09-20T15:34:50 DUE TO JOB NOT ENDING WITH SIGNALS ***

Debug output of slurmd (from node001:/var/log/slurmd):

> [2019-09-20T15:34:29.355] [21.batch] debug3: xcgroup_set_uint32_param: parameter 'tasks' set to '6034' for '/sys/fs/cgroup/cpuacct'
> [2019-09-20T15:34:29.356] [21.batch] debug2: xcgroup_delete: rmdir(/sys/fs/cgroup/cpuacct/slurm/uid_1001/job_21/step_batch/task_0): Device or resource busy
> [2019-09-20T15:34:29.356] [21.batch] debug2: jobacct_gather_cgroup_cpuacct_fini: failed to delete /sys/fs/cgroup/cpuacct/slurm/uid_1001/job_21/step_batch/task_0 Device or resource busy
> [2019-09-20T15:34:29.356] [21.batch] debug2: xcgroup_delete: rmdir(/sys/fs/cgroup/cpuacct/slurm/uid_1001/job_21/step_batch): Device or resource busy
> [2019-09-20T15:34:29.356] [21.batch] debug2: jobacct_gather_cgroup_cpuacct_fini: failed to delete /sys/fs/cgroup/cpuacct Device or resource busy
> [2019-09-20T15:34:29.356] [21.batch] debug2: xcgroup_delete: rmdir(/sys/fs/cgroup/cpuacct/slurm/uid_1001/job_21): Device or resource busy
> [2019-09-20T15:34:29.356] [21.batch] debug2: jobacct_gather_cgroup_cpuacct_fini: failed to delete /sys/fs/cgroup/cpuacct/slurm/uid_1001/job_21 Device or resource busy
> ...
The issue boils down to not being able to remove the job step's cgroup upon completion:

> [root@node001 ~]# rmdir /sys/fs/cgroup/freezer/slurm/uid_1001/job_21/step_batch
> rmdir: failed to remove ‘/sys/fs/cgroup/freezer/slurm/uid_1001/job_21/step_batch’: Device or resource busy

However, removing the `task_0` subdirectory first, then the step directory, works:

> [root@node001 ~]# rmdir /sys/fs/cgroup/freezer/slurm/uid_1001/job_21/step_batch/task_0/
> [root@node001 ~]# rmdir /sys/fs/cgroup/freezer/slurm/uid_1001/job_21/step_batch
> [root@node001 ~]#

I couldn't figure out the root cause, though. We've only seen this issue with MPI jobs (openmpi-3). A copy of slurm.conf is attached. (The issue was reproducible in Slurm 18.08.4 and 19.05.2.)
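For reference, the manual workaround above generalizes to a bottom-up removal of the whole cgroup subtree: `rmdir` on a cgroup directory only succeeds once all of its child directories are gone, so deleting depth-first (deepest entries first) sidesteps the "Device or resource busy" error. A minimal sketch, using a stand-in directory under `/tmp` rather than a real cgroup mount (the paths below are illustrative, not from an actual node):

```shell
#!/bin/sh
# Stand-in for a cgroup hierarchy such as /sys/fs/cgroup/freezer/slurm
CG_ROOT="/tmp/cgdemo"
mkdir -p "$CG_ROOT/uid_1001/job_21/step_batch/task_0"

# -depth makes find emit children before their parents, so each
# rmdir only ever runs on an already-empty directory.
find "$CG_ROOT" -depth -type d -exec rmdir {} \;
```

On a real freezer hierarchy this would correspond to removing `task_0` before `step_batch`, `step_batch` before `job_21`, and so on, which matches the order that worked by hand above.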
Mazen,

Is this request for a system that has Slurm support with SchedMD? Or is this more of a question from internal testing? Typically SchedMD only provides support to sites/systems with support contracts.

Thanks,
Jacob