Ticket 7841 - jobacct_gather plugin fails to remove cgroups (Device or resource busy)
Summary: jobacct_gather plugin fails to remove cgroups (Device or resource busy)
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmstepd
Version: 19.05.2
Hardware: Linux
Severity: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2019-10-01 03:16 MDT by Mazen Al-Hagri
Modified: 2024-06-27 21:00 MDT

See Also:
Site: -Other-


Attachments
slurm.conf (1.62 KB, text/plain)
2019-10-01 03:16 MDT, Mazen Al-Hagri

Description Mazen Al-Hagri 2019-10-01 03:16:03 MDT
Created attachment 11755 [details]
slurm.conf

Hi,

Upon successful completion of an MPI job, slurmstepd fails to clean up the job's cgroups, and consequently some nodes get drained.

Logs from slurmctld:

> slurmstepd: error: *** JOB 21 STEPD TERMINATED ON node001 AT 2019-09-20T15:34:50 DUE TO JOB NOT ENDING WITH SIGNALS ***

Debug output of slurmd:

> # Logs from node001:/var/log/slurmd
> [2019-09-20T15:34:29.355] [21.batch] debug3: xcgroup_set_uint32_param: parameter 'tasks' set to '6034' for '/sys/fs/cgroup/cpuacct'
> [2019-09-20T15:34:29.356] [21.batch] debug2: xcgroup_delete: rmdir(/sys/fs/cgroup/cpuacct/slurm/uid_1001/job_21/step_batch/task_0): Device or resource busy
> [2019-09-20T15:34:29.356] [21.batch] debug2: jobacct_gather_cgroup_cpuacct_fini: failed to delete /sys/fs/cgroup/cpuacct/slurm/uid_1001/job_21/step_batch/task_0 Device or resource busy
> [2019-09-20T15:34:29.356] [21.batch] debug2: xcgroup_delete: rmdir(/sys/fs/cgroup/cpuacct/slurm/uid_1001/job_21/step_batch): Device or resource busy
> [2019-09-20T15:34:29.356] [21.batch] debug2: jobacct_gather_cgroup_cpuacct_fini: failed to delete /sys/fs/cgroup/cpuacct Device or resource busy
> [2019-09-20T15:34:29.356] [21.batch] debug2: xcgroup_delete: rmdir(/sys/fs/cgroup/cpuacct/slurm/uid_1001/job_21): Device or resource busy
> [2019-09-20T15:34:29.356] [21.batch] debug2: jobacct_gather_cgroup_cpuacct_fini: failed to delete /sys/fs/cgroup/cpuacct/slurm/uid_1001/job_21 Device or resource busy
> ...

The issue boils down to the inability to remove the job's step cgroup upon completion:

> [root@node001 ~]# rmdir /sys/fs/cgroup/freezer/slurm/uid_1001/job_21/step_batch
> rmdir: failed to remove ‘/sys/fs/cgroup/freezer/slurm/uid_1001/job_21/step_batch’: Device or resource busy

However, removing the `task_0` subdirectory first, and then the step directory, works:

> [root@node001 ~]# rmdir /sys/fs/cgroup/freezer/slurm/uid_1001/job_21/step_batch/task_0/
> [root@node001 ~]# rmdir /sys/fs/cgroup/freezer/slurm/uid_1001/job_21/step_batch
> [root@node001 ~]# 
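The manual workaround above works because a cgroup directory can only be rmdir'd once all of its child cgroups are gone, so removal has to proceed bottom-up. A minimal sketch of that cleanup (the path and the `cleanup_cgroup` name are hypothetical, for illustration only; this is not how slurmstepd does it internally):

```shell
# Remove a cgroup subtree bottom-up. "find -depth" visits children
# before their parents, so each rmdir sees an already-emptied directory --
# the same order as removing task_0 before step_batch by hand.
cleanup_cgroup() {
    find "$1" -depth -type d -exec rmdir {} \;
}

# Hypothetical usage, mirroring the paths from the logs above:
# cleanup_cgroup /sys/fs/cgroup/freezer/slurm/uid_1001/job_21
```

Removing `step_batch` directly, as in the failing case, is the top-down order and is exactly what produces EBUSY while `task_0` still exists underneath it.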

I couldn't figure out the root cause, though. We've seen this issue only with MPI jobs (OpenMPI 3).

A copy of slurm.conf is attached.

(The issue was reproducible in Slurm 18.08.4 and 19.05.2.)
Comment 1 Jacob Jenson 2019-10-01 09:16:22 MDT
Mazen,

Is this request for a system that has Slurm support with SchedMD? Or is this more of a question from internal testing? Typically SchedMD only provides support to sites/systems with support contracts. 

Thanks,
Jacob