22449 – /sys/fs/cgroup/system.slice/slurmstepd.scope

Status:	OPEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	slurmstepd
Version:	24.11.3
Hardware:	Linux Linux

Severity:	6 - No support contract
Assignee:	Jacob Jenson
QA Contact:

URL:

Depends on:
Blocks:

Reported:	2025-03-27 09:52 MDT by Pablo Flores
Modified:	2025-03-27 09:52 MDT
CC List:	0 users

See Also:
Site:	-Other-

Description Pablo Flores 2025-03-27 09:52:49 MDT

The cgroup directories of finished jobs are not being fully cleaned up.

Under /sys/fs/cgroup/system.slice/slurmstepd.scope, job_* entries for jobs that have already finished are still present:


# ls
job_45304707  job_46043148  job_46236561  job_46540674  job_46548677  job_46553239  job_46705405  
cgroup.events                    job_45307219  job_46043404  job_46239503  job_46540731  job_46548694  job_46553431  job_46705417  
cgroup.freeze                    job_45307228  job_46044131  job_46339702  job_46541132  job_46548988  job_46553474  job_46761341  
cgroup.kill                      job_45307237  job_46046151  job_46390017  job_46541170  job_46549130  job_46554100  job_46761501  
cgroup.max.depth                 job_45311310  job_46075084  job_46391601  job_46541227  job_46549274  job_46564361  job_46830705  

The same entries also show up in systemd-cgtop (output truncated):

[root@sn013 slurmstepd.scope]# systemd-cgtop | grep slurmstep
system.slice/slurmstepd.scope                                            146      -    36.6G        -        -  
system.slice/slurmstepd.scope/job_45290719                                 -      -   468.0K        -        -  
system.slice/slurmstepd.scope/job_45299168                                 -      -   236.0K        -        -  
system.slice/slurmstepd.scope/job_45304402                                 -      -   460.0K        -        -  
system.slice/slurmstepd.scope/job_45304696                                 -      -   460.0K        -        -  
system.slice/slurmstepd.scope/job_45304707                                 -      -   460.0K        -        -  
system.slice/slurmstepd.scope/job_45307219                                 -      -   460.0K        -        -  
system.slice/slurmstepd.scope/job_45307228                                 -      -   468.0K        -        -  
system.slice/slurmstepd.scope/job_45307237                                 -      -   460.0K        -        -  


The processes of the finished jobs are no longer running. To verify this, we ran the following command, which shows only the steps of the currently running job 46918979:

[root@sn013 slurmstepd.scope]# ps aux | grep slurmstep
root        1800  0.0  0.0   6868  2816 ?        S    Feb26   0:00 /usr/sbin/slurmstepd infinity  
root     1899750  0.0  0.0 617208  7392 ?        Sl   10:41   0:00 slurmstepd: [46918979.extern]  
root     1899768  0.0  0.0 945604  7744 ?        Sl   10:41   0:00 slurmstepd: [46918979.batch]  
root     1900888  0.0  0.0 1163752 15632 ?       Sl   10:41   0:03 slurmstepd: [46918979.0]  
root     2048685  0.0  0.0   6412  2112 pts/0    S+   12:31   0:00 grep --color=auto slurmstep  
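
As a complement to the ps check, the kernel itself can be asked whether a job cgroup still holds processes. This is a minimal sketch, assuming cgroup v2, where every cgroup has a cgroup.events file with a "populated 0" line when neither it nor any descendant contains a live process. The function name list_unpopulated_jobs is hypothetical and not part of Slurm.

```shell
#!/bin/sh
# Print the names of job_* cgroups under a scope directory that the kernel
# reports as unpopulated (no live process in the cgroup or its descendants).
# Hypothetical helper for illustration; not a Slurm tool.
list_unpopulated_jobs() {
    scope_dir=$1
    for job_dir in "$scope_dir"/job_*; do
        # Skip an unexpanded glob and anything without a readable cgroup.events.
        [ -r "$job_dir/cgroup.events" ] || continue
        if grep -q '^populated 0$' "$job_dir/cgroup.events"; then
            basename "$job_dir"
        fi
    done
}

# Example invocation with the path from this report:
# list_unpopulated_jobs /sys/fs/cgroup/system.slice/slurmstepd.scope
```

Any job_* name printed here that squeue no longer knows about would be a leftover of the kind described in this ticket.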

However, these cgroup v2 entries are never removed, so the list keeps growing over time and consumes RAM, as systemd-cgtop shows.
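
To put a number on the growth, the leftover entries can be counted and their charged memory summed. This is a rough sketch, assuming the memory controller is enabled for the job cgroups so that each one exposes memory.current (bytes charged to the cgroup and its descendants). The function name leftover_job_memory is hypothetical.

```shell
#!/bin/sh
# Print "<count> <total_bytes>" for unpopulated job_* cgroups under a scope
# directory. memory.current is hierarchical, so reading it only at the job_*
# level avoids counting child cgroups twice.
# Hypothetical helper for illustration; not a Slurm tool.
leftover_job_memory() {
    scope_dir=$1
    count=0
    total=0
    for job_dir in "$scope_dir"/job_*; do
        [ -r "$job_dir/cgroup.events" ] || continue
        # Only consider cgroups without any live process.
        grep -q '^populated 0$' "$job_dir/cgroup.events" || continue
        count=$((count + 1))
        if [ -r "$job_dir/memory.current" ]; then
            bytes=$(cat "$job_dir/memory.current")
            total=$((total + bytes))
        fi
    done
    echo "$count $total"
}

# Example invocation with the path from this report:
# leftover_job_memory /sys/fs/cgroup/system.slice/slurmstepd.scope
```

Logging this periodically would show whether the count and the total keep rising between node reboots.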

The only way we found to remove them is by stopping the slurmstepd daemon, but this would cancel all tasks on the node.
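
One possible workaround that avoids stopping slurmstepd is sketched below. It relies on the cgroup v2 rule that a cgroup with no live processes and no child cgroups can be deleted with rmdir. Whether removing these directories behind Slurm's back is safe is an assumption that has not been confirmed in this ticket, and a job that is just starting could briefly look unpopulated, so the sketch defaults to a dry run. The function name remove_unpopulated_jobs and its "apply" argument are hypothetical.

```shell
#!/bin/sh
# Remove (or, by default, only list) unpopulated job_* cgroups under a scope
# directory. Pass "apply" as the second argument to actually call rmdir.
# Hypothetical helper for illustration; not a Slurm tool.
remove_unpopulated_jobs() {
    scope_dir=$1
    mode=${2:-dry-run}
    for job_dir in "$scope_dir"/job_*; do
        [ -r "$job_dir/cgroup.events" ] || continue
        # Never touch a cgroup that still contains a live process.
        grep -q '^populated 0$' "$job_dir/cgroup.events" || continue
        # -depth lists child cgroups before their parents, which is the
        # order rmdir needs.
        find "$job_dir" -depth -type d | while IFS= read -r cg; do
            if [ "$mode" = "apply" ]; then
                rmdir "$cg" || echo "could not remove $cg" >&2
            else
                echo "would rmdir $cg"
            fi
        done
    done
}

# Dry run with the path from this report:
# remove_unpopulated_jobs /sys/fs/cgroup/system.slice/slurmstepd.scope
```

If rmdir fails on an unpopulated cgroup, the error message would itself be useful information to add to this ticket.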

Any hints on how to solve this? For scale, this is the current number of matching systemd-cgtop lines on the node:

[root@sn013 slurmstepd.scope]# systemd-cgtop | grep slurmstep | wc -l
219