Ticket 13243

Summary: slurmctld too many open files
Product: Slurm
Component: slurmctld
Reporter: Matt Ezell <ezellma>
Assignee: Nate Rini <nate>
Status: RESOLVED FIXED
Severity: 3 - Medium Impact
CC: alex, brian.gilmer, nate, vergaravg
Version: 21.08.5   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=12804
Site: ORNL-OLCF
Version Fixed: 21.08.6, 22.05pre1

Description Matt Ezell 2022-01-22 09:53:00 MST
After a cluster restart, slurmctld is logging a LOT of:
[2022-01-22T11:46:24.864] error: slurm_accept_msg_conn: Too many open files
[2022-01-22T11:46:24.864] error: slurm_accept_msg_conn: Too many open files

# grep -c "Too many open files" /var/log/slurmctld.log 
159083467

This log file is just from today.

Similar to the issue in 12804 (for slurmd), it seems that 4096 open files is not enough for our slurmctld.

# egrep -i Units\|files /proc/$(pgrep slurmctld)/limits
Limit                     Soft Limit           Hard Limit           Units     
Max open files            4096                 524288               files 
# systemctl show slurmctld | grep LimitNOFILE
LimitNOFILE=524288
LimitNOFILESoft=524288
Comment 1 Nate Rini 2022-01-24 09:46:56 MST
Please also provide /proc/../status for slurmctld.
Comment 2 Nate Rini 2022-01-25 09:20:16 MST
(In reply to Nate Rini from comment #1)
> Please also provide /proc/../status for slurmctld.

If you prefer copy and paste:
> cat /proc/$(pgrep slurmctld)/status
Comment 3 Matt Ezell 2022-01-25 09:27:08 MST
(In reply to Nate Rini from comment #1)
> Please also provide /proc/../status for slurmctld.

We just restarted the controller (not the compute nodes) and see this message.

[root@slurm1.frontier ~]# cat /proc/$(pgrep slurmctld)/status
Name:   slurmctld
Umask:  0022
State:  S (sleeping)
Tgid:   16354
Ngid:   0
Pid:    16354
PPid:   1
TracerPid:      0
Uid:    6826    6826    6826    6826
Gid:    9526    9526    9526    9526
FDSize: 4096
Groups: 2046 2075 2324 9526 22738 24121 27480 27493 
NStgid: 16354
NSpid:  16354
NSpgid: 16354
NSsid:  16354
VmPeak: 29258768 kB
VmSize: 17084096 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    296176 kB
VmRSS:    231660 kB
RssAnon:          223572 kB
RssFile:               4 kB
RssShmem:           8084 kB
VmData:   292440 kB
VmStk:       132 kB
VmExe:      1044 kB
VmLib:      6564 kB
VmPTE:      1860 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
HugetlbResvPages:              0 kB
CoreDumping:    0
THP_enabled:    1
Threads:        18
SigQ:   1/1024711
SigPnd: 0000000000000000
ShdPnd: 0000000000010000
SigBlk: 0000000000897827
SigIgn: 0000000000001000
SigCgt: 0000000180000200
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000000ffffffffff
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        0
Speculation_Store_Bypass:       thread vulnerable
Cpus_allowed:   ffffffff
Cpus_allowed_list:      0-31
Mems_allowed:   00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        21446
nonvoluntary_ctxt_switches:     36
Comment 4 Nate Rini 2022-01-25 09:42:00 MST
(In reply to Matt Ezell from comment #3)
> (In reply to Nate Rini from comment #1)
> > Please also provide /proc/../status for slurmctld.
> [root@slurm1.frontier ~]# cat /proc/$(pgrep slurmctld)/status

Please also call:
> ls -la /proc/$(pgrep slurmctld)/fd
Comment 5 Matt Ezell 2022-01-25 09:47:14 MST
(In reply to Nate Rini from comment #4)
> Please also call:
> > ls -la /proc/$(pgrep slurmctld)/fd

Sometimes we don't see many FDs:

[root@slurm1.frontier ~]# ls -la /proc/$(pgrep slurmctld)/fd | wc -l
18

And sometimes we do:
[root@slurm1.frontier ~]# ls -la /proc/$(pgrep slurmctld)/fd | wc -l
4099

When the number is high, the FDs are mostly sockets:
[root@slurm1.frontier ~]# ls -la /proc/$(pgrep slurmctld)/fd | tail
lrwx------ 1 slurm slurm 64 Jan 25 11:33 990 -> socket:[17376908]
lrwx------ 1 slurm slurm 64 Jan 25 11:33 991 -> socket:[20265040]
lrwx------ 1 slurm slurm 64 Jan 25 11:33 992 -> socket:[19701217]
lrwx------ 1 slurm slurm 64 Jan 25 11:33 993 -> socket:[19572051]
lrwx------ 1 slurm slurm 64 Jan 25 11:33 994 -> socket:[20267092]
lrwx------ 1 slurm slurm 64 Jan 25 11:33 995 -> socket:[19703838]
lrwx------ 1 slurm slurm 64 Jan 25 11:33 996 -> socket:[19570961]
lrwx------ 1 slurm slurm 64 Jan 25 11:33 997 -> socket:[19698923]
lrwx------ 1 slurm slurm 64 Jan 25 11:33 998 -> socket:[17376909]
lrwx------ 1 slurm slurm 64 Jan 25 11:33 999 -> socket:[19698924]
Comment 6 Nate Rini 2022-01-25 09:54:21 MST
(In reply to Matt Ezell from comment #5)
> When the number is high, the FDs are mostly sockets:
Okay, so nothing unexpected here.

> [root@slurm1.frontier ~]# ls -la /proc/$(pgrep slurmctld)/fd | wc -l
> 4099
Looking for the code that sets the soft limit in slurmctld.
Comment 7 Nate Rini 2022-01-25 10:50:14 MST
Please try this patch:

> diff --git a/src/slurmctld/controller.c b/src/slurmctld/controller.c
> index 5e15935..36aac2b 100644
> --- a/src/slurmctld/controller.c
> +++ b/src/slurmctld/controller.c
> @@ -948,7 +948,7 @@ static void  _init_config(void)
>  {
>       struct rlimit rlim;
>  
> -     rlimits_adjust_nofile();
> +     rlimits_use_max_nofile();
>       if (getrlimit(RLIMIT_CORE, &rlim) == 0) {
>               rlim.rlim_cur = rlim.rlim_max;
>               (void) setrlimit(RLIMIT_CORE, &rlim);
Comment 11 Matt Ezell 2022-01-26 20:21:21 MST
(In reply to Nate Rini from comment #7)
> Please try this patch:

For various reasons I've been unable to try this yet, but I'm pretty confident it would fix the issue.

I think the original reason open files were limited was the slowness of closeall() after a fork when there are many possible fds. Hopefully, with more functionality moving to slurmscriptd, forking in slurmctld is not as common.
Comment 13 Nate Rini 2022-02-04 10:30:10 MST
Matt,

This fix is now upstream:
> https://github.com/SchedMD/slurm/commit/82f417450686b71f84b088c9d8e811237ca3336d

Closing ticket. Please respond if there are any more related issues.

Thanks,
--Nate