Description
David Gloe
2021-11-02 10:32:07 MDT
Please run the following as root:

> cat /proc/sys/fs/file-max
> ulimit -a -H
> ulimit -a -S

Please run the following as the test user:

> ulimit -a -H
> ulimit -a -S

As root on a compute node:

# cat /proc/sys/fs/file-max
9223372036854775807

# ulimit -a -H
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2056733
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 16384
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 2056733
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

# ulimit -a -S
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2056733
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 16384
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 300000
cpu time (seconds, -t) unlimited
max user processes (-u) 2056733
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

As a regular user on a login node:

# ulimit -a -H
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1028069
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 524288
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1028069
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

# ulimit -a -S
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1028069
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 524288
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 300000
cpu time (seconds, -t) unlimited
max user processes (-u) 1028069
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

As a regular user on the login node, srun'ing to a compute node:

# srun -N1 -A STF002 /bin/bash -c 'ulimit -a -H'
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2056733
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 524288
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 2056733
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

# srun -N1 -A STF002 /bin/bash -c 'ulimit -a -S'
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2056733
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 524288
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 300000
cpu time (seconds, -t) unlimited
max user processes (-u) 1028069
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

If I srun a sleep command and go look at slurmstepd on the node:

# cat /proc/62078/limits
Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            unlimited    unlimited    bytes
Max core file size        unlimited    unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             2056733      2056733      processes
Max open files            4096         524288       files
Max locked memory         unlimited    unlimited    bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       2056733      2056733      signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us

I suspect it's slurmstepd trying to accept stdio/stderr connections that is hitting this problem.

Please try this patch to srun:
> diff --git a/src/common/slurm_rlimits_info.c b/src/common/slurm_rlimits_info.c
> index aff433b..e37e295 100644
> --- a/src/common/slurm_rlimits_info.c
> +++ b/src/common/slurm_rlimits_info.c
> @@ -206,7 +206,7 @@ extern void rlimits_adjust_nofile(void)
> if (getrlimit(RLIMIT_NOFILE, &rlim) < 0)
> error("getrlimit(RLIMIT_NOFILE): %m");
>
> - rlim.rlim_cur = MIN(4096, rlim.rlim_max);
> + rlim.rlim_cur = MAX(4096, rlim.rlim_max);
>
> if (setrlimit(RLIMIT_NOFILE, &rlim) < 0)
> error("Unable to adjust maximum number of open files: %m");
I built 21.08.2 with a one-line patch to change 4096 to 16384. We will test with this to see if it resolves the issue.

(In reply to Matt Ezell from comment #6)
> I built 21.08.2 with a one-line patch to change 4096 to 16384. We will test
> with this to see if it resolves the issue.

This has allowed us to run at higher node counts.

(In reply to Matt Ezell from comment #8)
> (In reply to Matt Ezell from comment #6)
> > I built 21.08.2 with a one-line patch to change 4096 to 16384. We will test
> > with this to see if it resolves the issue.
>
> This has allowed us to run at higher node counts.

Okay, this limit was set due to other limits. I will have to look into what the secondary consequences of this are.

Created attachment 22104 [details]
patch for 21.08 (v2)
(In reply to Nate Rini from comment #10)
> Created attachment 22104 [details]
> patch for 21.08 (v2)

Matt et al.,

Please try this patch. It removes the arbitrary limit on RLIMIT_NOFILE by being more efficient for smaller jobs. In the case of large jobs, I doubt it will hurt performance noticeably. This patch has not been QA tested, so please only try it on your test system.

Thanks,
--Nate

What version of glibc is installed on this cluster?

(In reply to Nate Rini from comment #13)
> What version of glibc is installed on this cluster?

glibc-2.31-7.30.x86_64

Created attachment 22116 [details]
patch for 21.08 (v3)
Benchmarking shows that using procfs is actually slower than blindly calling close() on every possible file descriptor. The kernel devs added the new syscall close_range(), but it is only available with glibc 2.34+.

This patch instead removes the limit on srun. Please give it a try.
David, Matt,
This is now fixed upstream for slurm-21.08.5:
> https://github.com/SchedMD/slurm/commit/d2c1a05e15de6019c1e2def91e77a0377cd1a446
Closing out the ticket. Please respond if there are any more related issues.
Thanks,
--Nate