Ticket 13407 - RAM limitation in Slurm jobs
Summary: RAM limitation in Slurm jobs
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Limits
Version: 20.11.8
Hardware: Linux
Severity: 2 - High Impact
Assignee: Carlos Tripiana Montes
Reported: 2022-02-10 13:45 MST by Matt Morgan
Modified: 2022-02-17 02:14 MST
CC: 2 users

See Also:
Site: Miami University Oxford Ohio


Attachments
slurmdlog (118.92 KB, text/plain)
2022-02-14 12:13 MST, Matt Morgan
slurmconf (9.34 KB, text/plain)
2022-02-14 12:13 MST, Matt Morgan

Description Matt Morgan 2022-02-10 13:45:46 MST
In a batch job I'm setting the memory limit and monitoring it on the node with ulimit:

#!/bin/bash
# to be submitted by: sbatch slurm_job.txt
#SBATCH --time=1:00:00
#SBATCH --nodes=1 --ntasks-per-node=24
#SBATCH --job-name=hello
#SBATCH --partition=batch
#SBATCH --mem=60GB

cd $SLURM_SUBMIT_DIR
ulimit -a
module load anaconda-python3
python py_mem.py

The job's output correctly reports the requested limit; however, the program crashes when it tries to allocate more than about 26GB of RAM. This is not specific to Python: other programs such as Matlab crash in a similar fashion at the same threshold when trying to allocate memory. Interactive jobs with salloc show similar behavior.

This is the program:
[muellej@mualhplp01:Slurm_transition2021] $ more py_mem.py
import numpy as np
import contextlib
#requires at least 32 GB
with contextlib.redirect_stdout(None):mya=np.random.rand(65000,65000)


[muellej@mualhplp01:Slurm_transition2021] $ more slurm-1141.out
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 380029
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 62914560
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) 30000000
file locks                      (-x) unlimited
Please execute: source /software/python/anaconda3/etc/profile.d/conda.sh
Traceback (most recent call last):
  File "py_mem.py", line 4, in <module>
    with contextlib.redirect_stdout(None):mya=np.random.rand(65000,65000)
  File "mtrand.pyx", line 1154, in numpy.random.mtrand.RandomState.rand
  File "mtrand.pyx", line 420, in numpy.random.mtrand.RandomState.random_sample
  File "_common.pyx", line 256, in numpy.random._common.double_fill
MemoryError: Unable to allocate 31.5 GiB for an array with shape (65000, 65000) and data type float64
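As a sanity check, the 31.5 GiB figure in the traceback matches the raw size of a 65000x65000 float64 array:

```python
# Size of the float64 array py_mem.py tries to allocate
rows = cols = 65000
bytes_needed = rows * cols * 8          # 8 bytes per float64 element
gib_needed = bytes_needed / 1024**3
print(gib_needed)                       # ≈ 31.47 GiB, i.e. the "31.5 GiB" numpy reports
```

This is well under the 60GB requested with --mem, but above the ~28.6 GiB virtual memory limit shown by ulimit -v.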
Comment 1 Carlos Tripiana Montes 2022-02-11 01:40:03 MST
Hi Matt,

Would you mind attaching your slurm.conf file, plus the slurmctld.log and slurmd.log from the node that ran this job?

We need the full picture of your setup to properly address your issue.

Additionally, take a look at the system logs around the time of the job and see whether the OOM killer was involved.

Thanks,
Carlos.
Comment 2 Matt Morgan 2022-02-14 12:13:06 MST
Created attachment 23472 [details]
slurmdlog
Comment 3 Matt Morgan 2022-02-14 12:13:25 MST
Created attachment 23473 [details]
slurmconf
Comment 4 Matt Morgan 2022-02-14 12:15:52 MST
Hey Carlos,

I'm not seeing anything in the logs concerning OOM, but I have attached the files you requested. If you need more info on the jobs that were run, I can provide that as well. Thank you for your time and understanding.

-Matt
Comment 5 Carlos Tripiana Montes 2022-02-15 01:13:15 MST
Hi Matt,

I think you need to check _why_ the _total_ amount of addressable space (AKA virtual memory) is lower than the data size (AKA maximum memory size):

max memory size         (kbytes, -m) 62914560
virtual memory          (kbytes, -v) 30000000

If you look at virtual memory, 30000000 KiB is around 28.61 GiB. This is the _fuzzy value_ behind the "about 26GB of RAM" you stated in Comment 0.
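For reference, the conversion works out as follows:

```python
# ulimit -v reports KiB; convert the observed limit to GiB
vmem_kib = 30_000_000
vmem_gib = vmem_kib / 1024**2   # KiB -> GiB is two factors of 1024
print(vmem_gib)                 # ≈ 28.61 GiB
```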

As per the slurm.conf provided, you are propagating ALL, which implies AS, DATA, and STACK [1][2][3]. DATA is later restricted by the "--mem=60GB" parameter; this is the memory space for dynamic allocation. AS covers all usable memory addresses, including stack, heap, contexts, _everything_ [4]. Once you subtract all this "overhead" from 28.61 GiB, that's why you can't go higher than ~26 GiB.

Again, because you are using PropagateResourceLimits=ALL without any "Except" [5], I think you are inheriting AS, DATA, and STACK from the login nodes _or similar_, where users have a restricted amount of resources. I'd suggest setting the virtual memory limit to unlimited there if you already have the max memory size limited. That should suffice to prevent users from eating memory, and because Slurm uses DefMemPerCPU, "--mem", --exclusive, etc. to set the amount of memory for a job, I think it should be enough.
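An alternative to raising the login-node limit is to stop propagating the AS limit at all. A minimal slurm.conf sketch (untested against this site's config; see [5]):

```
# slurm.conf: propagate users' resource limits to jobs,
# except the address-space (virtual memory) limit
PropagateResourceLimitsExcept=AS
```

Note that PropagateResourceLimitsExcept replaces the PropagateResourceLimits=ALL line; the two options are alternatives, not meant to be combined.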

Have a look at this and play a bit with the config. I think this is the problem you're facing.

Cheers,
Carlos.

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_AS
[2] https://slurm.schedmd.com/slurm.conf.html#OPT_DATA
[3] https://slurm.schedmd.com/slurm.conf.html#OPT_STACK
[4] https://pubs.opengroup.org/onlinepubs/9699919799/functions/setrlimit.html
[5] https://slurm.schedmd.com/slurm.conf.html#OPT_PropagateResourceLimitsExcept
Comment 6 Carlos Tripiana Montes 2022-02-17 02:14:12 MST
Hi Matt,

Even though this is a Sev-2, I'm going to close the issue as "info given" for now.

I believe I've identified your issue correctly, since I haven't received further urgent communication from your side after my reply.

If this is not the case, please feel free to reopen the bug.

Cheers,
Carlos.