Ticket 16154 - sinfo shows the installed memory limit as 1
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: - Unsupported Older Versions
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Jason Booth
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-02-28 10:59 MST by Shraddha Kiran
Modified: 2023-03-01 11:29 MST

See Also:
Site: AMAT
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
slurm conf (30.63 KB, text/plain)
2023-02-28 12:03 MST, Shraddha Kiran

Description Shraddha Kiran 2023-02-28 10:59:33 MST
Hello Support team,

We are using SLURM 19.05.

We are trying to run an application, but according to the application logs it fails because the scheduler reports incorrect memory information. When we run the command below, the memory value is shown as 1 MB:

e162968@dcalph000:~$ sinfo -o "%n %N %m" -n dcalph001
HOSTNAMES NODELIST MEMORY
dcalph001 dcalph001 1


Is it possible to change this value to the amount of RAM installed in the execution host? If so, please assist.

Thank You
Shraddha
Comment 1 Jason Booth 2023-02-28 11:54:48 MST
Please attach your slurm.conf and the output of "slurmd -C" from that compute node, "dcalph001".
Comment 2 Shraddha Kiran 2023-02-28 12:02:16 MST
Hello Jason,

The application ran on the nodes below:

e162968@dcalph000:~$ ssh dcalph168
Last login: Tue Feb 28 10:56:41 2023 from master.cm.cluster
e162968@dcalph168:~$ slurmd -C
NodeName=dcalph168 slurmd: Considering each NUMA node as a socket
CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335
UpTime=98-02:52:33
e162968@dcalph168:~$
e162968@dcalph168:~$ exit
logout
Connection to dcalph168 closed.
e162968@dcalph000:~$ ssh dcalph187
Last login: Thu Oct  6 07:15:45 2022 from master.cm.cluster
e162968@dcalph187:~$ slurmd -C
NodeName=dcalph187 slurmd: Considering each NUMA node as a socket
CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335
UpTime=264-08:06:05
e162968@dcalph187:~$
e162968@dcalph187:~$ exit
logout
Connection to dcalph187 closed.
e162968@dcalph000:~$ ssh dcalph188
e162968@dcalph188:~$ slurmd -C
NodeName=dcalph188 slurmd: Considering each NUMA node as a socket
CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335
UpTime=127-04:57:11
Comment 3 Shraddha Kiran 2023-02-28 12:03:35 MST
Created attachment 29089 [details]
slurm conf
Comment 4 Jason Booth 2023-02-28 14:36:07 MST
This is expected behavior when no memory is defined in slurm.conf.

> NodeName=dcalph168 CoresPerSocket=18 Feature=6254,384G,nma,rhel7,edr

Running slurmd -C reports the hardware configuration as slurmd detects it:

> NodeName=dcalph168 slurmd: Considering each NUMA node as a socket CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335


If you add a RealMemory entry to the node definition, then sinfo will show the correct amount.
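
For example, a sketch of the NodeName line with RealMemory added; the value is taken from the slurmd -C output above, and the rest of the line mirrors the entry quoted from the attached slurm.conf (confirm both against your actual configuration):

```
NodeName=dcalph168 CoresPerSocket=18 RealMemory=385335 Feature=6254,384G,nma,rhel7,edr
```

After editing slurm.conf, propagate the change (e.g. with "scontrol reconfigure"); depending on your Slurm version, changing a node definition may also require restarting the daemons.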
Comment 5 Shraddha Kiran 2023-03-01 11:29:24 MST
Thanks for the information!

Shraddha