Hello Support team,

We are using SLURM 19.05. An application we are trying to run fails because, according to its logs, the scheduler reports incorrect memory information. When we run the command below, the node's memory is reported as 1 MB:

e162968@dcalph000:~$ sinfo -o "%n %N %m" -n dcalph001
HOSTNAMES NODELIST MEMORY
dcalph001 dcalph001 1

Is it possible to change this value to the amount of RAM installed in the execution host? If so, please assist.

Thank you,
Shraddha
Please attach your slurm.conf and the output of "slurmd -C" from that compute node, "dcalph001".
Hello Jason,

The application ran on the nodes below:

e162968@dcalph000:~$ ssh dcalph168
Last login: Tue Feb 28 10:56:41 2023 from master.cm.cluster
e162968@dcalph168:~$ slurmd -C
NodeName=dcalph168 slurmd: Considering each NUMA node as a socket
CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335
UpTime=98-02:52:33
e162968@dcalph168:~$
e162968@dcalph168:~$ exit
logout
Connection to dcalph168 closed.

e162968@dcalph000:~$ ssh dcalph187
Last login: Thu Oct 6 07:15:45 2022 from master.cm.cluster
e162968@dcalph187:~$ slurmd -C
NodeName=dcalph187 slurmd: Considering each NUMA node as a socket
CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335
UpTime=264-08:06:05
e162968@dcalph187:~$
e162968@dcalph187:~$ exit
logout
Connection to dcalph187 closed.

e162968@dcalph000:~$ ssh dcalph188
e162968@dcalph188:~$ slurmd -C
NodeName=dcalph188 slurmd: Considering each NUMA node as a socket
CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335
UpTime=127-04:57:11
Created attachment 29089: slurm conf
This is expected behavior when no memory is defined for the node in slurm.conf. Your node definition has no RealMemory entry, so RealMemory falls back to its default of 1 (MB), which is the value sinfo shows:

> NodeName=dcalph168 CoresPerSocket=18 Feature=6254,384G,nma,rhel7,edr

"slurmd -C" reports the hardware configuration as slurmd sees it on the node:

> NodeName=dcalph168 slurmd: Considering each NUMA node as a socket
> CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335

If you add a RealMemory entry to the node definition in slurm.conf, sinfo will show the correct amount.
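As an illustration of the change described above, a node definition with RealMemory added might look like the sketch below. The RealMemory value is the figure (in MB) that "slurmd -C" reported on dcalph168, and the rest of the line is copied from the attached slurm.conf. Using the reported value unchanged is an assumption, since some sites configure a slightly lower number to leave headroom.

```conf
# slurm.conf (sketch): node definition with RealMemory added.
# 385335 is the RealMemory value, in MB, reported by "slurmd -C" on dcalph168.
# Using that value as-is is an assumption; adjust to local policy.
NodeName=dcalph168 CoresPerSocket=18 RealMemory=385335 Feature=6254,384G,nma,rhel7,edr
```

Once the updated slurm.conf is in place on the controller and the compute nodes and the daemons have picked it up (typically by restarting slurmctld and slurmd, or with "scontrol reconfigure"; which one is needed is worth checking against the documentation for your version), the same query from the original report, for example sinfo -o "%n %N %m" -n dcalph168, should show the configured memory instead of 1.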
Thanks for the information! Shraddha