| Summary: | sinfo shows the installed memory limit as 1 | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Shraddha Kiran <Shraddha_Kiran> |
| Component: | Configuration | Assignee: | Jason Booth <jbooth> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | - Unsupported Older Versions | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | AMAT | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | slurm conf | ||

Description
Shraddha Kiran
2023-02-28 10:59:33 MST

Please attach your slurm.conf and the output of "slurmd -C" from that compute node, "dcalph001".

Hello Jason,

The application ran on below nodes:

    e162968@dcalph000:~$ ssh dcalph168
    Last login: Tue Feb 28 10:56:41 2023 from master.cm.cluster
    sle162968@dcalph168:~$ slurmd -C
    NodeName=dcalph168 slurmd: Considering each NUMA node as a socket
    CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335
    UpTime=98-02:52:33
    e162968@dcalph168:~$
    e162968@dcalph168:~$ exit
    logout
    Connection to dcalph168 closed.

    e162968@dcalph000:~$ ssh dcalph187
    Last login: Thu Oct 6 07:15:45 2022 from master.cm.cluster
    e162968@dcalph187:~$ slurmd -C
    NodeName=dcalph187 slurmd: Considering each NUMA node as a socket
    CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335
    UpTime=264-08:06:05
    e162968@dcalph187:~$
    e162968@dcalph187:~$ exit
    logout
    Connection to dcalph187 closed.

    e162968@dcalph000:~$ ssh dcalph188
    e162968@dcalph188:~$ slurmd -C
    NodeName=dcalph188 slurmd: Considering each NUMA node as a socket
    CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335
    UpTime=127-04:57:11

Created attachment 29089
slurm conf

So, this is expected behavior when you have no memory defined in the slurm.conf.

> NodeName=dcalph168 CoresPerSocket=18 Feature=6254,384G,nma,rhel7,edr

The slurmd -C will give you hardware config based on what the slurmd sees.

> NodeName=dcalph168 slurmd: Considering each NUMA node as a socket
> CPUs=36 Boards=1 SocketsPerBoard=4 CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335

If you add an entry for the RealMemory then the output will show the correct amount.

Thanks for the information!
Shraddha
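
For reference, the fix described above amounts to adding a RealMemory entry to the node's line in slurm.conf. When RealMemory is not set, Slurm falls back to its default of 1 (MB), which is why sinfo reports 1. The following is a minimal Python sketch of that edit, not an official Slurm tool: the helper names `real_memory_from_slurmd_c` and `with_real_memory` are hypothetical, and the input strings are copied from the `slurmd -C` output and the NodeName line quoted in this ticket (with the interleaved "slurmd: Considering each NUMA node as a socket" message left out).

```python
import re


def real_memory_from_slurmd_c(output: str) -> int:
    """Return the RealMemory value (in MB) found in `slurmd -C` output."""
    match = re.search(r"\bRealMemory=(\d+)", output, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no RealMemory field found in slurmd -C output")
    return int(match.group(1))


def with_real_memory(node_line: str, real_memory_mb: int) -> str:
    """Append RealMemory to a slurm.conf NodeName line unless it is already set."""
    if re.search(r"\bRealMemory=", node_line, flags=re.IGNORECASE):
        return node_line  # leave an existing setting untouched
    return f"{node_line.rstrip()} RealMemory={real_memory_mb}"


# Values taken from this ticket.
slurmd_c_output = (
    "NodeName=dcalph168 CPUs=36 Boards=1 SocketsPerBoard=4 "
    "CoresPerSocket=9 ThreadsPerCore=1 RealMemory=385335\n"
    "UpTime=98-02:52:33\n"
)
node_line = "NodeName=dcalph168 CoresPerSocket=18 Feature=6254,384G,nma,rhel7,edr"

new_line = with_real_memory(node_line, real_memory_from_slurmd_c(slurmd_c_output))
print(new_line)
```

The sketch only produces the candidate line; it does not touch slurm.conf itself. Whether to use the exact value from `slurmd -C` or a slightly lower one is a site decision, and the Slurm daemons still need to re-read the configuration before sinfo reflects the change.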