Hi,

Test case:

    srun -N 1 -n1 -c24 --mem-per-cpu=4000

A job submitted to a partition with the following kind of nodes is not able to run:

    CPUS=80 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=190000

The job stays pending:

    JobState=PENDING Reason=Resources Dependency=(null)
    NumNodes=1-1 NumCPUs=24 NumTasks=1 CPUs/Task=24 ReqB:S:C:T=0:0:*:1
    TRES=cpu=24,mem=96000M,node=1,billing=24
    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
    MinCPUsNode=24 MinMemoryCPU=4000M MinTmpDiskNode=0
    Features=(null) DelayBoot=00:00:00

The amount of memory actually requested is 2 x 96000M (192000M, more than the node's RealMemory=190000), but that is really hard to find out. I set slurmctld to debug3 and it still could not tell the reason why the job was pending:

    backfill: Failed to start JobId=xxx avail: Requested nodes are busy
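The doubling described above can be reproduced with a little arithmetic. The following is a minimal sketch in Python, not Slurm's actual code. It assumes, based on the ThreadsPerCore=2 node definition and the ReqB:S:C:T=0:0:*:1 request shown above, that each requested CPU ends up occupying a whole core and that memory is charged for every hardware thread on that core. The function name and its parameters are hypothetical.

```python
def effective_mem_mb(cpus_requested, mem_per_cpu_mb, threads_per_core):
    """Memory charged if every requested CPU occupies a whole core.

    Assumption (not taken from Slurm source): one task thread per core,
    with --mem-per-cpu multiplied by all hardware threads on those cores.
    """
    cores_needed = cpus_requested  # one usable thread per core
    threads_allocated = cores_needed * threads_per_core
    return threads_allocated * mem_per_cpu_mb


REAL_MEMORY_MB = 190000  # RealMemory from the node definition

nominal = 24 * 4000
effective = effective_mem_mb(24, 4000, threads_per_core=2)

print(nominal)                     # 96000, matches TRES mem=96000M
print(effective)                   # 192000
print(effective > REAL_MEMORY_MB)  # True
```

Under that assumption the job needs 192000M on a node that offers 190000M, which would explain why it can never start even though the TRES line reports only 96000M.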
Created attachment 16754: current config
From the user's point of view the situation is also awkward:

    srun -p fmitest -N 1 -n1 -c1 --mem-per-cpu=200000 --pty $SHELL
    srun: error: Memory specification can not be satisfied
    srun: error: Unable to allocate resources: Requested node configuration is not available

That is understandable, but this is not:

    srun -p fmitest -N 1 -n1 -c1 --mem-per-cpu=100000 --pty $SHELL
    srun: job 4088872 queued and waiting for resources

This job effectively requests 200G as well, and a 200G job cannot run on that partition.
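The two srun results are consistent with a submit-time check that uses the nominal request and a later scheduling check that uses the doubled one. The sketch below models that hypothesis in Python. It is an assumption drawn from the observed behaviour, not from Slurm's implementation, and `classify` and its parameters are hypothetical names.

```python
REAL_MEMORY_MB = 190000   # RealMemory of the nodes in the partition
THREADS_PER_CORE = 2      # ThreadsPerCore of the same nodes


def classify(cpus, mem_per_cpu_mb):
    """Return how a request would behave under the assumed two-stage check."""
    nominal = cpus * mem_per_cpu_mb
    # Assumption: memory is later charged for every hardware thread per core.
    effective = nominal * THREADS_PER_CORE
    if nominal > REAL_MEMORY_MB:
        return "rejected at submit"
    if effective > REAL_MEMORY_MB:
        return "queued, can never start"
    return "runnable"


print(classify(1, 200000))  # rejected at submit
print(classify(1, 100000))  # queued, can never start
print(classify(24, 4000))   # queued, can never start
```

If the model holds, any request whose nominal memory lies between half of RealMemory and RealMemory falls into the confusing middle case: accepted at submit time but never schedulable.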
Hi Tommi,

This is a duplicate of bug 9724. The title isn't exactly the same, but the fix there also resolves the issue with --mem-per-cpu and hyperthreads. Let me know if you have any more questions. For now, I'm closing this as a duplicate of 9724.

*** This ticket has been marked as a duplicate of ticket 9724 ***