Ticket 3847

Summary: Nodes can easily get overallocated by exclusive jobs
Product: Slurm Reporter: Thomas Opfer <hrz>
Component: Scheduling    Assignee: Jacob Jenson <jacob>
Status: RESOLVED DUPLICATE QA Contact:
Severity: 6 - No support contract    
Priority: --- CC: hrz
Version: 17.02.3   
Hardware: Linux   
OS: Linux   
Site: -Other-

Description Thomas Opfer 2017-05-30 12:43:10 MDT
When I run a job in non-exclusive mode, everything is fine:

to86cola@hla0002:~$ /opt/slurm/current/bin/srun -n 1 --mem-per-cpu=25000 -t 30 -C mpi --pty bash
srun: job 3439844 queued and waiting for resources
srun: job 3439844 has been allocated resources
to86cola@hpa0001:~$ scontrol show node hpa0001|grep TRES
   CfgTRES=cpu=16,mem=28000M
   AllocTRES=cpu=1,mem=25000M
to86cola@hpa0001:~$


When I instead run the same job in exclusive mode, it gets overallocated:

to86cola@hla0002:~$ /opt/slurm/current/bin/srun -n 1 --exclusive --mem-per-cpu=25000 -t 30 -C mpi --pty bash
srun: job 3439868 queued and waiting for resources
srun: job 3439868 has been allocated resources
to86cola@hpa0001:~$ scontrol show node hpa0001|grep TRES
   CfgTRES=cpu=16,mem=28000M
   AllocTRES=cpu=16,mem=400000M
to86cola@hpa0001:~$
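The overallocated figure follows directly from Slurm charging the per-CPU memory request against every CPU on the exclusively allocated node. A sketch of the arithmetic, using the numbers from the output above:

```python
# With --exclusive, the per-CPU memory request appears to be multiplied
# by the node's full CPU count, not by the CPUs the job actually asked for.
mem_per_cpu = 25000      # MB, from --mem-per-cpu=25000
requested_cpus = 1       # from -n 1
node_cpus = 16           # CfgTRES: cpu=16
node_mem = 28000         # MB, CfgTRES: mem=28000M

alloc_mem = mem_per_cpu * node_cpus
print(alloc_mem)             # 400000, matching AllocTRES mem=400000M
print(alloc_mem > node_mem)  # True: far more memory than the node has
```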


In my opinion, the memory to allocate per CPU should be calculated by something like (requested_mem_per_cpu_on_this_node * requested_cpus_on_this_node) / allocated_cpus_on_this_node, so that the job's total memory request stays constant even when an exclusive allocation grants additional CPUs.
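A minimal sketch of that proposed calculation (a hypothetical helper for illustration, not Slurm code):

```python
def exclusive_mem_per_cpu(requested_mem_per_cpu, requested_cpus, allocated_cpus):
    """Scale the per-CPU memory down so the job's total memory request is
    preserved when an exclusive allocation grants extra CPUs."""
    return (requested_mem_per_cpu * requested_cpus) // allocated_cpus

# Example: 1 CPU requested at 25000 MB/CPU, 16 CPUs granted exclusively.
per_cpu = exclusive_mem_per_cpu(25000, 1, 16)
print(per_cpu)       # 1562 MB per CPU
print(per_cpu * 16)  # 24992 MB total, which fits within CfgTRES mem=28000M
```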


Please fix this, as it produces many error messages in slurmctld.log, e.g.:

[2017-05-30T20:38:21.839] error: cons_res: node hpa0196 memory is overallocated (32000) for job 3439883
[2017-05-30T20:38:21.841] error: cons_res: node hpa0197 memory is overallocated (32000) for job 3439884
[2017-05-30T20:38:52.642] error: cons_res: node hpa0201 memory is overallocated (32000) for job 3439885
[2017-05-30T20:38:52.643] error: cons_res: node hpa0205 memory is overallocated (32000) for job 3439886
[2017-05-30T20:38:52.645] error: cons_res: node hpa0306 memory is overallocated (32000) for job 3439887
[2017-05-30T20:38:52.646] error: cons_res: node hpa0312 memory is overallocated (32000) for job 3439888

And also in slurmdbd.log, e.g.:

[2017-05-30T20:24:21.086] error: We have more allocated time than is possible (272381215680 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T06:00:00 - 2017-05-21T07:00:00 tres 2
[2017-05-30T20:24:21.086] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T06:00:00 - 2017-05-21T07:00:00 tres 2
[2017-05-30T20:24:31.373] error: We have more allocated time than is possible (270273934400 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T07:00:00 - 2017-05-21T08:00:00 tres 2
[2017-05-30T20:24:31.373] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T07:00:00 - 2017-05-21T08:00:00 tres 2
[2017-05-30T20:24:41.025] error: We have more allocated time than is possible (270435154400 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T08:00:00 - 2017-05-21T09:00:00 tres 2
[2017-05-30T20:24:41.025] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T08:00:00 - 2017-05-21T09:00:00 tres 2
[2017-05-30T20:24:50.262] error: We have more allocated time than is possible (269253026080 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T09:00:00 - 2017-05-21T10:00:00 tres 2
[2017-05-30T20:24:50.262] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T09:00:00 - 2017-05-21T10:00:00 tres 2
[2017-05-30T20:24:59.087] error: We have more allocated time than is possible (260691142920 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T10:00:00 - 2017-05-21T11:00:00 tres 2
[2017-05-30T20:24:59.087] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T10:00:00 - 2017-05-21T11:00:00 tres 2
[2017-05-30T20:25:09.605] error: We have more allocated time than is possible (255667014200 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T11:00:00 - 2017-05-21T12:00:00 tres 2
[2017-05-30T20:25:09.605] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T11:00:00 - 2017-05-21T12:00:00 tres 2
Comment 1 Thomas Opfer 2017-07-20 02:30:56 MDT
This will be resolved when bug 3879 is resolved.

*** This ticket has been marked as a duplicate of ticket 3879 ***