Ticket 3847 - Nodes can easily get overallocated by exclusive jobs
Summary: Nodes can easily get overallocated by exclusive jobs
Status: RESOLVED DUPLICATE of ticket 3879
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 17.02.3
Hardware: Linux
Severity: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2017-05-30 12:43 MDT by Thomas Opfer
Modified: 2017-07-20 02:30 MDT

See Also:
Site: -Other-


Description Thomas Opfer 2017-05-30 12:43:10 MDT
When I run a job in non-exclusive mode, everything is fine:

to86cola@hla0002:~$ /opt/slurm/current/bin/srun -n 1 --mem-per-cpu=25000 -t 30 -C mpi --pty bash
srun: job 3439844 queued and waiting for resources
srun: job 3439844 has been allocated resources
to86cola@hpa0001:~$ scontrol show node hpa0001|grep TRES
   CfgTRES=cpu=16,mem=28000M
   AllocTRES=cpu=1,mem=25000M
to86cola@hpa0001:~$


When I instead run the same job in exclusive mode, the node gets overallocated:

to86cola@hla0002:~$ /opt/slurm/current/bin/srun -n 1 --exclusive --mem-per-cpu=25000 -t 30 -C mpi --pty bash
srun: job 3439868 queued and waiting for resources
srun: job 3439868 has been allocated resources
to86cola@hpa0001:~$ scontrol show node hpa0001|grep TRES
   CfgTRES=cpu=16,mem=28000M
   AllocTRES=cpu=16,mem=400000M
to86cola@hpa0001:~$
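The AllocTRES figure above is consistent with Slurm charging every CPU on the node at the full --mem-per-cpu rate. A quick check of the arithmetic (the variable names are illustrative, not Slurm internals):

```python
# With --exclusive, the job is charged all 16 CPUs on the node,
# and memory appears to be accounted as cpus * --mem-per-cpu.
cpus_on_node = 16        # CfgTRES cpu count from scontrol
mem_per_cpu = 25000      # MB, from --mem-per-cpu=25000
requested_cpus = 1       # from -n 1

alloc_mem = cpus_on_node * mem_per_cpu
print(alloc_mem)                     # 400000 MB, matching AllocTRES mem=400000M
print(requested_cpus * mem_per_cpu)  # 25000 MB actually requested
```

Note that 400000M far exceeds the node's configured 28000M, which is why slurmctld later logs overallocation errors.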


In my opinion, the memory to allocate per CPU should instead be calculated as something like (requested_mem_per_cpu_on_this_node * requested_cpus_on_this_node) / allocated_cpus_on_this_node, so the job's total memory charge stays equal to what was actually requested.
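The proposed correction could be sketched as follows (a hypothetical helper for illustration, not Slurm code; integer division mirrors how Slurm tracks memory in whole MB):

```python
def scaled_mem_per_cpu(requested_mem_per_cpu, requested_cpus, allocated_cpus):
    """Proposed scaling: keep the job's total memory charge equal to the
    original request even when --exclusive inflates the CPU allocation."""
    return (requested_mem_per_cpu * requested_cpus) // allocated_cpus

# Example from this ticket: -n 1 --mem-per-cpu=25000 on a 16-CPU node
per_cpu = scaled_mem_per_cpu(25000, 1, 16)
print(per_cpu)        # 1562 MB per CPU
print(per_cpu * 16)   # 24992 MB total, approximately the 25000 MB requested
```

The total charge then stays within the node's 28000M instead of ballooning to 400000M.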


Please fix this, as it floods slurmctld.log with messages such as:

[2017-05-30T20:38:21.839] error: cons_res: node hpa0196 memory is overallocated (32000) for job 3439883
[2017-05-30T20:38:21.841] error: cons_res: node hpa0197 memory is overallocated (32000) for job 3439884
[2017-05-30T20:38:52.642] error: cons_res: node hpa0201 memory is overallocated (32000) for job 3439885
[2017-05-30T20:38:52.643] error: cons_res: node hpa0205 memory is overallocated (32000) for job 3439886
[2017-05-30T20:38:52.645] error: cons_res: node hpa0306 memory is overallocated (32000) for job 3439887
[2017-05-30T20:38:52.646] error: cons_res: node hpa0312 memory is overallocated (32000) for job 3439888

And similar messages appear in slurmdbd.log, e.g.:

[2017-05-30T20:24:21.086] error: We have more allocated time than is possible (272381215680 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T06:00:00 - 2017-05-21T07:00:00 tres 2
[2017-05-30T20:24:21.086] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T06:00:00 - 2017-05-21T07:00:00 tres 2
[2017-05-30T20:24:31.373] error: We have more allocated time than is possible (270273934400 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T07:00:00 - 2017-05-21T08:00:00 tres 2
[2017-05-30T20:24:31.373] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T07:00:00 - 2017-05-21T08:00:00 tres 2
[2017-05-30T20:24:41.025] error: We have more allocated time than is possible (270435154400 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T08:00:00 - 2017-05-21T09:00:00 tres 2
[2017-05-30T20:24:41.025] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T08:00:00 - 2017-05-21T09:00:00 tres 2
[2017-05-30T20:24:50.262] error: We have more allocated time than is possible (269253026080 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T09:00:00 - 2017-05-21T10:00:00 tres 2
[2017-05-30T20:24:50.262] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T09:00:00 - 2017-05-21T10:00:00 tres 2
[2017-05-30T20:24:59.087] error: We have more allocated time than is possible (260691142920 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T10:00:00 - 2017-05-21T11:00:00 tres 2
[2017-05-30T20:24:59.087] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T10:00:00 - 2017-05-21T11:00:00 tres 2
[2017-05-30T20:25:09.605] error: We have more allocated time than is possible (255667014200 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T11:00:00 - 2017-05-21T12:00:00 tres 2
[2017-05-30T20:25:09.605] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T11:00:00 - 2017-05-21T12:00:00 tres 2
Comment 1 Thomas Opfer 2017-07-20 02:30:56 MDT
This will be resolved when bug 3879 is resolved.

*** This ticket has been marked as a duplicate of ticket 3879 ***