Ticket 7753

Summary: Reservation core count changed after controller restart
Product: Slurm
Reporter: CSC sysadmins <csc-slurm-tickets>
Component: Scheduling
Assignee: Ben Roberts <ben>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
Priority: ---
CC: alex
Version: 19.05.2
Hardware: Linux
OS: Linux
Site: CSC - IT Center for Science
Version Fixed: 19.05.6 20.02.pre1
Target Release: ---

Description CSC sysadmins 2019-09-16 01:48:11 MDT
Hi,

A colleague made a couple of standard reservations:

[2019-09-09T14:32:23.124] sched: Created reservation=training1 users=root nodes=r14c[07,09,21,43,47] cores=200 licenses=(null) tres=cpu=200 watts=4294967294 start=2019-09-10T08:00:00 end=2019-09-10T18:00:00

[2019-09-09T14:32:30.092] sched: Created reservation=training2 users=root nodes=r16c[33,36,44-46] cores=200 licenses=(null) tres=cpu=200 watts=4294967294 start=2019-09-11T08:00:00 end=2019-09-11T18:00:00

A couple of hours later we restarted slurmctld, and suddenly the reservation core count changed from 200 to 26400 (the partition maximum)!

[2019-09-09T16:18:09.861] Recovered state of reservation dlintro
[2019-09-09T16:18:09.861] Recovered state of reservation gputest
[2019-09-09T16:18:09.861] Recovered state of reservation prolog_test
[2019-09-09T16:18:09.861] Recovered state of reservation training1
[2019-09-09T16:18:09.861] Recovered state of reservation training2
[2019-09-09T16:18:09.863] sched: Updated reservation=training1 users=root nodes=r[01-04,13-18]c[01-48],r[05-06]c[01-64],r07c[05-56] cores=26400 licenses=(null) tres=cpu=26400 watts=4294967294 start=2019-09-10T08:00:00 end=2019-09-10T18:00:00
[2019-09-09T16:18:09.863] sched: Updated reservation=training2 users=root nodes=r[01-04,13-18]c[01-48],r[05-06]c[01-64],r07c[05-56] cores=26400 licenses=(null) tres=cpu=26400 watts=4294967294 start=2019-09-11T08:00:00 end=2019-09-11T18:00:00

Config files can be found on bug: https://bugs.schedmd.com/show_bug.cgi?id=7685
Comment 1 Ben Roberts 2019-09-17 10:46:38 MDT
Hi Tommi,

I've been trying to reproduce the issue you're describing this morning, but I haven't found a way to make it happen. Is this something you're able to reproduce? If so, can you send me the 'scontrol create reservation' command you're using to create the reservation? If you can't reproduce it, can you see if your colleague can find the command in their shell history?

Thanks,
Ben
Comment 2 CSC sysadmins 2019-09-18 01:03:46 MDT
Grepped from the history:

scontrol create reservationname=training1 nodecnt=5 users=root,userx starttime=2019-09-10T08:00:00 duration=10:00:00 Flags=PART_NODES PartitionName=large
scontrol create reservationname=training2 nodecnt=5 users=root,userx starttime=2019-09-11T08:00:00 duration=10:00:00 Flags=PART_NODES PartitionName=large

I retested with similar flags and was able to reproduce this bug. My colleague removed the partition from the reservation, and after that the reservation no longer changes on scontrol reconfig:

scontrol update ReservationName=training1 PartitionName=
Comment 3 Ben Roberts 2019-09-18 09:23:04 MDT
Hi Tommi,

Thanks for sending those commands; I was able to reproduce the issue as well. It looks like the problem behavior is caused by the PART_NODES flag. This flag isn't necessary for the type of reservation being created: you can keep the partition specification as well as the node count and the reservation will still get the right nodes. Leaving off the PART_NODES flag shouldn't have an effect, since the requirements to use the flag aren't being met anyway. Here's the description of the flag from the documentation:
-----------------
This flag can be used to reserve all nodes within the specified partition. PartitionName and Nodes=ALL must be specified or this option is ignored. 
-----------------
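For reference, a reservation that actually meets those requirements would be created along these lines (an untested sketch; the reservation name here is hypothetical):

```shell
# PART_NODES is only honored when both PartitionName and Nodes=ALL are given;
# such a reservation then tracks the partition's node list as it changes.
scontrol create reservation ReservationName=large_maint \
    PartitionName=large Nodes=ALL Flags=PART_NODES \
    Users=root StartTime=2019-09-10T08:00:00 Duration=10:00:00
```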

The flag is being ignored at reservation creation time, but something goes wrong in that logic when Slurm is restarted. To work around this bug you can leave off the flag, and I'll keep looking into what's happening on restart.
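Concretely, the workaround is to create the reservation exactly as in comment 2 but without the flag (an untested sketch based on those commands):

```shell
# Same reservation as in comment 2, minus Flags=PART_NODES.
# NodeCnt plus PartitionName is enough for Slurm to pick 5 nodes from 'large'.
scontrol create reservation ReservationName=training1 NodeCnt=5 \
    Users=root,userx StartTime=2019-09-10T08:00:00 Duration=10:00:00 \
    PartitionName=large

# For an already-existing reservation, clearing the partition (as noted in
# comment 2) also stops the core count from changing on restart:
scontrol update ReservationName=training1 PartitionName=
```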

Please let me know if you have any questions.

Thanks,
Ben
Comment 5 CSC sysadmins 2019-09-19 00:42:40 MDT
Hi,

Thanks for the info, I think we can cope without the PART_NODES flag :)
Comment 31 Ben Roberts 2020-01-15 12:50:17 MST
Hi Tommi,

I know you were able to work around this by removing the PART_NODES flag from your reservation creation command. For your reference, I wanted to let you know that fixes have been checked in to address this behavior. The change in 19.05.6 removes the PART_NODES flag from a reservation after creation if the reservation wasn't created with all of the partition's nodes. You can see the details of the commit here:
https://github.com/SchedMD/slurm/commit/4bbce568958a0b20f46fbccb069bf9140f7f514e

In 20.02 we will ensure that users specify all nodes when using the PART_NODES flag:
https://github.com/SchedMD/slurm/commit/77ae6880d94852961820a68db8010299da95f523

I'll close this ticket now.  Let me know if you have questions about this.

Thanks,
Ben