Ticket 2078

Summary: reservation against non-available nodes leaves res in strange state - using Nodes= + CoreCnt=
Product: Slurm Reporter: Deric Sullivan <deric.sullivan>
Component: slurmctld Assignee: Jacob Jenson <jacob>
Status: RESOLVED FIXED QA Contact:
Severity: 6 - No support contract    
Priority: --- CC: alex, brian, da, tim
Version: 16.05.x   
Hardware: Linux   
OS: Linux   
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: 15.08.3 Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Deric Sullivan 2015-10-28 09:28:58 MDT
Hello,
There seems to be a bug with reservations that combine a node list with a core count (e.g. Nodes=something + CoreCnt=something). The resulting reservation is arguably broken: listing it with scontrol show reservation shows "Nodes=" (blank) and "CoreCnt=0".

It's easy to reproduce by running the following against a node in the DOWN state (also tested with POWER_UP):
scontrol create ReservationName=tmp_res StartTime=now EndTime=now+600 Nodes=<some_non_idle_node> CoreCnt=1 Users=<some_valid_user>
scontrol show reservation


This could arguably be considered working as designed, but assuming it's a bug, there are several ways to fix it. One would be to validate the node list and reject the request if any of the nodes are not "available". Another, which I tested, is to assume the user knows what they want when they specify a node list, and let the reservation go through even if the nodes are not available. In case it's of use, I've provided a patch below.


$ diff -Naur ./src/slurmctld/reservation.c ./src/slurmctld/reservation.c.new
--- ./src/slurmctld/reservation.c       2015-10-20 13:50:30.728109177 +0000
+++ ./src/slurmctld/reservation.c.new   2015-10-28 20:02:09.881756000 +0000
@@ -3851,7 +3851,8 @@
                FREE_NULL_BITMAP(feature_bitmap);
        }
 
-       if ((resv_desc_ptr->flags & RESERVE_FLAG_MAINT) == 0) {
+       if (((resv_desc_ptr->flags & RESERVE_FLAG_MAINT) == 0) &&
+            ((resv_desc_ptr->flags & RESERVE_FLAG_SPEC_NODES) == 0)) {
                /* Nodes must be available */
                bit_and(node_bitmap, avail_node_bitmap);
        }


Thanks,
Deric
Comment 1 Moe Jette 2015-10-29 11:17:49 MDT
Perfect. Thanks for the analysis and patch. The commit is here:
https://github.com/SchedMD/slurm/commit/6aed461bde86c8cabc3417a53502f3b17d8a86c5