| Summary: | reservation against non-available nodes leaves res in strange state - using Nodes= + CoreCnt= | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Deric Sullivan <deric.sullivan> |
| Component: | slurmctld | Assignee: | Jacob Jenson <jacob> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 6 - No support contract | ||
| Priority: | --- | CC: | alex, brian, da, tim |
| Version: | 16.05.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Alineos Sites: | --- |
| Version Fixed: | 15.08.3 | Target Release: | --- |
Perfect. Thanks for the analysis and patch. The commit is here: https://github.com/SchedMD/slurm/commit/6aed461bde86c8cabc3417a53502f3b17d8a86c5
Hello,

There seems to be a bug with reservations using a node list (e.g. Nodes=something + CoreCnt=something). The result is a reservation made that's arguably broken; listing the reservation (scontrol show reservation) will show "Nodes=" (blank) and "CoreCnt=0".

It's very easy to reproduce, just by doing the following against a node in a DOWN (also tested with POWER_UP) state:

    scontrol create ReservationName=tmp_res StartTime=now EndTime=now+600 Nodes=<some_non_idle_node> CoreCnt=1 Users=<some_valid_user>
    scontrol show reservation

Arguably this could be considered to work as designed, but assuming it's a bug, I'm sure there are a number of ways to fix this issue. One would be to do a number of tests and disallow the user from using a node list with nodes that are not "available". Another way, which I tested, is to assume the user knows what they want if they specify a node list and let the reservation go through even if nodes are not available.

If it's of any use I've provided a diff patch below.

    $ diff -Naur ./src/slurmctld/reservation.c ./src/slurmctld/reservation.c.new
    --- ./src/slurmctld/reservation.c	2015-10-20 13:50:30.728109177 +0000
    +++ ./src/slurmctld/reservation.c.new	2015-10-28 20:02:09.881756000 +0000
    @@ -3851,7 +3851,8 @@
     		FREE_NULL_BITMAP(feature_bitmap);
     	}
     
    -	if ((resv_desc_ptr->flags & RESERVE_FLAG_MAINT) == 0) {
    +	if (((resv_desc_ptr->flags & RESERVE_FLAG_MAINT) == 0) &&
    +	    ((resv_desc_ptr->flags & RESERVE_FLAG_SPEC_NODES) == 0)) {
     		/* Nodes must be available */
     		bit_and(node_bitmap, avail_node_bitmap);
     	}

Thanks,
Deric
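For readers who want to see the effect of the patched condition in isolation, here is a minimal standalone sketch. It is not Slurm code: `demo_bitmap_t`, `demo_select_nodes`, and the `DEMO_FLAG_*` values are hypothetical stand-ins for `bitstr_t`, the logic around `bit_and()`, and the real `RESERVE_FLAG_*` constants. It assumes one bit per node and only models the single `if` statement touched by the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only; NOT Slurm's actual values. */
#define DEMO_FLAG_MAINT      (UINT32_C(1) << 0)
#define DEMO_FLAG_SPEC_NODES (UINT32_C(1) << 1)

/* One bit per node; a simplified stand-in for Slurm's bitstr_t. */
typedef uint64_t demo_bitmap_t;

/*
 * Mirrors the patched condition: the requested node set is masked with the
 * available node set only when the reservation is neither a maintenance
 * reservation nor one with an explicitly specified node list.
 */
demo_bitmap_t demo_select_nodes(demo_bitmap_t requested,
                                demo_bitmap_t avail,
                                uint32_t flags)
{
    if (((flags & DEMO_FLAG_MAINT) == 0) &&
        ((flags & DEMO_FLAG_SPEC_NODES) == 0)) {
        /* Nodes must be available */
        requested &= avail;
    }
    return requested;
}

/* Small self-check covering the case described in this report. */
void demo_self_check(void)
{
    const demo_bitmap_t down_node = 0x4; /* node 2 requested, but DOWN */
    const demo_bitmap_t avail     = 0x3; /* only nodes 0 and 1 are available */

    /* Without the extra flag test, the requested node is masked away. */
    assert(demo_select_nodes(down_node, avail, 0) == 0);

    /* With an explicit node list, the requested node is kept. */
    assert(demo_select_nodes(down_node, avail, DEMO_FLAG_SPEC_NODES) == down_node);
}
```

In this model, requesting only a DOWN node with no flags set yields an empty set, which is consistent with the blank "Nodes=" and "CoreCnt=0" seen in the reservation listing. With the specified-nodes flag set, the requested node survives the check.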