Ticket 9980

Summary: Cannot submit jobs to reservation if there is maint reservation
Product: Slurm Reporter: CSC sysadmins <csc-slurm-tickets>
Component: reservations Assignee: Dominik Bartkiewicz <bart>
Status: RESOLVED FIXED QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: alex
Version: 20.02.5   
Hardware: Linux   
OS: Linux   
Site: CSC - IT Center for Science Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: 20.02.6 Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description CSC sysadmins 2020-10-13 05:47:17 MDT
Hi,

I was updating our compute node image and had scheduled a short maintenance window for the node reboot. However, users cannot submit jobs to the normal reservation, and the error message is quite hard to understand (at least from the user's point of view). I think the job should be queued and wait if it cannot finish before the maintenance reservation starts.


# scontrol create reservation=test starttime=now duration=14-00:00 users=tervotom nodes=c1170
# scontrol create reservation=test_maint StartTime=2020-10-13T15:00:00 duration=1:00:00 users=tervotom nodes=c1170 flags=maint


$ sbatch -A project_2001659 --reservation=test --nodes=1 -p medium -t 1:00:00 gpcnet_opmi_load.sh 
sbatch: error: Batch job submission failed: Requested node configuration is not available
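
As a rough illustration of the behaviour requested above (this is not Slurm's actual scheduling logic), the following Python sketch computes when a job could start if it must not overlap the maintenance window. The function name `earliest_start` and the submission time are hypothetical, and the sketch assumes the maintenance reservation is the only constraint on the node.

```python
from datetime import datetime, timedelta


def earliest_start(now, time_limit, maint_start, maint_end):
    """Earliest start time such that the job does not overlap the maintenance window.

    Illustration only: assumes a single maintenance window and an otherwise idle node.
    """
    # The job fits entirely before the maintenance window: start immediately.
    if now + time_limit <= maint_start:
        return now
    # Otherwise the job has to wait in the queue until the window is over.
    return max(now, maint_end)


# Values taken from the reservation and sbatch commands above.
maint_start = datetime(2020, 10, 13, 15, 0, 0)
maint_end = maint_start + timedelta(hours=1)
time_limit = timedelta(hours=1)

# Hypothetical submission time shortly before the maintenance window.
submit_time = datetime(2020, 10, 13, 14, 30, 0)

print(earliest_start(submit_time, time_limit, maint_start, maint_end))
```

With these values the job cannot finish before 15:00, so the expected outcome is a pending job that becomes eligible once the maintenance reservation ends, rather than a rejected submission.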

BR,
Tommi

Comment 5 Dominik Bartkiewicz 2020-10-22 09:47:28 MDT
Hi

We've fixed this in commit
https://github.com/SchedMD/slurm/commit/67116e73 which will be in 20.02.6.

I'm closing this as resolved/fixed. Let us know if you have any more issues.

Dominik