Hi,

Just upgraded from 14.08.11 to 16.05, and our "long" QOS doesn't seem to work anymore.

We have a MaxTime of 2 days on our "normal" partition:

# scontrol show partition normal | grep Time
   DefaultTime=02:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED

and a 7-day MaxWall on our "long" QOS, with the PartitionTimeLimit flag set so the QOS can override the partition limit:

# sacctmgr show qos long format=name,flags%30,maxwall
      Name                          Flags     MaxWall
---------- ------------------------------ -----------
      long DenyOnLimit,PartitionTimeLimit  7-00:00:00

It worked perfectly fine in 14.11: users could submit jobs with a time limit greater than 2 days using the "long" QOS. Now, this fails:

$ srun --qos long -p normal --time=2-0:0:1 --pty bash
srun: error: Unable to allocate resources: Requested time limit is invalid (missing or exceeds some limit)

It works fine within the partition limits, though:

$ srun --qos long -p normal --time=2-0:0:0 --pty bash
srun: job 8533628 queued and waiting for resources

We didn't change the rest of our configuration, and have:

# scontrol show config | grep -i PartLimits
EnforcePartLimits       = ANY

so this definitely looks like a behavior change between 14.11 and 15.06. Is this expected?

Thanks!
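To illustrate the intended semantics, here is a toy model of the time-limit check (this is not Slurm's actual code; the function names and minute-based units are my own): with the PartitionTimeLimit flag on a QOS, the QOS MaxWall should act as the effective wall-clock cap instead of the partition's MaxTime.

```python
# Toy model (NOT Slurm source): with the PartitionTimeLimit QOS flag set,
# the QOS MaxWall overrides the partition MaxTime; otherwise the
# partition limit applies. All times are in minutes.

def effective_limit_minutes(part_max_time, qos_max_wall, partition_time_limit_flag):
    """Return the wall-clock cap, in minutes, for a job request."""
    if partition_time_limit_flag and qos_max_wall is not None:
        return qos_max_wall
    return part_max_time

def request_ok(requested, part_max_time, qos_max_wall, flag):
    """True if the requested time limit should be accepted."""
    return requested <= effective_limit_minutes(part_max_time, qos_max_wall, flag)

DAY = 24 * 60
# With the flag set (as on the "long" QOS above), a 5-day request against a
# 2-day partition MaxTime / 7-day QOS MaxWall should be accepted:
print(request_ok(5 * DAY, 2 * DAY, 7 * DAY, True))   # True
# Without the flag, the partition's 2-day MaxTime wins:
print(request_ok(5 * DAY, 2 * DAY, 7 * DAY, False))  # False
```

The reported regression is, in these terms, 16.05 behaving as if the flag were ignored during the submit-time check.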
Quick correction about version numbers, sorry for the confusion: I meant we upgraded from 15.08.11 to 16.05, and it worked fine in 15.08.11.
I can reproduce this easily; I'm looking into a fix now. As a possible temporary workaround, EnforcePartLimits=no looks like it'll do the right thing, except that you may end up with invalid jobs queued in the meantime.
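For anyone else hitting this, a minimal sketch of the workaround as a slurm.conf fragment (EnforcePartLimits is the standard slurm.conf parameter; the comment wording is mine):

```
# slurm.conf -- temporary workaround until the fix lands.
# Relaxes submit-time partition-limit enforcement, so the QOS override is
# honored again; invalid jobs may sit in the queue instead of being
# rejected at submit time.
EnforcePartLimits=NO
```

After editing, `scontrol reconfigure` (or restarting slurmctld) applies the change; the setting can be switched back to ANY once a fixed version is installed.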
(In reply to Tim Wickberg from comment #2)

> I can reproduce this easily, I'm looking into a fix now.

Great, thanks Tim!

> As a possible temporary workaround, EnforcePartLimits=no looks like it'll do
> the right thing, except that you may end up with invalid jobs queued in the
> meantime.

Thanks for the suggestion.
Fixed in commit 377b448a34f7b. The patch is available here if you want to apply it ahead of the 16.05.1 release:

https://github.com/SchedMD/slurm/commit/377b448a34f7bbb.patch
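For anyone applying the fix by hand before 16.05.1, one possible sequence (a sketch only: the `patch -p1` strip level and the rebuild/restart steps are assumptions about a typical source install, not instructions from this thread):

```
# From the top of the Slurm 16.05 source tree:
wget https://github.com/SchedMD/slurm/commit/377b448a34f7bbb.patch
patch -p1 < 377b448a34f7bbb.patch

# Rebuild, reinstall, and restart the controller:
make && make install
systemctl restart slurmctld
```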
Hi Tim,

Awesome! Applied the patch, and the issue looks resolved now. Thank you!