I believe that this is the same problem described in bug 8224, but that bug doesn't appear to be getting a lot of attention and I confirmed that the issue exists in 20.02.0 as well, so I thought I'd bring it up again.

Basically, if a user updates the number of nodes that a job is requesting, that change appears to take effect, but the change is not reflected in the output of squeue for any jobs except the highest priority job. Bug 8224 also suggested that squeue might just take longer to get an updated value from the select plugin, but if that's the case, it's taking more than 1 hour.

Here's an example:

[day36@haze2:~]$ srun -N3 -t60 sleep 1h &
[1] 66888
[day36@haze2:~]$ srun: job 157 queued and waiting for resources
[day36@haze2:~]$ srun: job 157 has been allocated resources
[day36@haze2:~]$ srun -N3 sleep 10m &
[2] 66904
[day36@haze2:~]$ srun: job 158 queued and waiting for resources
srun -N3 sleep 10m &
[3] 66907
[day36@haze2:~]$ srun: job 159 queued and waiting for resources
[day36@haze2:~]$ squeue
JOBID PARTITION NAME  USER  ST TIME  NODES NODELIST(REASON)
158   pdebug    sleep day36 PD 0:00  3     (Resources)
159   pdebug    sleep day36 PD 0:00  3     (Priority)
157   pdebug    sleep day36 R  0:10  3     haze[6-8]
[day36@haze2:~]$ scontrol update jobid=158 numnodes=1-1
[day36@haze2:~]$ scontrol update jobid=159 numnodes=1-1
[day36@haze2:~]$ squeue
JOBID PARTITION NAME  USER  ST TIME  NODES NODELIST(REASON)
159   pdebug    sleep day36 PD 0:00  3     (Priority)
158   pdebug    sleep day36 PD 0:00  1     (Resources)
157   pdebug    sleep day36 R  1:21  3     haze[6-8]
[day36@haze2:~]$ for I in `seq 1 12`; do echo $I; squeue; sleep 5m; done
1
JOBID PARTITION NAME  USER  ST TIME  NODES NODELIST(REASON)
158   pdebug    sleep day36 PD 0:00  1     (Resources)
159   pdebug    sleep day36 PD 0:00  3     (Priority)
157   pdebug    sleep day36 R  3:04  3     haze[6-8]
2
JOBID PARTITION NAME  USER  ST TIME  NODES NODELIST(REASON)
158   pdebug    sleep day36 PD 0:00  1     (Resources)
159   pdebug    sleep day36 PD 0:00  3     (Priority)
157   pdebug    sleep day36 R  8:04  3     haze[6-8]
...
11
JOBID PARTITION NAME  USER  ST TIME  NODES NODELIST(REASON)
158   pdebug    sleep day36 PD 0:00  1     (Resources)
159   pdebug    sleep day36 PD 0:00  3     (Priority)
157   pdebug    sleep day36 R  53:04 3     haze[6-8]
12
JOBID PARTITION NAME  USER  ST TIME  NODES NODELIST(REASON)
158   pdebug    sleep day36 PD 0:00  1     (Resources)
159   pdebug    sleep day36 PD 0:00  3     (Priority)
157   pdebug    sleep day36 R  58:04 3     haze[6-8]
srun: job 159 has been allocated resources
srun: job 158 has been allocated resources
[1]   Done                    srun -N3 -t60 sleep 1h
[day36@haze2:~]$
[day36@haze2:~]$ squeue
JOBID PARTITION NAME  USER  ST TIME  NODES NODELIST(REASON)
158   pdebug    sleep day36 R  4:06  1     haze6
159   pdebug    sleep day36 R  4:06  1     haze7
[day36@haze2:~]$

This is with SelectType=select/cons_res. Haven't checked out cons_tres yet.
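For anyone who wants to check for the same symptom without reading the table by eye, here is a rough sketch and not part of the original session. It assumes the default squeue column layout shown above (NODES as the seventh whitespace-separated field) and a job name without spaces. The function names `squeue_nodes` and `check_nodes` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Slurm.

# Print the NODES column for one job ID from default-format squeue
# output read on stdin. Assumes NODES is the 7th field, which only
# holds when the job name contains no spaces.
squeue_nodes() {
    awk -v id="$1" '$1 == id { print $7 }'
}

# Compare what squeue currently shows for a job against the node count
# the user expects after an "scontrol update ... numnodes=" call.
# Usage: check_nodes JOBID EXPECTED
check_nodes() {
    jobid=$1
    expected=$2
    shown=$(squeue | squeue_nodes "$jobid")
    if [ "$shown" = "$expected" ]; then
        echo "job $jobid: squeue shows $shown node(s), as expected"
    else
        echo "job $jobid: squeue shows '$shown', expected $expected"
        return 1
    fi
}
```

With the jobs from the session above, `check_nodes 159 1` run right after the two scontrol updates would report a mismatch, while `check_nodes 158 1` would pass.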
> I believe that this is the same problem described in bug 8224, but that bug doesn't appear to be getting a lot of attention and I confirmed that the issue exists in 20.02.0 as well...

Hi Ryan - We apologize for the confusion in bug #8224. We have a review process that we send all our potential fixes through so as to avoid introducing new issues into the codebase. There is a patch pending for this and I will leave it up to Dominik if he wants to make that available for you to test with.

I hope you understand why there seems to be a delay in that ticket while we review the patch.
Ah. That makes sense. Thank you for the update, Jason. Since 19.05.6 was just released as the last 19.05 release aside from security fixes, should I also assume that the patch will only be available for 20.02?

Thanks,
Ryan

(In reply to Jason Booth from comment #1)
> > I believe that this is the same problem described in bug 8224, but that bug doesn't appear to be getting a lot of attention and I confirmed that the issue exists in 20.02.0 as well...
>
> Hi Ryan - We apologize for the confusion in bug #8224. We have a review
> process that we send all our potential fixes through so as to avoid
> introducing new issues into the codebase. There is a patch pending for this
> and I will leave it up to Dominik if he wants to make that available for you
> to test with.
>
> I hope you understand why there seems to be a delay in that ticket while we
> review the patch.
> ... should I also assume that the patch will only be available for 20.02?

That is correct. Bug fixes are being targeted for 20.02 now that it has been released. From now on, 19.05 will receive only security fixes and fixes for crashes.
Hi,

The fix has been committed to the repo and will be available in the 20.02.2 release.
https://github.com/SchedMD/slurm/commit/623574431d545b2ff0

This patch goes only to the 20.02 branch, but you can apply it to 19.05 safely.

I'm going to go ahead and close this bug as a duplicate of 8224. If you have any questions, feel free to reopen.

Dominik

*** This ticket has been marked as a duplicate of ticket 8224 ***