Good morning, We would like to report some new, and what we think is unusual, behavior in our Slurm cluster. We have been running Slurm for some time now, and we recently rolled over from MaxJobId=67043328 to the default FirstJobId=1. Now, whenever we restart Slurm for a configuration change or maintenance, it starts out reusing seemingly random older job IDs for newly submitted jobs. For example, since we exceeded MaxJobId and rolled over to 1, Slurm has already incremented job IDs up to 9xx,xxx. However, if we restart Slurm, it starts assigning some new job IDs in the 6x,xxx,xxx range, some here and some there, "filling in holes" seemingly at random. It does not pick up where it left off and increment from there. This has happened twice now and is new behavior that we would like to suppress if possible. We would like job numbers to remain continuous and incremental, and to be reused only once we reach MaxJobId. Have you heard of this strange job numbering behavior before? Or is there something in our configuration that is causing this? Do you know of any way we can keep this from happening? Thanks for any help you can provide!
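To make the expected numbering concrete, here is a minimal Python sketch of the behavior we are asking for. It is not Slurm's actual code: the function name `next_job_id`, the `active_ids` set, and the treatment of MaxJobId as an inclusive upper bound are assumptions made purely for illustration.

```python
FIRST_JOB_ID = 1         # FirstJobId default, as quoted in this report
MAX_JOB_ID = 67043328    # MaxJobId, as quoted in this report


def next_job_id(last_id, active_ids, first=FIRST_JOB_ID, max_id=MAX_JOB_ID):
    """Return the next free job ID after last_id.

    Hypothetical model of the expected behavior: IDs increase by one,
    wrap from max_id back to first, and skip IDs still held by active jobs.
    max_id is treated as inclusive here, which is an assumption.
    """
    span = max_id - first + 1
    candidate = last_id
    for _ in range(span):
        # Step forward by one, wrapping to first once max_id is passed.
        candidate = candidate + 1 if candidate < max_id else first
        if candidate not in active_ids:
            return candidate
    raise RuntimeError("no free job IDs in the configured range")
```

Under this model, a restart should only need to remember the last assigned ID: `next_job_id(900000, set())` continues at 900001, and only `next_job_id(MAX_JOB_ID, set())` wraps back to 1.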
Hi, I suspect that this is a duplicate of bug 4538. In bug 4538 comment 10, Kolbeinn described a workaround. This should be fixed in 17.11.1 and above. Dominik
Thanks, Dominik. Bug 4538 sure sounds like what we stumbled upon. We'll likely apply the patch or just upgrade to v17.11.5, as the workaround would be too much trouble. Thanks again for your help!