Dear Slurm developers,

Unfortunately, we are still facing the same behaviour as reported in #3399: jobs go to the RUNNING state (instead of waiting in the CONFIGURING state) before the nodes are booted and ready to run them. The patch you kindly provided worked nicely in my testing environment, but it does not bring the expected result on the production cluster.

After a quick look at the source code, I figured out that the difference between the two environments comes from the PrologSlurmctld parameter. As soon as a PrologSlurmctld is set in the Slurm configuration, the bug comes back. I can reproduce it reliably by setting PrologSlurmctld=/bin/true in my testing environment; as soon as it is set, jobs no longer wait in the CONFIGURING state.

My guess is that the bug is somewhere around prolog_running_decr(), which clears the JOB_CONFIGURING bit too early, without running the same logic as job_config_fini() to extend the end_time. Can you please check what is going on here?

Thank you in advance,
Rémi
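For reference, a minimal slurm.conf fragment that reproduces the problem in my power-saving test setup. Only the reproduction-relevant lines are shown; the script paths and timeout values are placeholders from my test environment, not the production configuration:

```
# With PrologSlurmctld unset, jobs wait in CONFIGURING until the node
# is booted; as soon as it is set (even to a no-op), they go straight
# to RUNNING before the node is ready.
PrologSlurmctld=/bin/true

# Power-saving setup: idle nodes are suspended and resumed on demand.
# Paths and timings below are placeholders.
SuspendProgram=/usr/local/sbin/node_suspend.sh
ResumeProgram=/usr/local/sbin/node_resume.sh
SuspendTime=300
ResumeTimeout=600
```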
Rémi - sorry for the late response. I can reproduce this by setting PrologSlurmctld, thanks for the suggestion. We'll take a look at this and get back to you.
Created attachment 4041 [details]
Proposed fix

Alex,
If you could do some testing of this patch for me, that would be much appreciated.

I have tried this using quite a few different configurations:
- Suspending and resuming nodes
- Jobs with the --reboot option
- With and without PrologSlurmctld
- BlueGene/Q (emulated)
- Various timings

I have not yet run on a KNL (rebooting the node into various NUMA and MCDRAM modes) but hope to do that next week.
(In reply to Moe Jette from comment #12)
> If you could do some testing of this patch for me, that would be much
> appreciated.
> [...]

Moe,
I've been testing different configurations too:
- Suspend/Resume with and without PrologSlurmctld
- Suspend/Resume with and without --reboot
- Combinations of the two above
- Different timings
- Changing the timings while powering up, followed by 'scontrol reconfigure': with a ResumeProgram doing a 'sleep 30' before starting slurmd, and ResumeTimeout initially set below 30, then readjusted to above 30 followed by 'scontrol reconfigure', the job ends up transitioning from CF to R properly and the node state changes correctly too.

They all work as expected for me; your patch looks good so far.

I've not tested:
- BlueGene/Q (emulated)
- KNL
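The ResumeTimeout test above can be sketched as follows. This is an illustrative ResumeProgram only; the script path, the way slurmd is started, and the use of ssh are assumptions about a typical power-saving setup, not the actual test harness:

```shell
#!/bin/bash
# Hypothetical ResumeProgram for the timing test: delay the "boot" by
# 30 seconds, then start slurmd on each node Slurm asked to resume.
# "$1" holds the hostlist expression passed by slurmctld.
sleep 30
for host in $(scontrol show hostnames "$1"); do
    ssh "$host" systemctl start slurmd   # placeholder start method
done
```

With ResumeTimeout initially below 30 the resume times out; after raising it above 30 and running 'scontrol reconfigure', the job transitions from CF to R as expected.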
Three of us have tested this patch on quite a few different systems with various configurations and found no problems.

https://github.com/SchedMD/slurm/commit/f6d42fdbb293ca89da609779db8d8c04a86a8d13.patch

This change will be in version 16.05.10 when released (no date set).
Hi Moe,

Thank you for this patch! Is there any chance you could backport it to Slurm 15.08?

Best,
Rémi
Created attachment 4049 [details]
Patch for version 15.08.13

I do not expect that we will have any more releases of Slurm version 15.08. Also note that the reboot logic in version 16.05 is very different from 15.08 (mostly due to changes required to support Intel KNL and rebooting to change NUMA or MCDRAM configuration). The attached patch has been tested with version 15.08.13.
Also note that support for Slurm version 15.08 will end in May 2017, so upgrading soon is strongly recommended.
Hi Moe,

Thank you for this new version of the patch, I'm going to give it a try soon. Indeed, we will upgrade this cluster next summer; thank you for the reminder.

Best,
Rémi