Hi,

Splitting this off #10827:

-- 8< --------------------------------------
I also noticed an odd behavior when trying to change the logging level with "scontrol setdebug": it makes the daemon stop logging to systemd. I typically use "journalctl -u slurmctld -f" to follow the controller logs, and when the logging level is changed with scontrol, logging stops and nothing is logged anymore through journalctl until the daemon is restarted. The logging level used doesn't seem to make any difference, and it doesn't happen when enabling or disabling DebugFlags. It also looks like a relatively recent behavior, as I don't recall seeing it in previous versions.
-- 8< --------------------------------------

I'm not sure how best to illustrate this, but here's the output slurmctld sent to syslog (recorded with journalctl):

-- 8< --------------------------------------
Feb 22 10:23:18 sh03-sl01.int slurmctld[11472]: _job_complete: JobId=18852775_139(18853484) done
Feb 22 10:23:18 sh03-sl01.int slurmctld[11472]: _job_complete: JobId=18852775_118(18853463) WEXITSTATUS 0
Feb 22 10:23:18 sh03-sl01.int slurmctld[11472]: _job_complete: JobId=18852775_118(18853463) done
Feb 22 10:23:18 sh03-sl01.int slurmctld[11472]: _job_complete: JobId=18852775_122(18853467) WEXITSTATUS 0
Feb 22 10:23:18 sh03-sl01.int slurmctld[11472]: _job_complete: JobId=18852775_122(18853467) done
Feb 22 10:23:18 sh03-sl01.int slurmctld[11472]: prolog_running_decr: Configuration for JobId=18848792_23(18853995) is complete
Feb 22 10:23:18 sh03-sl01.int slurmctld[11472]: _slurm_rpc_submit_batch_job: JobId=18854018 InitPrio=117471 usec=18755
Feb 22 10:23:18 sh03-sl01.int slurmctld[11472]: _job_complete: JobId=18852775_108(18853453) WEXITSTATUS 0
Feb 22 10:23:18 sh03-sl01.int slurmctld[11472]: _job_complete: JobId=18852775_108(18853453) done
Feb 22 10:27:13 sh03-sl01.int slurmctld[11472]: topology/tree: _validate_switches: TOPOLOGY: warning -- no switch can reach all nodes through its descendants. If this is not intentional, fix the topology.conf file.
Feb 22 10:27:13 sh03-sl01.int slurmctld[11472]: restoring original state of nodes
Feb 22 10:27:13 sh03-sl01.int slurmctld[11472]: select/cons_tres: part_data_create_array: select/cons_tres: preparing for 141 partitions
Feb 22 10:27:18 sh03-sl01.int slurmctld[11472]: SchedulerParameters=default_queue_depth=5000,max_rpc_cnt=128,max_sched_time=5,partition_job_depth=100,sched_max_job_start=0,sched_min_interval=5000000
-- 8< --------------------------------------

At 10:24, I ran "scontrol setdebug verbose", and the change is not recorded in the log. At 10:25, I ran "scontrol setdebug info" (which is the default log level configured in slurm.conf), and that change is not recorded either. Nothing is logged at all between the first "scontrol setdebug" and the "scontrol reconfigure" I ran at 10:27, which restored logging. Job activity remained normal during those 3 minutes, but nothing was recorded.

I verified that syslog was still working by using `logger`, which successfully recorded messages in /var/log/messages.

Please let me know if I can provide any additional information.

Thanks!
-- Kilian
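To make the sequence easier to follow, here's a minimal sketch of the reproduction steps described above. This assumes slurmctld logs to the journal through a standard systemd unit named "slurmctld" and that "info" is the configured default log level, as in our setup:

-- 8< --------------------------------------
# Terminal 1: follow the controller log through journald
journalctl -u slurmctld -f

# Terminal 2: change the log level; at this point terminal 1 goes silent
scontrol setdebug verbose

# Going back to the configured level ("info" here) does not restore logging
scontrol setdebug info

# Syslog itself still works: this message does show up in /var/log/messages
logger "syslog test"

# Only a reconfigure (or restarting slurmctld) brings logging back
scontrol reconfigure
-- 8< --------------------------------------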
Hi Kilian,

I was able to reproduce the issue, have tracked down where the issue is, and am now working on a patch to address it.

Thanks,
Skyler
Hi Skyler,

(In reply to Skyler Malinowski from comment #1)
> I was able to reproduce the issue, have tracked down where the issue is,
> and am now working on a patch to address it.

Excellent to hear, thanks so much for the update!

Cheers,
-- Kilian
Hi Kilian,

Just wanted to give you an update. The patch has been created but is currently waiting for review. Internally, things have been busy around the release of 20.11.5. Now that it has shipped, I will try again to get this fix merged in for the next release window.

Thanks for being patient. Stay awesome! :)

Regards,
Skyler
Hi Skyler,

On Mon, Apr 5, 2021 at 12:19 PM <bugs@schedmd.com> wrote:
> Just wanted to give you an update. The patch has been created but is
> currently waiting for review. Internally, things have been busy around the
> release of 20.11.5. Now that it has shipped, I will try again to get this
> fix merged in for the next release window.

That's great, thanks a lot for the update!

Cheers,
-- Kilian
Hi Kilian,

Quick update: still awaiting patch review. Thank you for being patient.

-- Skyler
On Mon, May 17, 2021 at 8:11 AM <bugs@schedmd.com> wrote:
> Quick update: still awaiting patch review.

Thanks for the update!

Cheers,
-- Kilian
Monthly update: still awaiting patch review. There is a backlog of patch reviews including this one.

Thanks for being patient. As always, stay awesome! :)
On Mon, Jun 14, 2021 at 1:48 PM <bugs@schedmd.com> wrote:
> Monthly update: still awaiting patch review. There is a backlog of patch
> reviews including this one.
>
> Thanks for being patient. As always, stay awesome! :)

Thanks Skyler! Hope it gets to the review board soon. :)

Cheers,
-- Kilian
Hi Skyler,

I'm wondering if you have any update on the status of that patch? If it hasn't been reviewed yet, would it still be possible to post it here?

Thanks!
-- Kilian
Hi Kilian,

Thanks for having patience yet again. Good news is the patch has been reviewed and merged. Thanks for poking the beast (so to speak), it got things moving.

Below is the merge commit:
https://github.com/SchedMD/slurm/commit/26302293ac

Although this merge was for 21.08, the changes should also work with 20.11. Feel free to try it out or at least look forward to it in 21.08.

Best,
Skyler

P.S. Stay awesome! :)
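If you want to try it on 20.11 before 21.08 ships, one option is to cherry-pick the commit onto your 20.11 sources and rebuild. This is only a sketch, not a tested recipe: it assumes the 20.11 maintenance branch is named slurm-20.11 and that the change applies cleanly there.

-- 8< --------------------------------------
# Grab the Slurm sources and switch to the 20.11 maintenance branch
git clone https://github.com/SchedMD/slurm.git
cd slurm
git checkout slurm-20.11

# Apply the fix referenced above (add "-m 1" if git reports a merge commit)
git cherry-pick 26302293ac

# Then rebuild and reinstall slurmctld the way you normally build Slurm
# (configure/make or your site's RPM/deb packaging)
-- 8< --------------------------------------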
Hi Skyler,

> Thanks for having patience yet again. Good news is the patch has been
> reviewed and merged. Thanks for poking the beast (so to speak), it got
> things moving.
>
> Below is the merge commit:
> https://github.com/SchedMD/slurm/commit/26302293ac
>
> Although this merge was for 21.08, the changes should also work with 20.11.
> Feel free to try it out or at least look forward to it in 21.08.

Excellent, thanks a lot for the update! I'll give it a try on 20.11 as well and let you know if I have trouble.

Cheers,
-- Kilian