There are two orphaned dw_wlm_cli processes lingering on the server. Is any debug info desired before killing them off? Is there any way to determine why they still exist?

root@ctl1==> ps -elf | grep slurm
5 S root 10614 1 9 80 0 - 414137 hrtime Nov16 ? 03:09:55 /opt/slurm/default/sbin/slurmctld
0 S root 13237 1 0 80 0 - 25701 poll_s Nov09 ? 00:00:21 /usr/bin/python /opt/cray/dw_wlm/default/bin/dw_wlm_cli --function data_in --token 19182 --job /global/syscom/cori/sc/nsg/var/cori-slurm-state/hash.2/job.19182/script
0 S root 24730 1 0 80 0 - 25702 poll_s Nov09 ? 00:00:20 /usr/bin/python /opt/cray/dw_wlm/default/bin/dw_wlm_cli --function data_in --token 20669 --job /global/syscom/cori/sc/nsg/var/cori-slurm-state/hash.9/job.20669/script
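In the listing above, both dw_wlm_cli processes have PPID 1 (they were reparented to init after their parent exited), and their STIME (Nov09) predates the running slurmctld (Nov16), which suggests they outlived an earlier slurmctld instance. The following is a minimal sketch for spotting such processes and recovering the job token from their command line. It assumes the standard procps `ps -elf` column layout (F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD) with a single-token STIME; the function names `parse_ps_elf` and `orphaned_dw_wlm_cli` are hypothetical helpers, not part of Slurm or DataWarp.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple


class PsEntry(NamedTuple):
    pid: int
    ppid: int
    stime: str
    cmd: str


def parse_ps_elf(lines: Iterable[str]) -> List[PsEntry]:
    """Parse `ps -elf` output lines into PsEntry records (hypothetical helper)."""
    entries = []
    for line in lines:
        # Assumes 14 fixed columns followed by CMD, which may itself contain spaces.
        fields = line.split(None, 14)
        if len(fields) < 15 or not fields[3].isdigit():
            continue  # skip the header line or anything malformed
        entries.append(PsEntry(pid=int(fields[3]), ppid=int(fields[4]),
                               stime=fields[11], cmd=fields[14]))
    return entries


TOKEN_RE = re.compile(r"--token\s+(\S+)")


def orphaned_dw_wlm_cli(entries: Iterable[PsEntry]) -> List[Tuple[int, Optional[str]]]:
    """Return (pid, token) for dw_wlm_cli processes whose parent is init (PPID 1)."""
    result = []
    for e in entries:
        if "dw_wlm_cli" in e.cmd and e.ppid == 1:
            m = TOKEN_RE.search(e.cmd)
            result.append((e.pid, m.group(1) if m else None))
    return result


# Sample lines copied from the listing above (job paths shortened for readability).
SAMPLE = [
    "5 S root 10614 1 9 80 0 - 414137 hrtime Nov16 ? 03:09:55 /opt/slurm/default/sbin/slurmctld",
    "0 S root 13237 1 0 80 0 - 25701 poll_s Nov09 ? 00:00:21 /usr/bin/python "
    "/opt/cray/dw_wlm/default/bin/dw_wlm_cli --function data_in --token 19182 --job .../job.19182/script",
    "0 S root 24730 1 0 80 0 - 25702 poll_s Nov09 ? 00:00:20 /usr/bin/python "
    "/opt/cray/dw_wlm/default/bin/dw_wlm_cli --function data_in --token 20669 --job .../job.20669/script",
]

for pid, token in orphaned_dw_wlm_cli(parse_ps_elf(SAMPLE)):
    print(f"orphaned dw_wlm_cli pid={pid} token={token}")
```

The extracted tokens map directly to the job IDs whose slurmctld log entries are worth collecting before the processes are killed.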
Please attach the slurmctld log file entries related to jobs 19182 and 20669, plus any slurmctld daemon restart information. Thanks!
This commit should fix the problem:
https://github.com/SchedMD/slurm/commit/7b270dc6d28284c67bd955fe8d2fa7375ae8050b
Until it is applied, vestigial dw_wlm_cli processes may be left behind when the slurmctld daemon is shut down.