Ticket 2166

Summary: Orphaned dw_wlm_cli processes on slurmctld server
Product: Slurm
Reporter: David Paul <dpaul>
Component: Burst Buffers
Assignee: Moe Jette <jette>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
Priority: ---
CC: dmjacobsen, dpaul, tim
Version: 15.08.3   
Hardware: Linux   
OS: Linux   
Site: NERSC
Machine Name: Cori
Version Fixed: 15.08.5

Description David Paul 2015-11-17 10:49:44 MST
There are two orphaned dw_wlm_cli processes lingering on the slurmctld server. Is any debug info desired before I kill them off? Is there any way to determine why they still exist?

root@ctl1==> ps -elf | grep slurm
5 S root     10614     1  9  80   0 - 414137 hrtime Nov16 ?       03:09:55 /opt/slurm/default/sbin/slurmctld

0 S root     13237     1  0  80   0 - 25701 poll_s Nov09 ?        00:00:21 /usr/bin/python /opt/cray/dw_wlm/default/bin/dw_wlm_cli --function data_in --token 19182 --job /global/syscom/cori/sc/nsg/var/cori-slurm-state/hash.2/job.19182/script

0 S root     24730     1  0  80   0 - 25702 poll_s Nov09 ?        00:00:20 /usr/bin/python /opt/cray/dw_wlm/default/bin/dw_wlm_cli --function data_in --token 20669 --job /global/syscom/cori/sc/nsg/var/cori-slurm-state/hash.9/job.20669/script
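
For anyone who wants to check for the same condition, here is a minimal sketch that scans `ps -elf` output for dw_wlm_cli processes whose parent PID is 1, meaning they have been reparented to init as in the listing above. The function name `find_orphaned_helpers` and the 15-column layout assumption (F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD, as printed by Linux procps) are illustrative choices, not anything shipped with Slurm or DataWarp.

```python
from typing import List, Optional, Tuple

# Assumed `ps -elf` layout on Linux (procps):
# F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
PS_COLUMNS = 15


def find_orphaned_helpers(
    ps_output: str, name: str = "dw_wlm_cli"
) -> List[Tuple[int, Optional[str]]]:
    """Return (pid, token) for each process matching `name` whose PPID is 1."""
    orphans = []
    for line in ps_output.splitlines():
        # Split into at most 15 fields so the full command line stays intact.
        fields = line.split(None, PS_COLUMNS - 1)
        if len(fields) < PS_COLUMNS:
            continue  # blank or truncated line
        if not (fields[3].isdigit() and fields[4].isdigit()):
            continue  # header line or unexpected format
        pid, ppid, cmd = int(fields[3]), int(fields[4]), fields[14]
        # Daemons such as slurmctld also have PPID 1, so filter by name too.
        if ppid != 1 or name not in cmd:
            continue
        args = cmd.split()
        # The value after --token matches the job ID in both processes above.
        token = args[args.index("--token") + 1] if "--token" in args[:-1] else None
        orphans.append((pid, token))
    return orphans
```

Fed the three process lines from this description, the function returns `[(13237, '19182'), (24730, '20669')]`. The slurmctld line is skipped because its command does not contain the helper name.
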
Comment 1 Moe Jette 2015-11-17 10:51:50 MST
Please attach the slurmctld log file entries related to jobs 19182 and 20669, plus any slurmctld daemon restart information. Thanks!
Comment 2 Moe Jette 2015-11-18 04:35:05 MST
This commit should fix the problem.
https://github.com/SchedMD/slurm/commit/7b270dc6d28284c67bd955fe8d2fa7375ae8050b

Until it is applied, vestigial dw_wlm_cli processes may be left behind when the slurmctld daemon is shut down.
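
As a general illustration of the failure mode, and not a reproduction of the commit above (slurmctld is written in C and the actual change is not shown here), a parent that spawns helper processes can avoid leaving them behind by remembering each child and terminating any that are still running when it shuts down. The class name `HelperTracker` and the `grace_seconds` parameter below are hypothetical.

```python
import subprocess
from typing import List, Sequence


class HelperTracker:
    """Hypothetical sketch: remember spawned helpers so shutdown can reap them."""

    def __init__(self) -> None:
        self._children: List[subprocess.Popen] = []

    def spawn(self, argv: Sequence[str]) -> subprocess.Popen:
        """Start a helper process and record it for later cleanup."""
        child = subprocess.Popen(argv)
        self._children.append(child)
        return child

    def shutdown(self, grace_seconds: float = 5.0) -> None:
        """Terminate every helper that is still running, then wait for it."""
        for child in self._children:
            if child.poll() is None:
                child.terminate()  # polite request (SIGTERM on POSIX)
        for child in self._children:
            try:
                child.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                child.kill()  # helper ignored SIGTERM; force it (SIGKILL)
                child.wait()
        self._children.clear()
```

Without a step like `shutdown()`, any helper still running when the parent exits is reparented to PID 1, which matches the `ps` listing in the description.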