Ticket 2143

Summary: [832707] - Slurm BB code does not retry DWS teardown function
Product: Slurm Reporter: Andrew Barry <abarry>
Component: Burst Buffers Assignee: Moe Jette <jette>
Status: RESOLVED FIXED QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: dpaul, tim
Version: 15.08.3   
Hardware: Linux   
OS: Linux   
Site: CRAY Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: Other DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed: 15.08.5
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---

Description Andrew Barry 2015-11-16 06:47:02 MST
Slurm should retry the 'dw_wlm_cli --function teardown ...' call until it succeeds. It may sleep between retries, but if it gives up there is no guarantee that the DWS state gets cleaned up.
Comment 1 Moe Jette 2015-11-16 08:08:45 MST
Retry added, with a 5-second sleep between retries:
https://github.com/SchedMD/slurm/commit/789f5c7edb09ca6a315d2f926abfd4be0fd41e5e

The change will be in version 15.08.5 when released, likely in mid-December.
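
For readers who want the shape of the behavior discussed here, the following is a minimal sketch of a retry-until-success loop with a fixed sleep between attempts. It is not the code from the commit above (Slurm's burst buffer plugin is written in C). The function name run_until_success, its parameters, and the optional max_attempts cap are all hypothetical, introduced only for illustration. The 5-second default delay is taken from Comment 1.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_until_success(
    cmd: Sequence[str],
    delay_seconds: float = 5.0,          # 5 s matches the delay stated in Comment 1
    max_attempts: Optional[int] = None,  # hypothetical cap; None means retry forever
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd repeatedly until it exits with status 0.

    Returns the number of attempts that were made. Raises RuntimeError
    only if max_attempts is set and every attempt failed.
    """
    attempts = 0
    while True:
        attempts += 1
        if run(cmd) == 0:
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise RuntimeError(
                f"command still failing after {attempts} attempts: {list(cmd)}"
            )
        # Wait before the next attempt so a struggling service is not hammered.
        sleep(delay_seconds)
```

With max_attempts left at None the loop never gives up, which is what the description asks for. The command to retry would be the dw_wlm_cli teardown invocation, whose remaining arguments are elided in this ticket and so are not reproduced here. The run and sleep parameters exist only so the loop can be exercised without spawning a process or actually waiting.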