Ticket 2143 - [832707] - Slurm BB code does not retry DWS teardown function
Summary: [832707] - Slurm BB code does not retry DWS teardown function
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Burst Buffers
Version: 15.08.3
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Moe Jette
Reported: 2015-11-16 06:47 MST by Andrew Barry
Modified: 2015-12-02 05:09 MST
Site: CRAY
Cray Sites: Other
Version Fixed: 15.08.5
Description Andrew Barry 2015-11-16 06:47:02 MST
Slurm should retry the 'dw_wlm_cli --function teardown ...' call until it succeeds. It may sleep between retries, but if it gives up there is no guarantee that the DWS state gets cleaned up.
Comment 1 Moe Jette 2015-11-16 08:08:45 MST
Retry added with a 5-second sleep between retries:
https://github.com/SchedMD/slurm/commit/789f5c7edb09ca6a315d2f926abfd4be0fd41e5e

The change will be in version 15.08.5 when released, likely mid-December.
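
For readers who want to see the shape of the behavior described above, here is a minimal Python sketch of a retry-until-success loop with a fixed sleep between attempts. It is not the code from the linked commit (Slurm itself is written in C). The function name `run_until_success` and its `run` and `sleep` parameters are hypothetical, introduced only for illustration. The 5-second default mirrors the sleep mentioned in Comment 1.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_until_success(
    cmd: Sequence[str],
    delay_s: float = 5.0,
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd repeatedly until it exits with status 0.

    Returns the number of attempts made. There is deliberately no
    attempt limit: giving up would leave the external state uncleaned.
    `run` and `sleep` are injectable (hypothetical hooks) so the loop
    can be exercised without a real command or real delays.
    """
    attempts = 0
    while True:
        attempts += 1
        if run(cmd) == 0:
            return attempts
        # Non-zero exit status: wait, then try again.
        sleep(delay_s)
```

A caller would pass the teardown command as an argument list, for example `["dw_wlm_cli", "--function", "teardown"]` followed by whatever further arguments the real call needs; the ticket elides those, so they are not shown here. An unbounded loop matches the request in the description, at the cost of never returning if the command cannot succeed.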