Ticket 12435 - A possible deadlock bug in the function PMI_KVS_Commit
Summary: A possible deadlock bug in the function PMI_KVS_Commit
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmdbd
Version: 21.08.0
Hardware: All All
: C - Contributions
Assignee: Danny Auble
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2021-09-06 02:53 MDT by Ryan
Modified: 2021-11-03 20:45 MDT

See Also:
Site: -Other-
Version Fixed: 21.08.4 22.05.0pre1


Attachments
The patch (1.12 KB, patch)
2021-09-14 05:08 MDT, Ryan

Description Ryan 2021-09-06 02:53:27 MDT
Hi developers, thank you for checking this. The lock kvs_mutex may not be released on the path where !kvs_set.kvs_comm_ptr[kvs_set.kvs_comm_recs]==1 (line 1290). The relevant code is linked below. This could lead to a deadlock if PMI_KVS_Commit is called again, or if another function then tries to acquire the same lock.


https://github.com/SchedMD/slurm/blob/4801c60da4784346c4bf830a0b198364012e44ee/contribs/pmi/pmi.c#L1265-L1292
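
For readers without the linked source at hand, the general shape of the reported problem can be sketched as follows. This is a minimal illustration, not the Slurm code and not the attached patch: demo_mutex, commit_leaky, commit_fixed, mutex_is_held and the DEMO_* return codes are all hypothetical names, and the step_fails flag stands in for whatever condition triggers the early return.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the Slurm definitions. */
#define DEMO_SUCCESS 0
#define DEMO_FAIL    (-1)

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Buggy shape: the error path returns while demo_mutex is still held. */
int commit_leaky(bool step_fails)
{
	pthread_mutex_lock(&demo_mutex);
	if (step_fails)
		return DEMO_FAIL;	/* lock is never released on this path */
	pthread_mutex_unlock(&demo_mutex);
	return DEMO_SUCCESS;
}

/* Fixed shape: record the result and unlock on every exit path. */
int commit_fixed(bool step_fails)
{
	int rc = DEMO_SUCCESS;

	pthread_mutex_lock(&demo_mutex);
	if (step_fails)
		rc = DEMO_FAIL;
	pthread_mutex_unlock(&demo_mutex);
	return rc;
}

/*
 * Probe whether demo_mutex is currently held. pthread_mutex_trylock
 * returns EBUSY instead of blocking, so the probe itself cannot hang.
 */
bool mutex_is_held(void)
{
	int rc = pthread_mutex_trylock(&demo_mutex);

	if (rc == 0) {
		pthread_mutex_unlock(&demo_mutex);
		return false;
	}
	return rc == EBUSY;
}
```

After a failing call to commit_leaky, any later pthread_mutex_lock on the same mutex would block forever, which matches the deadlock described above. With commit_fixed, the mutex is free again whether or not the call succeeded.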
Comment 1 Ryan 2021-09-14 05:08:54 MDT
Created attachment 21262
The patch
Comment 2 Ryan 2021-11-03 02:27:24 MDT
Hi Tim, could you please take a look at my patch fixing this issue?
Comment 4 Danny Auble 2021-11-03 14:28:15 MDT
Comment on attachment 21262
The patch

Thanks Ryan, this is now in 21.08.4+ commit 1d7e69bf72.

Thanks for finding and fixing this. It was easy to see the mistake. Even though this fixes the issue, I would strongly suggest using PMI v2 :).
Comment 5 Danny Auble 2021-11-03 14:28:55 MDT
Please reopen if anything else is needed on this.

Thanks again for the patch and sorry it took so long to get to.
Comment 6 Ryan 2021-11-03 20:45:50 MDT
Thanks!