Hi developers, thanks for taking a look. The lock kvs_mutex may not be released on the path taken when !kvs_set.kvs_comm_ptr[kvs_set.kvs_comm_recs]==1 (line 1290). The relevant code is linked below. This could lead to a deadlock if PMI_KVS_Commit is called more than once, or if another function tries to acquire the same lock. https://github.com/SchedMD/slurm/blob/4801c60da4784346c4bf830a0b198364012e44ee/contribs/pmi/pmi.c#L1265-L1292
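For readers unfamiliar with this class of bug, here is a minimal sketch of the general pattern, not the actual pmi.c code. The function kvs_commit_sketch, the fixed-size record array, the MAX_RECS limit, and the use of a NULL record as the failing check are all hypothetical stand-ins. The point it illustrates is that every exit after pthread_mutex_lock() passes through a single unlock, so the error path cannot leave kvs_mutex held.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real contribs/pmi/pmi.c definitions. */
#define PMI_SUCCESS 0
#define PMI_FAIL    (-1)
#define MAX_RECS    8

static pthread_mutex_t kvs_mutex = PTHREAD_MUTEX_INITIALIZER;
static const char *kvs_comm_ptr[MAX_RECS];
static int kvs_comm_recs = 0;

/* Store one record while holding kvs_mutex. */
int kvs_commit_sketch(const char *rec)
{
    int rc = PMI_SUCCESS;

    pthread_mutex_lock(&kvs_mutex);
    if (rec == NULL || kvs_comm_recs >= MAX_RECS) {
        /* Error path (hypothetical check). A bare "return PMI_FAIL;" here
         * would leave kvs_mutex locked, and the next call would block forever. */
        rc = PMI_FAIL;
        goto out;
    }
    kvs_comm_ptr[kvs_comm_recs++] = rec;
out:
    /* Single unlock shared by the success and error paths. */
    pthread_mutex_unlock(&kvs_mutex);
    return rc;
}
```

After a failed call, pthread_mutex_trylock(&kvs_mutex) returning 0 confirms the lock was released; with the early-return version it would report the mutex as busy instead.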
Created attachment 21262: The patch
Hi Tim, could you please take a look at my patch that fixes this issue?
Comment on attachment 21262 (The patch): Thanks Ryan, this is now in 21.08.4+ (commit 1d7e69bf72). Thanks for finding and fixing this; the mistake was easy to see. Even though this fixes the issue, I would strongly suggest using PMI v2 :).
Please reopen if anything else is needed on this. Thanks again for the patch, and sorry it took so long to get to it.
Thanks!