Ticket 13276 - PMI plugin modification
Summary: PMI plugin modification
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: PMIx
Version: 20.11.7
Hardware: Linux Linux
: C - Contributions
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-01-27 07:14 MST by daiyq@nudt.edu.cn
Modified: 2022-01-27 08:38 MST (History)
0 users

See Also:
Site: -Other-
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
patch (5.09 KB, patch)
2022-01-27 07:14 MST, daiyq@nudt.edu.cn

Description daiyq@nudt.edu.cn 2022-01-27 07:14:35 MST
Created attachment 23170 [details]
patch

This patch contains some modifications to the PMI2 and PMIx plugins. The changes, mainly done by Hongjia Cao (the original author of the PMI2 plugin) and Yiqin Dai, reduce PMI2 plugin failures when running large-scale jobs and reduce the time overhead of steps in the PMIx plugin. We propose one of the PMI2 plugin changes for inclusion in the official release.


The National University of Defense Technology found these problems when deploying Slurm on large-scale supercomputers (over 10^6 cores). We classify the problems into five categories, of which the problem above is one. We provide a simpler patch here and look forward to communicating with the Slurm community and contributing code to it. The authors of this patch are Hongjia Cao and Yiqin Dai; contact: daiyq@nudt.edu.cn.