Ticket 15711

Summary: When the pmix job with large node size fails, a large number of REQUEST_ CANCEL_ JOB_ STEP messages will make the slurmctld service too busy and get stuck. Are there any optimization methods
Product: Slurm Reporter: QinSHB <2444817349>
Component: PMIxAssignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 21.08.0   
Hardware: Linux   
OS: Linux   
Site: -Other- Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---

Description QinSHB 2022-12-27 23:17:00 MST