Ticket 15711 - When a PMIx job spanning a large number of nodes fails, the resulting flood of REQUEST_CANCEL_JOB_STEP messages makes the slurmctld service too busy and it gets stuck. Are there any optimization methods?
Summary: When the pmix job with large node size fails, a large number of REQUEST_CANC...
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: PMIx
Version: 21.08.0
Hardware: Linux Linux
Severity: 6 - No support contract
Assignee: Jacob Jenson
Reported: 2022-12-27 23:17 MST by QinSHB
Modified: 2022-12-27 23:17 MST
Site: -Other-


Description QinSHB 2022-12-27 23:17:00 MST