| Summary: | When the pmix job with large node size fails, a large number of REQUEST_ CANCEL_ JOB_ STEP messages will make the slurmctld service too busy and get stuck. Are there any optimization methods | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | QinSHB <2444817349> |
| Component: | PMIx | Assignee: | Jacob Jenson <jacob> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | 6 - No support contract | ||
| Priority: | --- | ||
| Version: | 21.08.0 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
QinSHB
2022-12-27 23:17:00 MST
|