Summary: | Split reason and other partition-specific values into separate array/List in job_record_t | ||
---|---|---|---|
Product: | Slurm | Reporter: | Marshall Garey <marshall> |
Component: | slurmctld | Assignee: | Unassigned Developer <dev-unassigned> |
Status: | OPEN --- | QA Contact: | |
Severity: | 5 - Enhancement | ||
Priority: | --- | CC: | nate, pedmon |
Version: | 20.02.2 | ||
Hardware: | Linux | ||
OS: | Linux | ||
See Also: | https://bugs.schedmd.com/show_bug.cgi?id=9024 https://bugs.schedmd.com/show_bug.cgi?id=7248 | ||
Site: | SchedMD | Slinky Site: | --- |
Alineos Sites: | --- | Atos/Eviden Sites: | --- |
Confidential Site: | --- | Coreweave sites: | --- |
Cray Sites: | --- | DS9 clusters: | --- |
Google sites: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA Site: | --- | NoveTech Sites: | --- |
Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Tzag Elita Sites: | --- |
Linux Distro: | --- | Machine Name: | |
CLE Version: | Version Fixed: | ||
Target Release: | --- | DevPrio: | --- |
Emory-Cloud Sites: | --- |
Description
Marshall Garey
2020-05-19 17:02:44 MDT
Paul, I'm adding you to CC on this bug, as I mentioned in bug 9024. If you don't want to follow it, feel free to remove yourself from CC.

This is the bug where we're tracking the job's reason changing between MaxMemPerLimit and Resources for a multi-partition job submission, where the job can't run in one of the partitions because its memory-per-node request is larger than the memory on any node in that partition. See comment 0 for a description and reproducer.

Updating to reflect the preferred approach to resolving this and similar issues around multi-partition job submissions. At this point we do not have a plan to tackle this, and unfortunately will not within the 20.11 timeframe.
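
For illustration only, here is a rough sketch of what the summary describes: keeping one reason per requested partition instead of a single job-wide reason that each partition's scheduling attempt overwrites. Everything in it is hypothetical (the `sketch_` types, the reason codes, the fixed-size arrays); it is not the layout of the real `job_record_t` or Slurm's actual reason enum, just one way the idea could look.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reason codes; NOT Slurm's real job state reason enum. */
typedef enum {
	SKETCH_REASON_NONE = 0,
	SKETCH_REASON_RESOURCES,
	SKETCH_REASON_MAX_MEM_PER_LIMIT,
} sketch_reason_t;

#define SKETCH_MAX_PARTS 8

/*
 * Hypothetical stand-in for the partition-specific part of a job record:
 * part_reason[i] belongs to part_name[i], so evaluating one partition
 * never overwrites the reason recorded for another.
 */
typedef struct {
	const char *part_name[SKETCH_MAX_PARTS];
	sketch_reason_t part_reason[SKETCH_MAX_PARTS];
	size_t part_cnt;
} sketch_job_t;

/* Return the slot index for a partition, or -1 if the job did not request it. */
int sketch_find_part(const sketch_job_t *job, const char *name)
{
	assert(job->part_cnt <= SKETCH_MAX_PARTS);
	for (size_t i = 0; i < job->part_cnt; i++) {
		if (!strcmp(job->part_name[i], name))
			return (int) i;
	}
	return -1;
}

/* Record a reason for one partition only; 0 on success, -1 if unknown. */
int sketch_set_reason(sketch_job_t *job, const char *name,
		      sketch_reason_t reason)
{
	int i = sketch_find_part(job, name);

	if (i < 0)
		return -1;
	job->part_reason[i] = reason;
	return 0;
}

/* Look up the reason for one partition; NONE if the partition is unknown. */
sketch_reason_t sketch_get_reason(const sketch_job_t *job, const char *name)
{
	int i = sketch_find_part(job, name);

	return (i < 0) ? SKETCH_REASON_NONE : job->part_reason[i];
}
```

With a layout like this, a job submitted to two partitions could report MaxMemPerLimit for the partition whose nodes are too small and Resources for the other at the same time, rather than alternating between the two depending on which partition was evaluated last.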