Ticket 6594 - Improve pending hetjobs state_reason
Summary: Improve pending hetjobs state_reason
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 18.08.5
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Alejandro Sanchez
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2019-02-26 11:24 MST by Alejandro Sanchez
Modified: 2019-03-22 04:51 MDT

See Also:
Site: Jülich
Version Fixed: 18.08.7 19.05.0pre4


Description Alejandro Sanchez 2019-02-26 11:24:50 MST
Split off from bug 5579 so that this issue can be tracked in a separate bug.

1. We can document that pending (PD) hetjobs won't change their state_reason until they are evaluated by backfill, so it will remain (None) until then. We discussed this internally and don't like the idea of adding a waiting reason that is set at hetjob creation time, such as WAIT_BACKFILL. Keeping WAIT_NO_REASON (None) until the job is evaluated by backfill seems OK to us.

2. Once a hetjob has been evaluated by backfill, set an initial state_reason of WAIT_RESOURCES and let backfill change it as needed. WAIT_PRIORITY doesn't make much sense in backfill, since it's a concept more closely tied to the main scheduler: when a job in a partition fails to be allocated, the rest of the jobs in that partition are then set to WAIT_PRIORITY.
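
For illustration only, the behaviour proposed in points 1 and 2 could be sketched roughly as follows. This is not the actual Slurm code or the committed fix: the sketch_* types and functions are hypothetical, and only the reason names (WAIT_NO_REASON, WAIT_RESOURCES, WAIT_PRIORITY) come from this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of pending reasons; the names mirror those
 * mentioned in this ticket, the values are arbitrary. */
enum sketch_state_reason {
	SKETCH_WAIT_NO_REASON,	/* displayed as (None) */
	SKETCH_WAIT_PRIORITY,
	SKETCH_WAIT_RESOURCES
};

/* Hypothetical, heavily reduced stand-in for a job record. */
struct sketch_job {
	bool is_hetjob;
	enum sketch_state_reason state_reason;
};

/* Point 1: at submission the pending job has no reason yet. For a
 * hetjob it stays that way until backfill evaluates it. */
void sketch_job_submit(struct sketch_job *job, bool is_hetjob)
{
	assert(job != NULL);
	job->is_hetjob = is_hetjob;
	job->state_reason = SKETCH_WAIT_NO_REASON;
}

/* Point 2: when backfill evaluates a hetjob, start from
 * WAIT_RESOURCES; if backfill found a more specific reason
 * (have_specific is true), let that reason override the default. */
void sketch_backfill_evaluate(struct sketch_job *job, bool have_specific,
			      enum sketch_state_reason specific)
{
	assert(job != NULL);
	if (job->is_hetjob)
		job->state_reason = SKETCH_WAIT_RESOURCES;
	if (have_specific)
		job->state_reason = specific;
}
```

In this sketch a freshly submitted hetjob reports SKETCH_WAIT_NO_REASON, and after one call to sketch_backfill_evaluate() with have_specific set to false it reports SKETCH_WAIT_RESOURCES.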
Comment 6 Alejandro Sanchez 2019-03-22 04:51:17 MDT
Hi,

this has been fixed in the following commit available since 18.08.7:

https://github.com/SchedMD/slurm/commit/695456d48ed7ff0ad