Ticket 1924

Summary: sacct should display which nodes of a job allocation failed
Product: Slurm Reporter: David Bigagli <david>
Component: Accounting Assignee: Unassigned Developer <dev-unassigned>
Status: OPEN --- QA Contact:
Severity: 5 - Enhancement    
Priority: --- CC: akmalm, phils
Version: 14.11.8   
Hardware: Linux   
OS: Linux   
Site: DownUnder GeoSolutions

Description David Bigagli 2015-09-08 21:17:16 MDT

    
Comment 1 David Bigagli 2015-09-08 21:22:42 MDT
Based on #1913: if a job fails because one of the nodes in its allocation failed,
it is not immediately clear which node failed. Moreover, the slurmstepd on each
of the other nodes logs a message in the job output that is misleading: it
mentions its own hostname, which has nothing to do with the failed node that
caused the job to be terminated.

David