Ticket 1924 - sacct display which nodes of a job allocation failed
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 14.11.8
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Unassigned Developer
 
Reported: 2015-09-08 21:17 MDT by David Bigagli
Modified: 2017-03-07 11:59 MST

Site: DownUnder GeoSolutions


Description David Bigagli 2015-09-08 21:17:16 MDT

    
Comment 1 David Bigagli 2015-09-08 21:22:42 MDT
Following on from #1913: when a job fails because one of the nodes in its
allocation failed, it is not immediately clear which node failed. Moreover,
the slurmstepd on each of the other nodes logs a message in the job output
that is misleading, because it names its own hostname, which has nothing to
do with the failed node that caused the job to be terminated.

David