Ticket 571 - Epilog randomly kills array jobs
Summary: Epilog randomly kills array jobs
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 2.6.x
Hardware: Linux Linux
Severity: 2 - High Impact
Assignee: David Bigagli
Reported: 2014-01-23 04:07 MST by Rod Schultz
Modified: 2014-01-24 04:18 MST
Site: Coventry University (UK)


Attachments
Patch to handle array style jobid in epilog clean (381 bytes, application/octet-stream)
2014-01-23 04:07 MST, Rod Schultz

Description Rod Schultz 2014-01-23 04:07:02 MST
Created attachment 598
Patch to handle array style jobid in epilog clean

The epilog clean script doesn't handle the new job_id format for array jobs.
So the epilog script runs whilst some of the jobs in the job array are still going.
Comment 1 David Bigagli 2014-01-23 04:18:11 MST
Hi Rod,
       thanks for the diffs, but could you please append the steps to reproduce the problem so we can see it. 

David
Comment 2 David Bigagli 2014-01-23 07:32:14 MST
Hi, 
   I cannot reproduce the problem. Do they have a modified version of the epilog 
compared with the one in the example?

If you invoke squeue as it is in the script:

squeue --format=%A

the command returns the job array ids without the underscore. These are
the values of the SLURM_JOB_ID env variable. This is documented in the squeue
man page.

If you invoke squeue without the format then you will get the job array ids
with the underscore.
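
As a rough illustration of why the id format matters, here is a minimal sketch of the kind of comparison an epilog might make between SLURM_JOB_ID and the ids listed by squeue. The helper name other_jobs_present and the sample ids are made up for this example, and the squeue invocation in the comment is an assumption about how the list would be obtained, not a copy of the example script.

```shell
#!/bin/sh
# Sketch only: other_jobs_present is a hypothetical helper, not part of Slurm.
# Returns 0 if any listed job id differs from the current one, 1 otherwise.
other_jobs_present() {
    current_id=$1
    shift
    for job_id in "$@"; do
        # String comparison, so an unexpected non-numeric id cannot break the test.
        if [ "$job_id" != "$current_id" ]; then
            return 0
        fi
    done
    return 1
}

# In an epilog the list would come from squeue (assumed invocation):
#   job_list=$(squeue --noheader --format=%A --user="$SLURM_UID" --nodelist=localhost)
# Made-up ids stand in for that output here.
job_list="1001 1002"
if other_jobs_present 1002 $job_list; then
    echo "other jobs still running, skip cleanup"
else
    echo "no other jobs, safe to clean up"
fi
```

Because %A prints plain numeric ids, they can be compared directly against SLURM_JOB_ID, which is the point made above.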

The example script appears to be correct.

David
Comment 3 Rod Schultz 2014-01-24 04:15:37 MST
David,

Thanks for looking at this.

You are right, the script is correct.

I've asked the submitter for his script and a better description of the symptoms.

Rod.
Comment 4 David Bigagli 2014-01-24 04:18:04 MST
Closing. False alarm.

David