Ticket 9187

Summary: Correct way in Epilogue to determine if this is the last job...
Product: Slurm Reporter: Brad Viviano <viviano.brad>
Component: Configuration Assignee: Tim McMullan <mcmullan>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 19.05.5   
Hardware: Linux   
OS: Linux   
Site: EPA
Attachments: tar file containing our slurmd epilogue scripts.

Description Brad Viviano 2020-06-08 05:56:23 MDT
Hello,
   Here is my scenario.  Each node of our cluster has 32 cores.  We don't allow users to share nodes, but we do allow a user to run multiple jobs on the same node (i.e., multiple single-core jobs).  We have an epilogue script that, when the last job on a node completes, "cleans" the node (purges /tmp, /var/tmp, etc.) to make it ready for a new user.

   I will attach the slurmd epilogue script, but basically, what I was doing in my script was:

HOSTNAME=`/bin/hostname`
JOB_COUNT=`/usr/local/bin/squeue --noheader -w ${HOSTNAME} | /usr/bin/wc -l`
if [ ${JOB_COUNT} -eq 1 ]; then
... #Do whatever cleanup is needed
fi

The issue I ran into was that when multiple single-core jobs completed at or around the same time, the output of the above squeue command would show jobs in both "CG" and "R" states.  Then, as each finished, I could have X jobs in "CG" state, then 0 jobs at all.  This would cause the cleanup in the epilogue script to fail to run.

I switched the logic to be:

HOSTNAME=`/bin/hostname`
JOB_COUNT=`/usr/local/bin/squeue --noheader --states=R,CG -w ${HOSTNAME} | /usr/bin/wc -l`
if [ ${JOB_COUNT} -eq 0 ]; then
...
fi

The above seems to have fixed the problem, except that it can cause the epilogue script to run the cleanup multiple times, so I am wondering if there is a better way.

My question is: how do I determine, in the epilogue script, whether ${SLURM_JOB_ID} is the last active job being run on that node by ${SLURM_JOB_USER}, so that I can run the cleanup process reliably?

Thanks.
Comment 1 Brad Viviano 2020-06-08 05:56:56 MDT
Created attachment 14568 [details]
tar file containing our slurmd epilogue scripts.
Comment 2 Brad Viviano 2020-06-08 05:59:03 MDT
I've attached a tar of all our epilogue scripts.  The one in question I am asking about is my "90-kill_on_exit" epilogue script.

Basically, I would like the cleanup part of that script to run ONLY when it's being run by the last active job on the node.

Thanks.
Comment 3 Brad Viviano 2020-06-08 07:25:58 MDT
Sorry,
   One clarification: I meant to say "CF" for configuring in my check below.  See the updated code:

HOSTNAME=`/bin/hostname`
JOB_COUNT=`/usr/local/bin/squeue --noheader --states=R,CF -w ${HOSTNAME} | /usr/bin/wc -l`
if [ ${JOB_COUNT} -eq 0 ]; then
...
fi

Again, the above seems to work correctly, but if multiple jobs all enter "COMPLETING" at the same time on the same node, the cleanup process runs multiple times.
Comment 5 Tim McMullan 2020-06-09 14:16:52 MDT
Hi!

I did some checking into this, and unfortunately I'm not sure there is currently a "good" way to do it.  The epilog is only really aware of itself, and querying the slurmctld on every job isn't ideal since it's potentially a lot of load (depending on your job throughput).  It might be possible to use slurmrestd in 20.02 to make this a little better, but it would need to fetch that data from the slurmctld as well.

That said, my first thought on improving the script was to fetch the "R,CF,CG" jobs in one go, then with judicious use of awk figure out whether there are running jobs and, if not, pick the last job in the "CG" state to do the cleanup.

I was playing with something like this, though I wouldn't use it without a lot more testing:
HOSTNAME=`/bin/hostname`
JOBS=`/usr/local/bin/squeue --noheader --states=R,CF,CG -w ${HOSTNAME}`
# Quote ${JOBS} so the newlines in squeue's output survive; column 5 of
# squeue's default output format is the job state (ST).
RUNNING_JOBS=`echo "${JOBS}" | awk '{if ($5 == "R" || $5 == "CF") { i++ }}; END {print i+0}'`
LAST_JOB=`echo "${JOBS}" | awk 'END {print $1}'`
if [[ ${RUNNING_JOBS} -eq 0 ]] && [[ "${LAST_JOB}" == "${SLURM_JOB_ID}" ]]; then
...
fi

There might be some issues with array jobs with that concept, though.  It might also be possible to make it work the way you have it using flock, but I could imagine that still having races and the cleanup getting run more than once.
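For what it's worth, a minimal sketch of the flock idea might look like the following.  This is entirely illustrative: the lock and marker paths are assumptions, the marker would have to be removed by a prolog script when a new job lands on the node, and for the sake of a runnable example the state lives in a fresh temp dir rather than a fixed location like /var/run.

```shell
#!/bin/bash
# Illustrative sketch only: serialize the cleanup with flock(1) so that
# when several completing jobs all conclude "no jobs left" at the same
# time, only the first one to take the lock actually cleans the node.
#
# A real epilogue would use a fixed STATEDIR shared by all jobs, and a
# prolog script would remove the marker when a new job starts.
STATEDIR=$(mktemp -d)
LOCKFILE=${STATEDIR}/epilog-cleanup.lock
MARKER=${STATEDIR}/epilog-cleanup.done

cleanup_node() {
    # Stand-in for the real cleanup (purge /tmp, /var/tmp, etc.)
    echo "cleaned" >> "${STATEDIR}/cleanup.log"
}

run_guarded_cleanup() {
    (
        flock -x 9                      # block until we hold the lock
        if [ ! -e "${MARKER}" ]; then   # first epilogue in wins
            cleanup_node
            touch "${MARKER}"
        fi
    ) 9>"${LOCKFILE}"
}

# Several completing jobs may reach this point at once; only one of the
# concurrent callers ends up running cleanup_node.
run_guarded_cleanup
```

The flock/marker pair makes the cleanup idempotent rather than trying to pick a single "last" job, which sidesteps the CG-state race, but it still wouldn't help if a job starts between the check and the cleanup.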

I hope this helps!
Thanks,
--Tim
Comment 6 Tim McMullan 2020-06-17 06:16:57 MDT
Hi!

I just wanted to check and make sure this answered your question!

Thanks!
--Tim
Comment 7 Brad Viviano 2020-06-17 06:23:14 MDT
Yes, thanks.  You can close the case.
Comment 8 Tim McMullan 2020-06-17 06:32:12 MDT
Thanks Brad!  Closing now.