Ticket 4368

Summary: Wrong #SBATCH directives in jobscript lead to unpredictable behavior.
Product: Slurm Reporter: Chrysovalantis Paschoulas <c.paschoulas>
Component: Heterogeneous Jobs Assignee: Felip Moll <felip.moll>
Status: RESOLVED FIXED
Severity: 3 - Medium Impact
CC: d.krause
Version: 17.11.x
Hardware: Linux   
OS: Linux   
Site: Jülich
Version Fixed: 17.11

Description Chrysovalantis Paschoulas 2017-11-13 10:56:15 MST
The correct way to specify job packs in a jobscript is to use "#SBATCH packjob" in between the #SBATCH directives of the different job packs.
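A minimal sketch of the correct form, assuming a two-component pack (the node counts, time limit, and application name are placeholders):

```shell
#!/bin/bash
#SBATCH -N 2 -t 01:00:00
# directives above apply to the first pack component
#SBATCH packjob
# directives below apply to the second pack component
#SBATCH -N 1 -t 01:00:00

srun ./my_app    # hypothetical application binary
```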

But when I tried something invalid, like:
"#SBATCH -N X <options> : -N Y <options>"
then the jobs silently did nothing (no stdout/stderr) and went to the COMPLETED state instead of the FAILED state!
In addition, the nodes where the jobs ran went offline/drained without any notice (no messages in the stdout/stderr files).

This is not good handling of such cases.
Comment 4 Felip Moll 2017-11-15 09:01:30 MST
Fix has been applied to Slurm 17.11 branch in commit ca2db47e569a680dc8593822e4f1f8f344c92a1b.

Now, incorrect syntax in an sbatch script causes a fatal error:

~]$ sbatch b4368bad.sh 
sbatch: fatal: b4368bad.sh: line 3: Unexpected `:` in [ : -n 1 -t 01:00:00]

Thanks for reporting that,
Felip M