Ticket 4368

Summary: Wrong #SBATCH directives in jobscript lead to unpredictable behavior.
Product: Slurm Reporter: Chrysovalantis Paschoulas <c.paschoulas>
Component: Heterogeneous Jobs Assignee: Felip Moll <felip.moll>
Status: RESOLVED FIXED QA Contact:
Severity: 3 - Medium Impact    
Priority: --- CC: d.krause
Version: 17.11.x   
Hardware: Linux   
OS: Linux   
Site: Jülich Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed: 17.11
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---

Description Chrysovalantis Paschoulas 2017-11-13 10:56:15 MST
The correct way to specify job packs in a job script is to place an "#SBATCH packjob" line between the #SBATCH directive blocks of the different job packs.
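For reference, a minimal heterogeneous job script using that syntax might look like the sketch below (the node counts, time limit, and application names are illustrative, not from this ticket; `srun --pack-group` is the 17.11-era option for selecting a pack component):

```shell
#!/bin/bash
# First job pack
#SBATCH -N 1
#SBATCH -t 01:00:00
#SBATCH packjob
# Second job pack
#SBATCH -N 2
#SBATCH -t 01:00:00

# Launch one step per pack component (hypothetical binaries)
srun --pack-group=0 ./app_master : --pack-group=1 ./app_worker
```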

But when I tried something wrong, like:
"#SBATCH -N X <options> : -N Y <options>"
the jobs silently did nothing (no stdout/stderr files were created) and went to the COMPLETED state instead of the FAILED state!
In addition, the nodes where the jobs ran went offline/drained without any notice (no messages in stdout/stderr either).

This is not a good way to handle such cases.
Comment 4 Felip Moll 2017-11-15 09:01:30 MST
Fix has been applied to Slurm 17.11 branch in commit ca2db47e569a680dc8593822e4f1f8f344c92a1b.

Now, incorrect syntax in an sbatch script causes a fatal error:

~]$ sbatch b4368bad.sh 
sbatch: fatal: b4368bad.sh: line 3: Unexpected `:` in [ : -n 1 -t 01:00:00]

Thanks for reporting that,
Felip M