Ticket 4368 - Wrong #SBATCH directives in jobscript lead to unpredictable behavior.
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Heterogeneous Jobs
Version: 17.11.x
Hardware: Linux Linux
Severity: 3 - Medium Impact
Assignee: Felip Moll
Reported: 2017-11-13 10:56 MST by Chrysovalantis Paschoulas
Modified: 2017-11-15 09:01 MST
Site: Jülich
Version Fixed: 17.11


Description Chrysovalantis Paschoulas 2017-11-13 10:56:15 MST
The correct way to specify job packs in a job script is to use "#SBATCH packjob" between the #SBATCH directives of different job packs.

But when I tried an incorrect form, such as:
"#SBATCH -N X <options> : -N Y <options>"
the jobs silently did nothing (no stdout/err) and went to the COMPLETED state instead of the FAILED state!
The nodes where the jobs ran also went offline/drained without any notice (no stdout/err files, and no messages in stdout/err).

Such cases are not handled well.
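
As a rough illustration of the two layouts described above, the sketch below writes one job script that separates its job packs with "#SBATCH packjob" and one that uses the ":" form, then flags any #SBATCH line containing a stand-alone ":" token. The function name check_sbatch_directives, the file names, and the -N/-t values are made up for illustration; this is a user-side check under those assumptions, not Slurm's own parser.

```shell
#!/usr/bin/env bash
# Hypothetical pre-submission check; not part of Slurm.

# check_sbatch_directives FILE
# Prints every #SBATCH line that contains a stand-alone ":" token.
# Returns 1 if at least one such line was found, 0 otherwise.
# Time values such as 01:00:00 are single tokens and are not flagged.
check_sbatch_directives() {
    awk '
        /^#SBATCH([ \t]|$)/ {
            for (i = 2; i <= NF; i++) {
                if ($i == ":") {
                    printf "%s: line %d: unexpected \":\" in [%s]\n", FILENAME, NR, $0
                    bad = 1
                    break
                }
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

workdir=$(mktemp -d)

# Correct layout: "#SBATCH packjob" sits between the directives of each job pack.
cat > "$workdir/good.sh" <<'EOF'
#!/bin/bash
#SBATCH -N 1 -t 01:00:00
#SBATCH packjob
#SBATCH -N 2 -t 01:00:00
srun hostname
EOF

# Incorrect layout: a ":" separator inside a single #SBATCH directive.
cat > "$workdir/bad.sh" <<'EOF'
#!/bin/bash
#SBATCH -N 1 -t 01:00:00 : -N 2 -t 01:00:00
srun hostname
EOF

check_sbatch_directives "$workdir/good.sh" && echo "good.sh: ok"
check_sbatch_directives "$workdir/bad.sh"  || echo "bad.sh: rejected"
```

Running a check like this before sbatch only catches the one mistake reported here; any other malformed directive would still need to be caught by sbatch itself.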
Comment 4 Felip Moll 2017-11-15 09:01:30 MST
The fix has been applied to the Slurm 17.11 branch in commit ca2db47e569a680dc8593822e4f1f8f344c92a1b.

Incorrect syntax in an sbatch script now causes a fatal error:

~]$ sbatch b4368bad.sh 
sbatch: fatal: b4368bad.sh: line 3: Unexpected `:` in [ : -n 1 -t 01:00:00]

Thanks for reporting that,
Felip M