Ticket 2198

Summary: sbatch submission failures for very large bash scripts
Product: Slurm Reporter: Marios Hadjieleftheriou <marioh>
Component: slurmctld Assignee: Tim Wickberg <tim>
Status: RESOLVED FIXED
Severity: 3 - Medium Impact
Version: 14.11.7
Hardware: Linux
OS: Linux
Site: Lion Cave Capital
Version Fixed: 15.08.4 16.05.0-pre1
Attachments: sbatch file

Description Marios Hadjieleftheriou 2015-11-27 13:05:45 MST
Created attachment 2457: sbatch file

We are submitting very large bash scripts (about 7 MB) using sbatch, and we get the following error:
sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long

I am attaching a script.
Comment 1 Tim Wickberg 2015-11-27 13:36:35 MST
There's a 4MB size limit for job scripts to prevent people from overwhelming slurmctld. 
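Because that limit is enforced when the script is submitted, a client-side pre-flight check can fail fast with a clearer message. A minimal sketch (the 4 MB figure matches the default limit; the function name and wording are illustrative, not part of Slurm):

```shell
#!/bin/sh
# Illustrative pre-flight check: refuse a job script larger than the
# default 4 MB cap instead of letting sbatch reject it server-side.
MAX_SCRIPT_SIZE=$((4 * 1024 * 1024))

check_script_size() {
    # wc -c counts bytes, which is what the size cap is measured in.
    size=$(wc -c < "$1")
    if [ "$size" -gt "$MAX_SCRIPT_SIZE" ]; then
        echo "too large: $1 is $size bytes (limit $MAX_SCRIPT_SIZE)" >&2
        return 1
    fi
}
```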

We recommend moving whatever logic or data structures are being baked into the job script out to a separate script, and keeping the job script itself short.
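As a sketch of that split (the worker path here is illustrative), the submitted wrapper stays tiny while the bulky logic and data live in a file slurmctld never has to store:

```shell
#!/bin/bash
#SBATCH --time=01:00:00

# Small submitted wrapper: delegate the real work to a separate
# script on shared storage. WORKER is an illustrative default path.
WORKER="${WORKER:-/shared/scripts/run_workload.sh}"

run_job() {
    bash "$WORKER" "$@"
}

# In a real submission the wrapper would end with:  run_job "$@"
```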

But you're not the first customer to run into this issue: release 15.08.4 added a new max_script_size value for the SchedulerParameters option in slurm.conf that you can adjust to suit.

There's also a related fix in that release which improves sbatch submission performance; it may be taking ~20min to submit your 4MB test file at the moment. If you're looking at routinely processing such large scripts, we'd encourage you to update. (Alternatively, setting the SBATCH_IGNORE_PBS environment variable or using the --ignore-pbs command-line switch to sbatch should immediately fix this performance defect.)
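For reference, the slurm.conf change would look something like the fragment below. The byte value is illustrative only; check the slurm.conf man page for your release before relying on it.

```
# slurm.conf (15.08.4 or later): raise the job script size cap.
# Value is in bytes; 10000000 here is an example, not a recommendation.
# Takes effect after a slurmctld reconfigure/restart.
SchedulerParameters=max_script_size=10000000
```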

- Tim