Ticket 2404

Summary: Proper use of "sbatch --bb=" command syntax
Product: Slurm Reporter: David Paul <dpaul>
Component: Burst Buffers Assignee: Tim Wickberg <tim>
Status: RESOLVED FIXED QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: dpaul, tim
Version: 15.08.7   
Hardware: Linux   
OS: Linux   
Site: NERSC Alineos Sites: ---
Version Fixed: 17.02-pre1 Target Release: ---

Description David Paul 2016-02-02 03:02:05 MST
Could you provide more detail on the use of sbatch with the "--bb=" switch?

In this form the command simply hangs:

sbatch --bb="create_persistent name=dpaul50T capacity=50TB access=striped type=scratch"

A user reported their command was accepted (no details on command line) but no BB allocation was created.

Thanks.
Comment 1 Tim Wickberg 2016-02-02 03:16:06 MST
You still need to provide a batch script. If they hit enter with the command line you gave, sbatch expects the script to be provided on stdin (and terminated with Ctrl-D).

I'm guessing they got a blank line back on the terminal (which was sbatch listening on stdin), then hit Ctrl-C, which cancelled the request, so no request would have been sent to slurmctld.

If they don't want to create an empty job script to give as an argument, you can use --wrap "" as an argument like so:

sbatch --bb="create_persistent name=dpaul50T capacity=50TB access=striped type=scratch" --wrap ""
Comment 2 Tim Wickberg 2016-02-08 08:07:48 MST
Did you have any further questions on this, or can I go ahead and mark this as resolved?

- Tim
Comment 4 Tim Wickberg 2016-02-11 07:58:01 MST
Marking as resolved/infogiven.
Comment 5 David Paul 2016-02-12 05:01:03 MST
Sorry for the delay replying.

The command does not create the persistent reservation.

[dpaul@cori03]==> sbatch --bb="create_persistent name=dpaul200GB capacity=200GB access=striped type=scratch" --wrap ""
Submitted batch job 1132507

[dpaul@cori03]==> squeue -l -u dpaul
Fri Feb 12 10:56:35 2016
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
           1132507     debug     wrap    dpaul  RUNNING       0:01     10:00      1 nid00092

[dpaul@cori03]==> scontrol show job 1132507
JobId=1132507 JobName=wrap
   UserId=dpaul(15448) GroupId=dpaul(1015448)
   Priority=28929 Nice=0 Account=mpccc QOS=premium
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:04 TimeLimit=00:10:00 TimeMin=N/A
   SubmitTime=2016-02-12T10:55:00 EligibleTime=2016-02-12T10:55:00
   StartTime=2016-02-12T10:56:34 EndTime=2016-02-12T10:56:38
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=debug AllocNode:Sid=cori03:33651
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=nid00092
   BatchHost=nid00092
   NumNodes=1 NumCPUs=64 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=64,mem=124928,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=122G MinTmpDiskNode=0
   Features=(null) Gres=craynetwork:1 Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/global/u1/d/dpaul
   StdErr=/global/u1/d/dpaul/slurm-1132507.out
   StdIn=/dev/null
   StdOut=/global/u1/d/dpaul/slurm-1132507.out
   Power= SICP=0
Comment 6 Tim Wickberg 2016-02-12 06:41:12 MST
Looks like my understanding of the --bb option was incomplete, sorry about that.

The option parser is handling the line given in --bb="$FOO" as if it were a line from an sbatch file, and is looking for the #BB or #DW characters at the start.

So, this should work:

sbatch --bb="#BB create_persistent name=dpaul200GB capacity=200GB access=striped type=scratch" --wrap ""

The --wrap "" doesn't impact anything - if you'd submitted a script you still wouldn't have gotten the persistent buffer.

Slurm should warn about the invalid --bb argument. The option parser currently ignores any line that does not start with a #, but it doesn't return an error, which is a bug.

There appear to be some other quirks in how the --bb argument works compared to placing directives in the job script, I'm looking into this further.
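
A minimal sketch of the kind of check described above, rejecting a --bb argument that does not begin with #BB or #DW instead of silently ignoring it. This is an illustration only: check_bb_arg is a hypothetical shell helper, not a Slurm function, and it says nothing about how Slurm's own option parser is written.

```shell
#!/bin/sh
# Hypothetical helper, not part of Slurm: accept a --bb argument only if it
# begins with a burst buffer directive prefix (#BB or #DW).
check_bb_arg() {
    case "$1" in
        "#BB "*|"#DW "*)
            return 0 ;;
        *)
            echo "error: --bb argument must start with #BB or #DW: $1" >&2
            return 1 ;;
    esac
}

# The unprefixed form from the original report is rejected.
check_bb_arg "create_persistent name=dpaul200GB capacity=200GB access=striped type=scratch" \
    || echo "rejected"

# The prefixed form passes the check.
check_bb_arg "#BB create_persistent name=dpaul200GB capacity=200GB access=striped type=scratch" \
    && echo "accepted"
```

A wrapper could run a check like this before calling srun or salloc, so a malformed argument fails loudly at submission time.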
Comment 7 Tim Wickberg 2016-02-12 06:58:39 MST
Actually, on further review, the "sbatch --bb" option is ignored completely. With a leading # or not, the argument is thrown away before we parse it.

salloc and srun do support --bb with the #BB format as described.

At the moment, this would create your buffer as intended:

srun --bb="#BB create_persistent name=dpaul200GB capacity=200GB access=striped type=scratch" date

I'm looking into a fix for sbatch now.
Comment 12 Moe Jette 2016-04-11 08:47:16 MDT
David,

Tim and I exchanged a few ideas about this some time ago. The --bb/--bbf options should work fine for salloc and srun. The sbatch command is more complex, as the user can specify conflicting options in the job script and on the command line (via the --bb/--bbf options).

Here are a couple of ideas:
1. Disable the --bb/--bbf options for the sbatch command and force users to specify options in the script
2. Try to merge command line options with those in the script, which seems fraught with peril and provides little real benefit.

Comments?
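
For reference, idea 1 would mean keeping the burst buffer directive in the job script itself. A rough sketch under stated assumptions: the #BB line uses the format from comment 6, the file name bb_job.sh and the "date" job body are hypothetical, and the sbatch call is shown only as a comment because it needs a live cluster.

```shell
#!/bin/sh
# Sketch only: bb_job.sh is a hypothetical file name for the job script.
job_script="${TMPDIR:-/tmp}/bb_job.sh"

# Write a job script that carries the burst buffer request as a directive.
cat > "$job_script" <<'EOF'
#!/bin/bash
#BB create_persistent name=dpaul200GB capacity=200GB access=striped type=scratch
date
EOF

# Show the directive lines contained in the script.
grep -E '^#(BB|DW) ' "$job_script"

# Submission would then be (not run here, requires Slurm):
#   sbatch "$job_script"
```

With the directive in the script there is only one source for the request, so there is nothing on the command line to conflict with it.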
Comment 13 Moe Jette 2016-07-27 14:43:52 MDT
(In reply to Moe Jette from comment #12)
> David,
> 
> Tim and I exchanged a few ideas about this some time ago. The --bb/--bbf
> options should work fine for salloc and srun. The sbatch command is more
> complex, as the user can specify conflicting options in the job script and on
> the command line (via the --bb/--bbf options).
> 
> Here are a couple of ideas:
> 1. Disable the --bb/--bbf options for the sbatch command and force users to
> specify options in the script
> 2. Try to merge command line options with those in the script, which seems
> fraught with peril and provides little real benefit.

This is what I've done:
1. In version 16.05, documented that the --bb option can NOT be used to create or destroy persistent burst buffers. I've also added logic to return an error if someone tries to create or destroy a persistent burst buffer using the --bb option, so that it is clearer what is happening. Note that the --bbf option works for salloc, srun and sbatch to create or destroy persistent burst buffers.
2. In version 17.02, removed the sbatch --bb option, which does not work in any version as far as I can tell.
3. In version 17.02, added the sbatch --bbf option, which will merge the file specified with the --bbf option into the user's script.

I believe this is probably the best way to address the problem you have reported here.
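
To picture what merging a --bbf file into the user's script could look like, here is a small sketch. It is not Slurm's implementation: merge_bbf is a hypothetical helper, the file names are made up, and placing the directives directly after the first (shebang) line is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch only, not Slurm's code: print a job script with the lines of a
# directive file inserted after its first line (the shebang).
merge_bbf() {
    # $1 = job script, $2 = burst buffer directive file
    head -n 1 "$1"
    cat "$2"
    tail -n +2 "$1"
}

# Hypothetical input files for the demonstration.
workdir=$(mktemp -d)
printf '%s\n' '#!/bin/bash' 'date' > "$workdir/job.sh"
printf '%s\n' \
    '#BB create_persistent name=dpaul200GB capacity=200GB access=striped type=scratch' \
    > "$workdir/bb.conf"

# Print the merged script: shebang, then the #BB line, then the job body.
merge_bbf "$workdir/job.sh" "$workdir/bb.conf"
```

Keeping the directives in a separate file lets the same job script be reused with different burst buffer requests.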
Comment 14 Tim Wickberg 2016-10-12 13:57:15 MDT
Marking this as closed. Moe outlined the approach to handling this in comment 13.