(This is with Slurm 17.02.1-2)

If a job has an --output that contains "%x", "scontrol show job" will not substitute that with the job name. For instance:

---- snip ----
$ sbatch --wrap='sleep 60' -A nn9999k -t 1:0:0 -N 4 --output='%x.out'
Submitted batch job 1437
$ scontrol show job 1437
JobId=1437 JobName=wrap
   UserId=bhm(51568) GroupId=bhm(51568) MCS_label=N/A
   Priority=19940 Nice=0 Account=nn9999k QOS=nn9999k
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:08 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2017-04-11T14:41:52 EligibleTime=2017-04-11T14:41:52
   StartTime=2017-04-11T14:41:52 EndTime=2017-04-11T15:41:52 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=normal AllocNode:Sid=login-1-2:4143709
   ReqNodeList=(null) ExcNodeList=(null) NodeList=c29-[2-5]
   BatchHost=c29-2
   NumNodes=4 NumCPUs=128 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=128,mem=240G,node=4
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=60G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00 Gres=(null) Reservation=(null)
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/cluster/nird/home/bhm/testjobs
   StdErr=/cluster/nird/home/bhm/testjobs/%x.out
   StdIn=/dev/null
   StdOut=/cluster/nird/home/bhm/testjobs/%x.out
   Power=
---- snip ----

Note the StdErr and StdOut lines: the "%x" should have been replaced with the job name ("wrap").

We use "scontrol show job" in the prolog and epilog scripts to write info into the job's stdout file. This is a minor issue, since it is easy to do the substitution ourselves, but it would be nice not to have to. :)
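For illustration, the substitution we do ourselves is roughly this (simplified; $jobid and $stdout_template are placeholder names):

---- snip ----
## Look up the job name (squeue: -h = no header, -o %j = job name),
## then expand %x in the StdOut template with bash pattern substitution:
jobname=$(squeue -h -j "$jobid" -o %j)
stdout_file=${stdout_template//%x/$jobname}
---- snip ----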
There's no safe way to handle the format substitution there for all jobs, and as such I'm disinclined to change this. Keep in mind that %x is not the only format option available - %n and %N in particular cannot be anticipated ahead of the job starting.

If you don't mind me redirecting this: can you elaborate on how you're using this in the Prolog/Epilog? I've heard similar requests for ways to put some output into the batch job's output file automatically, and would be curious to understand that better. There might be a reasonable enhancement we could address in 17.11 to cover that use case, but it'd help me if I better understood how it would need to work.

If there was a version of a Prolog/Epilog script that (a) ran once per job, and (b) had its output inserted into the user's StdOut, would that cover this? Say, a new set of configuration options, something like "BatchProlog/BatchEpilog"?
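To make that concrete, I'm imagining something along these lines in slurm.conf (purely hypothetical at this point - no such options exist today):

---- snip ----
# Hypothetical options, run once per job on the batch host; anything
# the scripts print would be inserted into the job's StdOut file:
BatchProlog=/etc/slurm/batch_prolog.sh
BatchEpilog=/etc/slurm/batch_epilog.sh
---- snip ----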
(In reply to Tim Wickberg from comment #1)
> If you don't mind me redirecting this, can you elaborate on how you're using
> this in the Prolog/Epilog?

Sure, no problem: We are using it to print a header ("Starting job $SLURM_JOB_ID on $SLURM_NODELIST at $(date)") and a footer including output from sacct (for instance "sacct -j $SLURM_JOB_ID -o JobID,JobName,AllocCPUs,NTasks,MinCPU,MinCPUTask,AveCPU,Elapsed,ExitCode") into the stdout file. The idea is (amongst other things) to make users more aware of how their jobs actually performed wrt. the resources they asked for (like memory or walltime).

Basically, we parse the output from "scontrol show job --oneliner $SLURM_JOB_ID" and put it into a bash associative array $job. Then we use:

---- snip ----
## Do the stuff that should only be done once, on the head node:
if [[ $SLURMD_NODENAME == ${job[BatchHost]} ]]; then
    ... other stuff ...
    ## Run epilog_slurmd.user for batch jobs (only):
    if [[ ${job[BatchFlag]} == 1 ]]; then
        export STDOUT_FILE=$(echo ${job[StdOut]} | sed "s/%x/${job[JobName]}/g")
        su "$SLURM_JOB_USER" -c /node/sbin/epilog_slurmd.user
    fi
fi
---- snip ----

and epilog_slurmd.user does

---- snip ----
## Make sure the stdout file exists or is created with the right owner:
if [[ ! -f $STDOUT_FILE ]]; then
    touch $STDOUT_FILE
    chown $USER_DOT_GROUP $STDOUT_FILE
fi

## Append usage stats to the stdout file:
{
    echo
    echo Task and CPU usage stats:
    sacct -j $SLURM_JOB_ID -o JobID,JobName,AllocCPUs,NTasks,MinCPU,MinCPUTask,AveCPU,Elapsed,ExitCode
    ... more stuff ...
} >> $STDOUT_FILE

exit 0  # Needed in case the directory of $STDOUT_FILE has been removed
---- snip ----

(And a similar setup for the prolog.) This is simplified a bit, of course. The setup is a bit finicky - a lot of small details have to be taken care of - but it works fairly well. We used to have this implemented as a shell script that jobs were supposed to source at the start, and which used a shell trap to print out the footer, but users kept forgetting to source the file. :)

> If there was a version of a Prolog/Epilog script that (a) ran once per job,
> and (b) had its output inserted into the users' StdOut would that cover
> this? Say a new set of configuration options something like
> "BatchProlog/BatchEpilog"?

That would definitely cover our needs, and make our prolog/epilog setup much simpler!
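In case it's useful, the parsing step mentioned above looks roughly like this - simplified, and assuming none of the values we care about contain spaces:

---- snip ----
## Parse "scontrol show job --oneliner" output (key=value pairs on one
## line) into the associative array $job. Simplified: relies on
## whitespace word-splitting, so values containing spaces would need
## more careful handling.
declare -A job
for field in $(scontrol show job --oneliner "$SLURM_JOB_ID"); do
    job[${field%%=*}]=${field#*=}
done
---- snip ----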
Re-marking this as an Enhancement request. No promises on when/if this may happen, although I'd like to see something in this vein done.

There are a few architectural hurdles we need to discuss internally. For one, the most common uses seem to involve printing accounting records from 'sacct', and the final accounting information isn't pushed there until after the Epilog finishes. So a hypothetical 'BatchEpilog' would need to either rely on some other source of accounting data, or be run after the traditional Epilog has completed.
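To illustrate that timing issue: a site-side workaround today would be something like polling sacct from the epilog until the final record shows up. A rough sketch, not a recommendation - the fragility of this is part of why the design needs discussion:

---- snip ----
## Rough sketch: wait (up to ~30s) for the job's final accounting
## record to reach the database before printing it.
## sacct flags: -n = no header, -X = allocation line only.
for i in $(seq 1 30); do
    state=$(sacct -n -X -j "$SLURM_JOB_ID" -o State | tr -d ' ')
    case $state in
        COMPLETED|FAILED|TIMEOUT|CANCELLED*) break ;;
    esac
    sleep 1
done
sacct -j "$SLURM_JOB_ID" -o JobID,JobName,AllocCPUs,NTasks,AveCPU,Elapsed,ExitCode
---- snip ----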