Ticket 14346

Summary: Recommendation for sbatch job config
Product: Slurm Reporter: Shaheer KM <shaheer>
Component: Configuration Assignee: Chad Vizino <chad>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: chad
Version: 23.02.x   
Hardware: Linux   
OS: Linux   
Site: Cerebras Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Shaheer KM 2022-06-20 09:31:43 MDT
Hello,

We have been running our workload using below srun command

srun --unbuffered --kill-on-bad-exit --nodes=1 --tasks-per-node=1 --exclusive : --nodes=3 --tasks-per-node=8 --cpus-per-task=16 --exclusive python run.py -p configs/params.yaml

This is expected to trigger 1 task on one node and 24 tasks across the other three nodes in parallel: 25 tasks total for the job.

We would like to start using sbatch to do the same for our workflow. Is there a recommended way to do this, and is this a supported use case with sbatch at all?

I had looked up sbatch heterogeneous jobs, and that documentation says the jobs will be run one after the other. That's not what we want in our case.

Any help here is really appreciated. 

Thanks.
Comment 1 Chad Vizino 2022-06-20 14:46:26 MDT
(In reply to Shaheer KM from comment #0)

Hi. This is possible using sbatch, but note that sbatch will not block and wait at the command prompt for the job to finish the way srun does when run interactively. You may already know this and be fine with that behavior; I just wanted to point it out.
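As an aside, if blocking behavior similar to srun's is wanted, sbatch has a --wait option that keeps it from returning until the submitted job terminates. Whether that fits this workflow is an assumption on my part; a minimal sketch:

```shell
# Sketch: submit and block until the job finishes, as srun would.
# --wait makes sbatch not exit until the submitted job terminates.
sbatch --wait --wrap="srun python run.py -p configs/params.yaml"
```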

If I understand your use case correctly, you should be able to run the srun inside a batch job like this (or put the srun in a job file and pass the file name to sbatch):

>sbatch -N1 --ntasks-per-node=1 --exclusive : -N3 --ntasks-per-node=8 --cpus-per-task=16 --exclusive --wrap="srun --unbuffered --kill-on-bad-exit : python run.py -p configs/params.yaml"
As for starting het jobs, all components will start together once resources are available for all of them. However, certain factors can influence the component jobs' positions in the scheduling queue and thus affect the overall het job start. A couple of them are explained here:

>https://slurm.schedmd.com/slurm.conf.html#OPT_bf_hetjob_immediate
>https://slurm.schedmd.com/slurm.conf.html#OPT_bf_hetjob_prio
Take a look and let me know if you have more questions or if I misinterpreted your question.
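The job-file variant mentioned above could look something like this sketch. The "#SBATCH hetjob" directive separating the two heterogeneous components is real Slurm syntax for batch scripts; the script layout itself is illustrative:

```shell
#!/bin/bash
# Sketch of a batch script equivalent to the --wrap command above.
# The "#SBATCH hetjob" line separates the two heterogeneous components.
#SBATCH --nodes=1 --ntasks-per-node=1 --exclusive
#SBATCH hetjob
#SBATCH --nodes=3 --ntasks-per-node=8 --cpus-per-task=16 --exclusive

srun --unbuffered --kill-on-bad-exit : python run.py -p configs/params.yaml
```

It would then be submitted with, e.g., sbatch het_job.sh.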
Comment 2 Shaheer KM 2022-06-20 18:09:47 MDT
Thanks for the input. I tried this, but it does not seem to work the way our application needs. Our application relies on Slurm environment variables to decide the role of each task spawned as part of the job, and with an sbatch heterogeneous job this does not appear to work.

Our application looks at SLURM_JOB_NODELIST, and with an sbatch heterogeneous job not all nodes are listed under this env var.
Comment 3 Chad Vizino 2022-06-21 11:27:42 MDT
(In reply to Shaheer KM from comment #2)
> Thanks for the input. I tried this, but it does not seem to work the way our
> application needs. Our application relies on Slurm environment variables to
> decide the role of each task spawned as part of the job, and with an sbatch
> heterogeneous job this does not appear to work.
> 
> Our application looks at SLURM_JOB_NODELIST, and with an sbatch heterogeneous
> job not all nodes are listed under this env var.
That env var should still be available both in the sbatch job script and in the environment of srun and what it starts. But there is another variable that holds the node list for each het group (component). So you might also look at using SLURM_PROCID together with SLURM_JOB_NODELIST_HET_GROUP_<N>, where <N> is the value of SLURM_PROCID and the variable holds the node list for the (N+1)-th het component job. Example:

>$ cat het_example
>#!/bin/bash
>#set -x
>tmp=SLURM_JOB_NODELIST_HET_GROUP_$SLURM_PROCID
>nodelist=${!tmp}
>echo "$SLURM_PROCID ($SLURMD_NODENAME): $tmp=$nodelist"
> 
>$ sbatch : --wrap="srun : /tmp/het_example"
>Submitted batch job 164527
> 
>$ cat slurm-164527.out 
>1 (mackinac-2): SLURM_JOB_NODELIST_HET_GROUP_1=mackinac-2
>0 (mackinac-1): SLURM_JOB_NODELIST_HET_GROUP_0=mackinac-1
Does this help you?
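For what it's worth, the ${!tmp} line in the script above is bash indirect expansion: tmp holds the *name* of a variable, and ${!tmp} reads that variable's value. Here is a stand-alone sketch with the SLURM_* values stubbed in (on a real cluster, Slurm sets these for each het component):

```shell
#!/bin/bash
# Stubbed values; on a cluster these are set by Slurm per het component.
SLURM_PROCID=1
SLURM_JOB_NODELIST_HET_GROUP_0="mackinac-1"
SLURM_JOB_NODELIST_HET_GROUP_1="mackinac-2"

# Build the *name* of the variable for this task's het group...
tmp=SLURM_JOB_NODELIST_HET_GROUP_$SLURM_PROCID
# ...then use indirect expansion to read its value.
nodelist=${!tmp}
echo "$tmp=$nodelist"
```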
Comment 4 Chad Vizino 2022-06-30 16:32:57 MDT
(In reply to Chad Vizino from comment #3)

Hi. I'll plan to close this issue shortly unless you have further questions--feel free to ask.
Comment 6 Chad Vizino 2022-06-30 17:05:49 MDT
Closing for now. Feel free to reopen if you have more questions.