Dear Slurm developers, we are about to integrate a large number of AMD Rome nodes into our HPC system. In preparation, we have tested the SMT awareness of Slurm on another system based on AMD Naples. There we have run into an inconsistency between srun and sbatch when it comes to placing tasks on SMT cores... Thank you, Ulf

# Inconsistent Handling of SMT Capabilities

**Problem:** Slurm does not accept batch files that request Simultaneous Multithreading (SMT) capabilities on an AMD EPYC 7601 system. On the other hand, interactive allocation using `srun` works. The target system is an HPC system with two AMD EPYC 7601 processors per node; each processor provides 32 cores, so with SMT enabled there are 128 hardware threads (tasks) in total per node. SMT is enabled/disabled via `--hint=multithread|nomultithread` and is disabled by default.

## SMT with srun

This works:

```
$ srun --nodes=1 --tasks-per-node=128 --hint=multithread echo hi | grep hi | wc -l
128
```

## SMT with sbatch

Slurm does not accept the following job

```
#SBATCH -A hpcsupport
#SBATCH -J multi128
#SBATCH --hint=multithread
#SBATCH -N 1
#SBATCH --tasks-per-node=128
srun echo hi | grep hi | wc -l
```

complaining about

```
$ sbatch multithread.batch
sbatch: defined options
sbatch: -------------------- --------------------
sbatch: account             : hpcsupport
sbatch: hint                : multithread
sbatch: nodes               : 1
sbatch: ntasks              : 128
sbatch: verbose             : 1
sbatch: -------------------- --------------------
sbatch: end of defined options
sbatch: Consumable Resources (CR) Node Selection plugin loaded with argument 4372
sbatch: select/cons_tres loaded with argument 4372
sbatch: Linear node selection plugin loaded with argument 4372
sbatch: Cray/Aries node selection plugin loaded
sbatch: error: Batch job submission failed: Requested node configuration is not available
```

### Observations

We played with various settings, but had no success:

**Overcommit** The job is only accepted when adding `#SBATCH --overcommit`.
However, the resulting scheduling does not make use of the SMT cores. This is expected behaviour.

**Exclusive** Things don't change when providing `--hint=multithread` while submitting the batch file:

```
$ sbatch --hint=multithread multithread.batch
```

**SLURM_HINT** Setting the value of SLURM_HINT to multithread via an `export` command does not help. Moreover, the option `--hint=multithread` does not change the environment variable SLURM_HINT in any case.

# slurm.conf

Here is the relevant snippet of our slurm.conf:

```
SelectType=select/cons_res
SelectTypeParameters=CR_ONE_TASK_PER_CORE,CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
ProctrackType=proctrack/linuxproc
TaskPlugin=task/affinity
TaskPluginParam=Cpusets,Autobind=Threads
NodeName=n1 Procs=128 Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 FEATURE=mem256gb,noboost,booston,ghz-2.2,mhz-2200,mhz-1700,mhz-1200 RealMemory=250000 Weight=256
```
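As a sanity check for a report like this, the node topology that Slurm compares against the slurm.conf node definition can be inspected directly on a compute node. This is a sketch using standard Linux/Slurm tools, not commands from the original report:

```shell
# Report sockets, cores per socket, and threads per core as the OS sees them:
lscpu | grep -E '^(Socket|Core|Thread)'
# On a node with slurmd installed, this prints the slurm.conf line matching
# the detected hardware (commented out: requires a Slurm installation):
# slurmd -C
```

The `lscpu` line should report `Thread(s) per core: 2` on an SMT-enabled EPYC node, matching `ThreadsPerCore=2` in the configuration above.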
Ulf,

Is this for a system that is supported by Atos? If so, could you have Atos submit this issue? Most of this might be fixable through configuration options. Due to contract limitations, this is the route we have to take.

Jacob
Created attachment 12433 [details]
fix assignemnt of INFINITE16 to ntasks_per_node(v1)

Ulf,

I can reproduce it. The issue comes from an inconsistent integer size used by sbatch/srun to internally handle --ntasks-per-core, which in the case of --hint=multithread defaults to "infinite". Could you please apply the attached patch and verify whether it eliminates the issue for you? Alternatively, you can explicitly specify --ntasks-per-core (--ntasks-per-core=2 will be enough in this case) or override it in a job_submit plugin.

cheers,
Marcin
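For reference, the explicit --ntasks-per-core workaround suggested here can be expressed in the batch script itself. This is a sketch assembled from the job script in the original report (account name, job name, and task counts are taken from there); it requires a Slurm cluster to actually run:

```shell
#!/bin/bash
#SBATCH -A hpcsupport
#SBATCH -J multi128
#SBATCH -N 1
#SBATCH --tasks-per-node=128
#SBATCH --hint=multithread
# Explicitly allowing two tasks per physical core sidesteps the
# problematic "infinite" default that --hint=multithread sets:
#SBATCH --ntasks-per-core=2
srun echo hi | grep hi | wc -l
```

With 64 physical cores per node and two threads per core, `--ntasks-per-core=2` accommodates the requested 128 tasks.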
I am out of office until December 1, 2019. /* For support questions please contact hpcsupport@zih.tu-dresden.de . */ Kind regards, Ulf Markwardt
Comment on attachment 12439 [details]
fix assignemnt of INFINITE16 to ntasks_per_node(v2)

Ulf,

Were you able to apply and verify the patch from comment 4?

cheers,
Marcin
We have tested it today. The behavior is still the same :-( Best, Ulf
We have tested the patch: situation unchanged. With --ntasks-per-core=2 jobs are accepted.
Could you please double-check that Slurm was fully rebuilt and installed from a new build with the patch applied, and that you're using the new sbatch? If yes, please execute the following commands:

# ls -l $(which sbatch)
# gdb $(which sbatch)
(gdb) break proc_args.c:872
(gdb) run --hint='multithread' --wrap='sleep 100'
(gdb) n
(gdb) print *ntasks_per_core

and share the full output with us.

cheers,
Marcin
Dear Slurm developers, sorry for the long delay. We verified that Slurm was rebuilt with the patch. The patch fixes the issue only partly. We see the following behavior:

1. #SBATCH Directive
If "#SBATCH --hint=multithread" is specified within a job file, the job is rejected with "sbatch: error: Batch job submission failed: Requested node configuration is not available".

2. Command-line Argument
Submitting the job via "sbatch --hint=multithread jobfile.sh" works and gives all cores (incl. SMT).

3. Env. Variable SLURM_HINT
Finally, we experimented with the environment variable SLURM_HINT. While submission using the combination "unset SLURM_HINT" and "#SBATCH --hint=multithread" is rejected, it works when explicitly setting the value via "export SLURM_HINT=multithread".

Best
Ulf
Hi Ulf,

>1. #SBATCH Directive [...]

I'm trying to reproduce it with a script like the one from comment 0:

># cat /tmp/testHT
>#!/bin/bash
>#SBATCH --hint=multithread
>#SBATCH -N 1
>#SBATCH --tasks-per-node=128
>srun echo hi

using unpatched sbatch:
# /mnt/slurm/bin/sbatch /tmp/testHT
sbatch: error: Batch job submission failed: Requested node configuration is not available

using patched sbatch:
# sbatch /tmp/testHT
Submitted batch job 122
# grep hi slurm-122.out | wc -l
128

Important slurm.conf parameters on my side:
# grep SelectTy /mnt/slurm/etc/slurm.conf | grep -v ^#
SelectType=select/cons_res
SelectTypeParameters=CR_ONE_TASK_PER_CORE,CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
# grep NodeName= /mnt/slurm/etc/slurm.conf | grep -v ^#
NodeName=test02 NodeHostName=slurmctl CPUs=128 CoreSpecCount=0 Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN

Are our configurations and job scripts aligned (parameters and their order)? Do you have any job_submit or cli_filter plugins potentially affecting the job description?

>2. Commandline Argument [...]

Just to be sure - this looks fine to you?

>3. Env. Variable SLURM_HINT

It's probably something I didn't fully explain after your initial message. SLURM_* variables are output variables in terms of sbatch and salloc, so when a job is submitted with sbatch --hint=X, SLURM_HINT will be set in the job environment. They are input variables for srun, so an srun inside a batch script will "inherit" --hint by default (unless the variable is unset explicitly before srun is executed). At the same time, all exported SLURM_* variables reach the job environment, so if you export SLURM_HINT, srun will pick it up even when sbatch was called without --hint - this is what is happening in this case.
When srun is called inside a job allocation it creates a step within that allocation, so no option given to it can affect the selection of cores (i.e., slurmctld select-plugin activity), but it can change the TaskPlugin behavior - task affinity. In these terms --hint is a little bit special, since depending on the context it affects both or only task affinity. What it does for sbatch/salloc is: if neither --ntasks-per-core nor --threads-per-core is specified but --hint=multithread is used, it sets --ntasks-per-core=infinite and sets the SLURM_HINT output variable. This variable is then interpreted by srun, which results in --cpu-bind=threads and removal of CR_ONE_TASK_PER_CORE. Is it possible that you're mixing --hint with --threads-per-core/--ntasks-per-core in your job script from point 1?

cheers,
Marcin
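The input/output variable behaviour described in the last two comments can be illustrated without a cluster. The sketch below only demonstrates ordinary environment inheritance, which is the mechanism srun relies on; SLURM_HINT is the real variable name, and plain `bash -c` stands in for an srun child process:

```shell
# An exported SLURM_HINT reaches any child process, just as it would reach
# an srun launched inside a batch script:
export SLURM_HINT=multithread
bash -c 'echo "child sees: ${SLURM_HINT}"'
# If it is unset before the child starts, the "srun" no longer inherits it:
unset SLURM_HINT
bash -c 'echo "after unset: ${SLURM_HINT:-unset}"'
```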
Ulf,

The patch for this issue was merged [1] into the slurm-20.02 branch and will be part of the 20.02.2 release. I'm closing this now. Should you have any questions, please reopen.

cheers,
Marcin

[1] https://github.com/SchedMD/slurm/commit/e5d9b71bebbeea956997cebd01bf693a1b294b62