Ticket 16666

Summary: #SBATCH --hint=nomultithread appears to break "#SBATCH --ntasks-per-node" in Slurm 23.02.1
Product: Slurm Reporter: Chris Samuel (NERSC) <csamuel>
Component: User Commands    Assignee: Marshall Garey <marshall>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
CC: dmjacobsen
Version: 23.02.1   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=10620
Site: NERSC
Version Fixed: 23.02.3 23.11.0rc1

Description Chris Samuel (NERSC) 2023-05-05 18:26:44 MDT
Hi there,

A user found that using --hint=nomultithread as an #SBATCH directive only gives them 1 task per node with Slurm 23.02.1 (I reduced it to a test case, as the original was more complicated):

#!/bin/bash
#SBATCH --ntasks-per-node=32
#SBATCH -c 1
#SBATCH -t 30
#SBATCH -C cpu
#SBATCH -N 6
#SBATCH --hint=nomultithread

srun hostname | sort | uniq -c

On our Shasta systems running Slurm 23.02.1 it gives:

      1 nid001012
      1 nid001013
      1 nid001014
      1 nid001015
      1 nid001017
      1 nid001018

But that same script run on our XC test system with 22.05.8 gives the expected:

     32 nid00056
     32 nid00057
     32 nid00058
     32 nid00059
     32 nid00060
     32 nid00061

I tested further: `--hint=compute_bound` and `--hint=memory_bound` have the same outcome, but `--hint=multithread` works as expected with Slurm 23.02.1:

     32 nid001012
     32 nid001013
     32 nid001014
     32 nid001015
     32 nid001017
     32 nid001018

All the best,
Chris
Comment 7 Marshall Garey 2023-05-08 15:46:12 MDT
Hi Chris,

I can reproduce this. It only affects sbatch. As a workaround, you can set --ntasks in a cli_filter or job_submit plugin:

ntasks = ntasks-per-node * nnodes
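The same arithmetic can also be applied at submission time without any plugin, by passing an explicit --ntasks on the command line so sbatch does not have to derive the task count itself. A minimal sketch (the counts mirror the test case above; `job.sh` stands in for the user's batch script):

```shell
#!/bin/bash
# Workaround sketch: compute --ntasks explicitly so the task count does
# not depend on sbatch's handling of --hint=nomultithread.
# Values mirror the test case above: 32 tasks per node on 6 nodes.
NTASKS_PER_NODE=32
NNODES=6
NTASKS=$((NTASKS_PER_NODE * NNODES))

echo "submitting with --ntasks=${NTASKS}"
# sbatch --ntasks="${NTASKS}" job.sh   # job.sh is the batch script shown above
```

A cli_filter or job_submit plugin would perform the equivalent calculation server-side for every submission, which avoids relying on users to remember the extra flag.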
Comment 11 Marshall Garey 2023-05-16 13:59:43 MDT
This is fixed in commit c84e7dc2f1 ahead of 23.02.3. I'm closing this as fixed. Let me know if you have any more questions.