Ticket 16666 - #SBATCH --hint=nomultithread appears to break "#SBATCH --ntasks-per-node" in Slurm 23.02.1
Summary: #SBATCH --hint=nomultithread appears to break "#SBATCH --ntasks-per-node" in Slurm 23.02.1
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 23.02.1
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Marshall Garey
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-05-05 18:26 MDT by Chris Samuel (NERSC)
Modified: 2023-05-16 13:59 MDT

See Also:
Site: NERSC
Version Fixed: 23.02.3, 23.11.0rc1


Description Chris Samuel (NERSC) 2023-05-05 18:26:44 MDT
Hi there,

A user found that using --hint=nomultithread as an #SBATCH directive gives them only 1 task per node with Slurm 23.02.1 (I reduced it to a test case, as the original was more complicated):

#!/bin/bash
#SBATCH --ntasks-per-node=32
#SBATCH -c 1
#SBATCH -t 30
#SBATCH -C cpu
#SBATCH -N 6
#SBATCH --hint=nomultithread

srun hostname | sort | uniq -c

On our Shasta systems running Slurm 23.02.1 it gives:

      1 nid001012
      1 nid001013
      1 nid001014
      1 nid001015
      1 nid001017
      1 nid001018

But that same script run on our XC test system with 22.05.8 gives the expected:

     32 nid00056
     32 nid00057
     32 nid00058
     32 nid00059
     32 nid00060
     32 nid00061

I tested further: `--hint=compute_bound` and `--hint=memory_bound` have the same outcome, but `--hint=multithread` works as desired with Slurm 23.02.1:

     32 nid001012
     32 nid001013
     32 nid001014
     32 nid001015
     32 nid001017
     32 nid001018

All the best,
Chris
Comment 7 Marshall Garey 2023-05-08 15:46:12 MDT
Hi Chris,

I can reproduce this. It is only a problem with sbatch. As a workaround, you can set --ntasks in the cli_filter or job_submit plugins:

ntasks = ntasks-per-node * nnodes
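The same arithmetic can also be applied directly in the batch script rather than in a plugin — a sketch of the reproducer above with --ntasks computed by hand (this assumes that an explicit --ntasks sidesteps the miscounted task total, which is what the plugin workaround does on the user's behalf):

```shell
#!/bin/bash
#SBATCH --ntasks-per-node=32
#SBATCH --ntasks=192          # workaround: 32 tasks/node * 6 nodes, set explicitly
#SBATCH -c 1
#SBATCH -t 30
#SBATCH -C cpu
#SBATCH -N 6
#SBATCH --hint=nomultithread

srun hostname | sort | uniq -c
```

A cli_filter or job_submit plugin would make the same adjustment centrally, so individual users don't have to edit their scripts.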
Comment 11 Marshall Garey 2023-05-16 13:59:43 MDT
This is fixed in commit c84e7dc2f1, ahead of the 23.02.3 release. I'm closing this as fixed. Let me know if you have any more questions.