Ticket 16045 - srun does not read thread binding from sbatch
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 22.05.0
Hardware: Linux Linux
: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-02-15 12:58 MST by antoine.jego
Modified: 2023-02-15 12:58 MST
0 users

See Also:
Site: -Other-


Attachments
Reproducing script (585 bytes, application/x-shellscript)
2023-02-15 12:58 MST, antoine.jego

Description antoine.jego 2023-02-15 12:58:05 MST
Created attachment 28876
Reproducing script

When a script is submitted through `sbatch` with specific options (e.g. the number of tasks), an `srun` call inside that script does not pick up these options. They do take effect, however, when they are hinted at explicitly on the `srun` call.

This can cause hwloc to end up with incorrect thread bindings.

The attached script reproduces the issue. It should be adapted to the target architecture (the partition I use has 36 cores per node).
The attached script outputs the following:

srun                  
0x0000000f,0xffffffff 
0x0000000f,0xffffffff 
0x0000000f,0xffffffff 
srun hint             
0x0000000f,0xffffffff 
0x0000000f,0xffffffff 
0x0000000f,0xffffffff 
srun hint ++ 3, 12    
0x00555555            
0x0000000f,0xff000000 
0x00aaaaaa            

The only correct binding is the last one, where all hints have been given. I would expect this correct behaviour even when no hints are passed.
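
To make the masks above easier to compare, here is a small Python sketch that decodes them into CPU indices. It is not part of the attached script. It assumes the masks use the usual hwloc bitmap notation, i.e. comma-separated 32-bit hexadecimal words with the most significant word first. The helper name `decode_hwloc_mask` is made up for this illustration.

```python
def decode_hwloc_mask(mask):
    """Return the sorted CPU indices set in an hwloc-style bitmap string.

    Assumption: the string is a comma-separated list of 32-bit hex words,
    most significant word first (e.g. "0x0000000f,0xffffffff").
    """
    value = 0
    for word in mask.strip().split(","):
        # Shift the words seen so far up by 32 bits and append the next one.
        value = (value << 32) | int(word, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    # Masks copied from the output reported in this ticket.
    masks = [
        "0x0000000f,0xffffffff",
        "0x00555555",
        "0x0000000f,0xff000000",
        "0x00aaaaaa",
    ]
    for mask in masks:
        cpus = decode_hwloc_mask(mask)
        print(mask, "->", len(cpus), "CPUs:", cpus)
```

Under that assumption, the mask reported in the first two runs decodes to all 36 CPUs (0 to 35) for every task, whereas the three masks of the last run decode to three disjoint sets of 12 CPUs each: the even indices 0 to 22, the indices 24 to 35, and the odd indices 1 to 23.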