| Summary: | srun does not read thread binding from sbatch | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | antoine.jego |
| Component: | Scheduling | Assignee: | Jacob Jenson <jacob> |
| Status: | OPEN --- | QA Contact: | |
| Severity: | 6 - No support contract | ||
| Priority: | --- | ||
| Version: | 22.05.0 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: | Reproducing script | ||
Created attachment 28876
Reproducing script

When submitting a script through `sbatch` with specific options (e.g. the number of tasks), calling `srun` inside that script does not pass these options on. They can, however, be passed by giving them to `srun` explicitly as hints. When they are not passed, hwloc can report incorrect thread bindings.

The attached script can help reproduce the issue. It should be modified to target the architecture it runs on (the partition I use has 36 cores/node).

The attached script outputs the following:

srun
0x0000000f,0xffffffff
0x0000000f,0xffffffff
0x0000000f,0xffffffff

srun hint
0x0000000f,0xffffffff
0x0000000f,0xffffffff
0x0000000f,0xffffffff

srun hint ++ 3, 12
0x00555555
0x0000000f,0xff000000
0x00aaaaaa

The only correct binding is the last one, where all hints have been given. I would expect this correct behaviour to occur even when no hints are passed.
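To make the difference between the bindings easier to read, the masks from the output above can be decoded into core indices. The following is a minimal sketch, not part of the attached script. It assumes the masks are hwloc-style bitmap strings, i.e. comma-separated 32-bit hexadecimal words with the most significant word first; the function name `parse_hwloc_mask` is hypothetical.

```python
def parse_hwloc_mask(mask):
    """Decode a mask such as '0x0000000f,0xffffffff' into a set of core indices.

    Assumption: comma-separated 32-bit hex words, most significant word first.
    """
    value = 0
    for word in mask.split(","):
        # Shift the accumulated value up by one 32-bit word and append the next.
        value = (value << 32) | int(word, 16)
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


# Masks copied from the report: the first two runs give every task this mask.
unbound = parse_hwloc_mask("0x0000000f,0xffffffff")

# Masks from the last run, where all hints were passed to srun.
bound = [
    parse_hwloc_mask("0x00555555"),
    parse_hwloc_mask("0x0000000f,0xff000000"),
    parse_hwloc_mask("0x00aaaaaa"),
]

if __name__ == "__main__":
    print("cores per task without full hints:", len(unbound))
    for task, cores in enumerate(bound):
        print("task", task, "->", sorted(cores))
```

Under that assumption, the mask shared by all tasks in the first two runs covers all 36 cores of the node, whereas the three masks of the last run are disjoint sets of 12 cores each that together cover the node.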