| Summary: | Strange CPU binding with --ntasks-per-node set | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | David Gloe <david.gloe> |
| Component: | slurmd | Assignee: | Danny Auble <da> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | brian, da |
| Version: | 14.11.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | CRAY | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | 14.03.9 14.11.0rc2 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
David Gloe
2014-10-03 06:51:58 MDT
David, I haven't been able to reproduce this yet on a normal cluster (it just works there), but I will try next week on a Cray.

srun -n 4 --ntasks-per-node=4 --cpu_bind=v whereami
cpu_bind=MASK - snowflake, task 2 2 [16569]: mask 0x2 set
cpu_bind=MASK - snowflake, task 0 0 [16567]: mask 0x1 set
cpu_bind=MASK - snowflake, task 1 1 [16568]: mask 0x10 set
cpu_bind=MASK - snowflake, task 3 3 [16570]: mask 0x20 set
2 snowflake0 - MASK:0x2
1 snowflake0 - MASK:0x10
3 snowflake0 - MASK:0x20
0 snowflake0 - MASK:0x1
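To make the binding in the output above easier to read, here is a minimal Python sketch that decodes each hex mask into the CPU indices it selects. It assumes the usual Linux affinity convention that bit i of the mask stands for CPU i. The helper name `cpus_from_mask` is made up for this illustration and is not part of Slurm.

```python
def cpus_from_mask(mask_hex):
    """Return the CPU indices whose bits are set in a hex affinity mask.

    Assumes bit i of the mask corresponds to CPU i.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Masks copied from the srun output, keyed by task rank.
task_masks = {0: "0x1", 1: "0x10", 2: "0x2", 3: "0x20"}

task_cpus = {task: cpus_from_mask(mask) for task, mask in task_masks.items()}

for task in sorted(task_cpus):
    print(f"task {task}: mask {task_masks[task]} -> CPUs {task_cpus[task]}")
```

Under that assumption, tasks 0 and 2 are bound to CPUs 0 and 1, while tasks 1 and 3 are bound to CPUs 4 and 5, so consecutive task ranks alternate between two groups of CPUs instead of filling adjacent CPUs in order.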
Hi David,
We can reproduce it on our 2-socket machine as well and are investigating.
David
I see what is happening here. I'll see if I can get a fix for it tomorrow. It appears the --ntasks-per-node option lays tasks out differently than expected or requested.

This is fixed in commit 03dc6ea7800. Please reopen if you still see issues.