Ticket 14223 - Binding problem with hyperthreads
Summary: Binding problem with hyperthreads
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 20.11.8
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Oscar Hernández
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-06-02 05:41 MDT by Regine Gaudin
Modified: 2022-07-05 04:52 MDT
1 user

See Also:
Site: CEA
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments

Description Regine Gaudin 2022-06-02 05:41:59 MDT
Hello

We cannot get correct binding on hyperthreaded nodes. We want to offer jobs the possibility of allocating physical cores together with their hyperthreads.

compute node: 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              256
On-line CPU(s) list: 0-255
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           2
NUMA node(s):        4

NUMA node0 CPU(s):   0-31,128-159
NUMA node1 CPU(s):   32-63,160-191
NUMA node2 CPU(s):   64-95,192-223
NUMA node3 CPU(s):   96-127,224-255
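A note on the CPU numbering above: logical CPUs c and c+128 are the two hyperthreads of physical core c, which is why the bindings in the runs below come in pairs such as 0,128. A minimal sketch of that mapping (the helper names are illustrative, not Slurm code):

```python
def thread_siblings(core, n_cores=128):
    """Logical CPU ids of the two hyperthreads of a physical core,
    assuming the lscpu numbering above: thread 0 of core c is CPU c,
    thread 1 is CPU c + n_cores."""
    return (core, core + n_cores)

def physical_core(cpu, n_cores=128):
    """Physical core that a logical CPU belongs to."""
    return cpu % n_cores

# CPUs 0 and 128 are siblings on physical core 0:
print(thread_siblings(0))  # (0, 128)
print(physical_core(128))  # 0
```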

1) cons_res, task/affinity plugin
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE

controller slurm] # /tmp/donne-conf.sh
SelectType=select/cons_res
TaskPlugin=task/cgroup,task/affinity
NodeName=inti7800 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=240000  Gres=gpu:nvidia:4 State=UNKNOWN

compute node slurm] # /tmp/done-conf-slurm.sh
SelectType=select/cons_res
TaskPlugin=task/cgroup,task/affinity
NodeName=inti7800 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=240000  Gres=gpu:nvidia:4 State=UNKNOWN
TaskAffinity=no in cgroup.conf

 $ srun -n 1 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128

 $  srun -n 2 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128
Cpus_allowed_list:      1,129

 srun -n 1 -c 1  -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128

 $ srun -n 1 -c 2 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128

0,1,128,129 expected according to result obtained with -c 1

 $ srun -n 1 -c 2 --exclusive -p a100-bxi cat /proc/self/status|grep 
Cpus_allowed_list:      0,128

 $  srun -n 1 -c 3 -p a100-bxi cat /proc/self/status|grep Cpus_allowe
Cpus_allowed:   00000000,00000000,00000000,00000003,00000000,00000000,00000000,00000003
Cpus_allowed_list:      0-1,128-129

  0-2,128-130 expected according to result obtained with -c 1
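(Aside: the hex Cpus_allowed mask above can be decoded into the Cpus_allowed_list form with a short script; a sketch, assuming the /proc format of comma-separated 32-bit hex words, most significant first:)

```python
def decode_cpus_allowed(mask):
    """Decode a /proc/self/status Cpus_allowed value (comma-separated
    32-bit hex words, most significant first) into a sorted CPU list."""
    bits = int("".join(mask.split(",")), 16)
    return [i for i in range(bits.bit_length()) if bits >> i & 1]

mask = "00000000,00000000,00000000,00000003,00000000,00000000,00000000,00000003"
print(decode_cpus_allowed(mask))  # [0, 1, 128, 129]
```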

 $  srun -n 1 -c 4  -p a100-bxi cat /proc/self/status|grep Cpus_allowe
Cpus_allowed:   00000000,00000000,00000000,00000003,00000000,00000000,00000000,00000003
Cpus_allowed_list:      0-1,128-129

$ srun -n 2  -c 2 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128
Cpus_allowed_list:      1,129

 $ srun -n 2  -c 2 --exclusive -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128
Cpus_allowed_list:      1,129

 $ srun -n 2  -c 3 --exclusive -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-1,128
Cpus_allowed_list:      0,2,130

 $ srun -n 2  -c 3  -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-3,128-131
Cpus_allowed_list:      0-3,128-131

2) cons_res, cgroup taskaffinity 
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE

controller slurm] #  /tmp/donne-conf.sh
SelectType=select/cons_res
TaskPlugin=task/cgroup
NodeName=inti7800 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=240000  Gres=gpu:nvidia:4 State=UNKNOWN

compute slurm] # /tmp/done-conf-slurm.sh
SelectType=select/cons_res
TaskPlugin=task/cgroup
#TaskPluginParam=Cpusets,Autobind=Threads
NodeName=inti7800 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=240000  Gres=gpu:nvidia:4 State=UNKNOWN
TaskAffinity=yes  in cgroup.conf


 $ srun -n 1 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128

 $  srun -n 2 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128
Cpus_allowed_list:      1,129

$  srun -n 1 -c 1  -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128

 $ srun -n 1 -c 2 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128

if -c 1 gives one physical core with 2 hyperthreads 
   -c 2 should give  0,1,128,129

 $ srun -n 1 -c 2 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128
 $ srun -n 1 -c 2 --exclusive -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128
$ srun -n 1 -c 2 --exclusive=user -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128

 $  srun -n 1 -c 3 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-1,128-129
$  srun -n 1 -c 3  --exclusive  -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-1,128-129
 $ srun -n 1 -c 4  -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-1,128-129
] $ srun -n 1 -c 4 --exclusive  -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-1,128-129
 $ srun -n 1 -c 6 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
slurmstepd-inti7800: error: task[0] unable to set taskset '0x0000003f,,,,0x0000003f'
Cpus_allowed_list:      64-66,192-194
 $ srun -n 1 -c 6 --exclusive -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-2,128-130



3) cons_res, task/affinity , numa description in nodes.conf
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE

compute slurm] # /tmp/done-conf-slurm.sh
SelectType=select/cons_res
TaskPlugin=task/cgroup,task/affinity
NodeName=inti7800 Sockets=4 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=240000  Gres=gpu:nvidia:4 State=UNKNOWN
TaskAffinity=no   in cgroup.conf

controller /tmp/donne-conf.sh
SelectType=select/cons_res
TaskPlugin=task/cgroup,task/affinity
NodeName=inti7800 Sockets=4 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=240000  Gres=gpu:nvidia:4 State=UNKNOWN

 $ srun -n 1 -c 1  -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-255
 $  srun -n 1 -c 2 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-63,128-191
 $  srun -n 2  -c 2 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-255
Cpus_allowed_list:      0-255
 $ srun -n 2  -c 3 --exclusive -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-1,128
Cpus_allowed_list:      0,2,130
 $ srun -n 2  -c 2 --exclusive  -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128
Cpus_allowed_list:      1,129
[gaudinr@inti7800 gaudinr] $ srun -n 1 --exclusive -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0,128
[gaudinr@inti7800 gaudinr] $  srun -n 2  --exclusive -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0
Cpus_allowed_list:      0
 $  srun -n 1 --hint=nomultithread -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0
 $  srun -n 2 --hint=nomultithread -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0
Cpus_allowed_list:      1
 $  srun -n 2 -c 1 --hint=nomultithread -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0
Cpus_allowed_list:      1
 $  srun -n 2 -c 3 --hint=nomultithread -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-2
Cpus_allowed_list:      3-5
 $  srun -n 2 -c 3 --hint=nomultithread --exclusive -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-2
Cpus_allowed_list:      3-5

--hint=nomultithread works, as shown above, but we want to offer the possibility to allocate the hyperthreads.
Comment 1 Oscar Hernández 2022-06-03 08:45:05 MDT
Dear Regine,

I'll try to bring some context here. 

Slurm will never allocate a single thread to a job (on a multithreaded cluster), as it would be highly inefficient to have two independent jobs sharing the same physical core. See [1]: "The count of CPUs allocated to a job may be rounded up to account for every CPU on an allocated core."


When setting SelectTypeParameters you are configuring how Slurm handles accounting/scheduling (cores allocatable/used), but not how tasks are bound. The relevant SelectTypeParameters you have:

CR_Core_Memory -> Count each thread as a CPU for Slurm. Also account for memory.
CR_ONE_TASK_PER_CORE -> Limits the maximum number of tasks to the physical cores of the machine (as each task is accounted two threads). So, in your case, the maximum number of tasks per node is 128.

With this configuration you are not defining task binding, and srun gets all the allocated resources.

With that in mind, let me answer some of your questions inline:

>  $ srun -n 1 -c 2 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
> Cpus_allowed_list:      0,128
> 
> 0,1,128,129 expected according to result obtained with -c 1


The minimum allocation Slurm grants is a physical core, so -c 1 and -c 2 will get you the same: 1 physical core, 2 logical CPUs.

>  $  srun -n 1 -c 3 -p a100-bxi cat /proc/self/status|grep Cpus_allowe
> Cpus_allowed:  
> 00000000,00000000,00000000,00000003,00000000,00000000,00000000,00000003
> Cpus_allowed_list:      0-1,128-129
> 
>   0-2,128-130 expected according to result obtained with -c 1

Here it gets 2 physical cores (4 logical CPUs), necessary to allocate 3 threads. Remember [1] that CPUs are granted in groups of 2.
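The rounding rule can be sketched as follows (an illustrative helper, not actual Slurm code):

```python
import math

def cpus_granted(requested, threads_per_core=2):
    """Logical CPUs actually granted for srun -c <requested>:
    the request is rounded up to whole physical cores, so the
    result is always a multiple of threads_per_core."""
    cores = math.ceil(requested / threads_per_core)
    return cores * threads_per_core

print(cpus_granted(1))  # 2 -> e.g. 0,128
print(cpus_granted(3))  # 4 -> e.g. 0-1,128-129
```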

The 2nd configuration behaves much like the 1st. task/affinity makes no difference here, as you are not specifying any binding affinity in srun.

>  $ srun -n 1 -c 2 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
> Cpus_allowed_list:      0,128
> 
> if -c 1 gives one physical core with 2 hyperthreads 
>    -c 2 should give  0,1,128,129

Same situation as before: CPUs go in groups of 2, so -c 1 and -c 2 get you the same.

>  $ srun -n 1 -c 6 -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
> slurmstepd-inti7800: error: task[0] unable to set taskset

I have not been able to reproduce the error. Was it a one-off?

>  $ srun -n 1 -c 1  -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
> Cpus_allowed_list:      0-255

For the 3rd config, I see you are changing the node layout on purpose (it no longer reflects the real hardware). This might be the reason you are getting this strange CPU list.

> --hint=nomultithread is working but we want to offer the possibility to
> allocate the hyperthreads:

Have you tried --hint=multithread? I think it should give you what you want.

Apart from that, if you do not want to add the --hint to each job, you could also modify the node description in slurm.conf, adding [2] CpuBind=Thread. The line in your case should be something like:

NodeName=inti7800 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=240000  Gres=gpu:nvidia:4 State=UNKNOWN CpuBind=Thread

In addition, since you bind to Thread, if you want your node to allocate 256 tasks (one per thread) and avoid wasting half of the node, you should remove CR_ONE_TASK_PER_CORE, which limits it to 128. You can test it by running:

srun -n 256 -N 1 -c 1 cat /proc/self/status|grep Cpus_allowed_list

I suspect that with CR_ONE_TASK_PER_CORE Slurm won't be able to satisfy the allocation.
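Quick arithmetic with the node figures from the config above (illustrative only):

```python
sockets, cores_per_socket, threads_per_core = 2, 64, 2

physical_cores = sockets * cores_per_socket        # 128
logical_cpus = physical_cores * threads_per_core   # 256

# With CR_ONE_TASK_PER_CORE the per-node task limit is the physical
# core count, so -n 256 exceeds it; without it, one task per thread fits.
print(physical_cores, logical_cpus)  # 128 256
```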

Let me know if I missed something, or if you have any doubts about my comments.

Regards,
Oscar

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_CR_Core_Memory
[2] https://slurm.schedmd.com/slurm.conf.html#OPT_CpuBind
Comment 2 Oscar Hernández 2022-06-09 01:23:03 MDT
Dear Regine,

Hope you managed to configure it the way you intended.

I will be closing this bug for now. Do not hesitate to re-open if any follow-up question arises.

Kind regards,
Oscar
Comment 3 Regine Gaudin 2022-06-09 03:32:01 MDT
Hi
Why do we get this?

$ srun -n 2  -c 3 --exclusive -p a100-bxi cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:      0-1,128
Cpus_allowed_list:      0,2,130
Comment 4 Oscar Hernández 2022-06-09 10:14:34 MDT
Hi Regine,

Let me apologize, I missed that one. It does not look like the expected behavior, as it is giving CPU 0 to two different tasks.

I can reproduce this by setting CR_ONE_TASK_PER_CORE (it does not happen when removed), so it is related to that parameter.

with SelectTypeParameters = CR_Core_Memory,CR_ONE_TASK_PER_CORE

oscar@saborito:~/Projects$ srun -n 2  -c 3  cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	0-1,4
Cpus_allowed_list:	0,2,6

with SelectTypeParameters = CR_Core_Memory

oscar@saborito:~/Projects$ srun -n 2  -c 3  cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	0-1,4
Cpus_allowed_list:	2,5-6

I will take a look into that, and come back to you when I have something.

Regards,
Oscar
Comment 6 Oscar Hernández 2022-06-16 04:08:25 MDT
Dear Regine,

Giving some feedback on this issue: the bug is related to the option --ntasks-per-core=1, the same one CR_ONE_TASK_PER_CORE implicitly sets. As documented in [1], srun does not recognize this option, and in some particular situations, like this one, it can break core binding.

We are currently working on a patch to address the issue.

On the other hand, regarding the binding needs you had: were you able to correctly bind tasks to threads by setting CpuBind=Thread?

Kind regards,
Oscar

[1] https://slurm.schedmd.com/srun.html#OPT_ntasks-per-core
Comment 7 Regine Gaudin 2022-06-27 03:35:07 MDT
Hi

Using hyperthreading: removing CR_ONE_TASK_PER_CORE and adding CpuBind=thread in partition.conf works:


 srun -n 2  -c 3 --exclusive -p rome cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	2,129-130
Cpus_allowed_list:	0-1,128
[gaudinr@inti6006 gaudinr] $ srun -n 2  -c 3  -p rome cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	2,129-130
Cpus_allowed_list:	0-1,128
[gaudinr@inti6006 gaudinr] $ srun -n 2    -p rome cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	128
Cpus_allowed_list:	0
[gaudinr@inti6006 gaudinr] $ srun -n 2 --hint=nomultithreda   -p rome cat /proc/self/status|grep Cpus_allowed_list
srun: error: unrecognized --hint argument "nomultithreda", see --hint=help
[gaudinr@inti6006 gaudinr] $ srun -n 2 --hint=nomultithread   -p rome cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	1
Cpus_allowed_list:	0

For those who do not want to use hyperthreads, the --hint=nomultithread runs look OK.

 srun -n 2 -c 3  --hint=nomultithread   -p rome cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	67-69
Cpus_allowed_list:	64-66
[gaudinr@inti6006 gaudinr] $ srun -n 2 -c 3  --exclusive --hint=nomultithread   -p rome cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	3-5
Cpus_allowed_list:	0-2


Some users have asked me for the following convenience:

Is there a way (a parameter or option) on the hyperthreaded nodes to get back the full-core request (-c 1 granting a core with both hyperthreads)?
 srun -n 2 -c 1 ( --hint=nomultithread )  -p rome cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	0,128
Cpus_allowed_list:	1,129
The aim is to have double-hyperthread binding: one user task per core on one hyperthread, leaving the other hyperthread for the MPI helper thread, for instance.
Comment 8 Oscar Hernández 2022-06-27 10:21:10 MDT
Hi Regine,

I am not sure I understood correctly. Do you mean a parameter that gives functionality similar to "--hint=nomultithread" when thread binding is configured?

If that is the case, adding --ntasks-per-core=1 should give the expected output.

#default behavior with CpuBind=Thread
oscar@comp:/TESTS$ srun -n 2 -c 1 whereami
   0 c1 - Cpus_allowed:	01	Cpus_allowed_list:	0
   1 c1 - Cpus_allowed:	10	Cpus_allowed_list:	4

#with ntasks option
oscar@comp:/TESTS$ srun -n 2 -c 1 --ntasks-per-core=1 whereami
   0 c1 - Cpus_allowed:	11	Cpus_allowed_list:	0,4
   1 c1 - Cpus_allowed:	22	Cpus_allowed_list:	1,5

Let me know if I misunderstood the question.
Comment 9 Regine Gaudin 2022-06-28 02:04:07 MDT
Hi

In fact, users would like to avoid doubling the -c value, as they are asking for physical cores, and we would like to avoid changing all the accounting processes.
(As -c now means threads, you need to set -c 2 for one full physical core, which is disruptive.)
 

--ntasks-per-core=1 seems to offer that possibility for -c 1, but
it does not do it for -c 2...

srun -n 1  -c 1 -p rome   cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	0
[gaudinr@inti6006 gaudinr] $  srun -n 1  -c 1 --ntasks-per-core=1 -p rome   cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	0,128

[gaudinr@inti6006 gaudinr] $  srun -n 1  -c 2 -p rome   cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	0,128
[gaudinr@inti6006 gaudinr] $  srun -n 1  -c 2 --ntasks-per-core=1 -p rome   cat /proc/self/status|grep Cpus_allowed_list
Cpus_allowed_list:	0,128

We would like Cpus_allowed_list:	0,1,128,129 for -c2
Comment 10 Oscar Hernández 2022-06-28 10:24:02 MDT
Hi Regine,

The idea of having hyperthreading with thread binding is to treat each thread as an individual CPU, for task allocation but also for accounting.

To be clear, -c 1 will grant you one thread, and also account you for 1 CPU. What you propose, automatically granting 2 CPUs when -c 1 is requested, would break that logic and would not make much sense from a general perspective.

With your last comment, though, I think I get what your initial intention was (I misunderstood it initially); please let me know if I am wrong:

Your node has 128 cores and 128*2 threads. You want the node to allocate a maximum of 128 tasks/cores, and also to account for a maximum of 128 CPUs (not 256): Slurm behavior similar to a node without hyperthreading, but at the same time you want the available CPUs to include the threads:

oscar@comp:/TESTS$ srun -n 2 -c 1  whereami
   0 c1 - Cpus_allowed:	11	Cpus_allowed_list:	0,4
   1 c1 - Cpus_allowed:	22	Cpus_allowed_list:	1,5

oscar@comp:~/TESTS$ srun -n 1 -c 2  whereami
   0 c1 - Cpus_allowed:	33	Cpus_allowed_list:	0-1,4-5

If that is the case, you could try modifying the node definition in slurm.conf: set [1] CPUs=128 and remove the CpuBind suggested earlier. For example, with the initial config you sent me:

NodeName=inti7800 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=240000  Gres=gpu:nvidia:4 State=UNKNOWN CPUs=128

Note that this will cause Slurm to allocate only full cores, so binding a task to a single thread won't be possible:

oscar@comp:/TESTS$ srun -n 2 -c 1 whereami
   0 c1 - Cpus_allowed:	01	Cpus_allowed_list:	0
   1 c1 - Cpus_allowed:	10	Cpus_allowed_list:	4


Also, take into account that with this configuration Slurm will only account for 128 CPUs per node. I suppose that is what you want, to be consistent with the other non-hyperthreading nodes.

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_CPUs
Comment 11 Oscar Hernández 2022-07-04 05:07:12 MDT
Hi Regine,

Have the suggestions provided in the previous comment been useful to match your needs?

Regards,
Oscar
Comment 12 Regine Gaudin 2022-07-05 02:16:05 MDT

NodeName=inti[6006,6042] Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=240000  State=UNKNOWN  CPUs=128

Yes, it seems to answer the request, thanks:
users ask for cores with -c as they are used to, and can choose whether or not to use the hyperthreads.

[root@inti6006 slurm] # ccc_mprun  -n 4 -c 32   -p rome  cat /proc/self/status |grep Cpus_allowed_list
Cpus_allowed_list:      64-95,192-223
Cpus_allowed_list:      96-127,224-255
Cpus_allowed_list:      32-63,160-191
Cpus_allowed_list:      0-31,128-159
Comment 13 Oscar Hernández 2022-07-05 04:52:51 MDT
Great! Indeed, it looks like we finally got the right configuration.

I am closing this bug. Don't hesitate to re-open if any related question arises.

Regards,
Oscar