| Summary: | Hyperthreading not working | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Doug Meyer <dameyer> |
| Component: | Scheduling | Assignee: | Felip Moll <felip.moll> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | felip.moll |
| Version: | 18.08.5 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Raytheon Missile, Space and Airborne | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | RHEL | Machine Name: | slurm02 |
| CLE Version: | --- | Version Fixed: | 7.6 |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | | |
| Attachments: | slurm.conf, sample script, test_results | | |
Description
Doug Meyer
2019-05-16 07:02:14 MDT
Hi Doug,
If you can attach your latest slurm.conf, I will check your configuration. I suspect you have the CR_ONE_TASK_PER_CORE setting, and I am interested in how exactly you defined the nodes.
> CR_ONE_TASK_PER_CORE
> Allocate one task per core by default. Without this option, by default one task will be allocated per thread on nodes with more than one ThreadsPerCore configured.
> NOTE: This option cannot be used with CR_CPU*.
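For reference, a minimal sketch of how that parameter would appear in slurm.conf. This is a hypothetical fragment, not Doug's actual configuration; per the man-page note above, CR_ONE_TASK_PER_CORE must be combined with a CR_Core* value, not CR_CPU*:

```
# Hypothetical slurm.conf fragment (not the reporter's config).
# Without CR_ONE_TASK_PER_CORE, cons_res will allocate one task
# per *thread* on nodes with ThreadsPerCore > 1.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
```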
Created attachment 10247 [details]
slurm.conf
hpc3 is the new config. It is not allocating logical threads to single-thread jobs. Thank you for the fast response.
Hi Doug,

Can you be more specific on how you are testing it? I get all 56 slots with a similar configuration:

```
[slurm@moll0 18.08]$ srun bash -c 'slurmd -C'
NodeName=moll1 CPUs=56 Boards=1 SocketsPerBoard=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=984 UpTime=0-00:26:42

[slurm@moll0 18.08]$ srun --mem 10 -n 56 bash -c "taskset -cp \$\$" | cut -d":" -f 2 | sort -n
0
1
2
3
4
5
...
55

[slurm@moll0 18.08]$ scontrol show config | grep "TaskPlugin\|Select"
SelectType              = select/cons_res
SelectTypeParameters    = CR_CPU_MEMORY
TaskPlugin              = task/affinity
TaskPluginParam         = (null type)
```

Can you try the same 'srun' test as me? Also, I recommend not setting Boards and instead defining:

```
NodeName=hpc[1089-1092] CPUs=56 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=256000
```

Thanks

Created attachment 10276 [details]
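As an aside, the output of a binding test like the one above can be checked mechanically rather than by eye. Below is a small sketch of such a check; the helper name and the sample lines are invented for illustration, it simply parses the `pid N's current affinity list: ...` lines that `taskset -cp` prints and reports the distinct logical CPUs used:

```python
# Hypothetical helper: given taskset -cp output lines collected from
# each task of a job, return the sorted set of logical CPU ids used.
def cpus_used(taskset_lines):
    cpus = set()
    for line in taskset_lines:
        # Lines look like: "pid 1234's current affinity list: 0,28"
        # or use ranges: "pid 1234's current affinity list: 0-55"
        cpu_list = line.rsplit(":", 1)[1]
        for part in cpu_list.split(","):
            part = part.strip()
            if "-" in part:
                lo, hi = map(int, part.split("-"))
                cpus.update(range(lo, hi + 1))
            else:
                cpus.add(int(part))
    return sorted(cpus)

# Invented sample: two tasks, each bound to a core plus its HT sibling.
sample = [
    "pid 100's current affinity list: 0,28",
    "pid 101's current affinity list: 1,29",
]
print(cpus_used(sample))  # → [0, 1, 28, 29]
```

On a healthy 28-core/56-thread node with `-n 56`, a check like this should report all ids 0-55; only 0-27 appearing would match the symptom Doug describes.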
sample script
Changed the node description. No change.

Results of `scontrol show config | grep "TaskPlugin\|Select"`:

```
SelectType              = select/cons_res
SelectTypeParameters    = CR_CPU_MEMORY
TaskPlugin              = task/affinity
TaskPluginParam         = (null type)
```

The shared command failed for a missing variable declaration. I believe you wanted to see all the HT threads, so I ran `srun "mpstat -P ALL 1"` instead, which shows threads 0-55.

Sample submit script attached. When launched via sbatch against a 28-core/56-thread node, array tasks are only assigned to the physical cores. HT threads remain unused.

(In reply to Doug Meyer from comment #5)
> Changed the node description. No change.
>
> Results of
> scontrol show config|grep "TaskPlugin\|Select"
>
> SelectType = select/cons_res
> SelectTypeParameters = CR_CPU_MEMORY
> TaskPlugin = task/affinity
> TaskPluginParam = (null type)
>
> Command shared failed for missing variable declaration. Believe you wanted
> to see all the HT threads. Ran srun "mpstat -P ALL 1" instead and show
> threads 0 55.
>
> sample submit script attached. When launched via sbatch against a
> 28-core/56-thread node, array tasks are only assigned to the physical cores.
> HT threads remain unused.

Is it possible to enable the cgroup plugin in your environment? Some software can override the affinity setting if it is not enforced by cgroup. If you are able, please set:

```
TaskPlugin=task/cgroup,task/affinity
```

and create a cgroup.conf file in the same directory as slurm.conf with this content:

```
###
# Slurm cgroup support configuration file
#
# See man slurm.conf and man cgroup.conf for further
# information on cgroup configuration parameters
#######################
### General options ###
#######################
CgroupAutomount=yes

########################################
#### TaskPlugin=task/cgroup options ####
########################################
# Force cores limit, needs hwloc libraries
ConstrainCores=yes
# Bind each step task to a subset of allocated cores using
# sched_setaffinity. Needs hwloc libraries.
# (disabled since task/affinity is set)
TaskAffinity=no
```

You will need to restart the daemons.

Off-topic question: I see you are not enforcing memory, so you can get OOMs. Is this intended?

Hi,

```
SelectType              = select/cons_res
SelectTypeParameters    = CR_CPU_MEMORY
TaskPlugin              = task/cgroup,task/affinity
TaskPluginParam         = (null type)
```

Placed the cgroup.conf in use. No change though; still 28 tasks running at a time from the array.

Many of our jobs have spikes in memory use that forced Slurm to kill them. Turning off memory enforcement does expose us to the OOM killer occasionally (very rare).

(In reply to Doug Meyer from comment #7)
> Hi,
> SelectType = select/cons_res
> SelectTypeParameters = CR_CPU_MEMORY
> TaskPlugin = task/cgroup,task/affinity
> TaskPluginParam = (null type)
>
> placed the cgroup.conf in use.
>
> No change though. still 28 tasks running at a time from the array.

I would need your slurmctld log at the time of submitting such a job, but first enable debug:

```
scontrol setdebug debug2
scontrol setdebugflags +CPU_Bind
```

Grab the logs and reset debug to your previous values. Then I'd need the outputs of the commands requested in your other bug 7029, i.e.:

```
srun --mem 10 --ntasks-per-core=1 --ntasks=28 bash -c "taskset -cp \$\$" | cut -d":" -f 2 | sort -n
```

> Many of our jobs have spikes in memory use that forced slurm to kill them.
> Turning off memory enforcement does expose us to OOM killer occasionally
> (very rare)

So you are assuming you may have OOM. If this is the case, that is OK, but be aware that this can affect slurmd and other system components too.

Thank you

Created attachment 10292 [details]
test_results
Suspect I hosed the test. I undid cgroups last week as it was not a production change. This test is without cgroups enabled.
Could not run the command from the command line, but was able to put it into a script and srun that.
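The command-line failure reported earlier ("missing variable declaration") is consistent with the submitting shell expanding `$$` before srun ever runs, which is why the suggested command escapes it as `\$\$`. A minimal sketch of the pitfall, runnable without Slurm (purely illustrative, not from the ticket):

```shell
# Sketch of the quoting pitfall behind "taskset -cp \$\$".
# With an unescaped $$ inside double quotes, the *outer* shell
# substitutes its own PID before the inner bash runs; escaping
# as \$\$ (or using single quotes) defers expansion to the child.
outer=$(bash -c "echo $$")    # outer shell's PID, expanded here
inner=$(bash -c "echo \$\$")  # child bash's PID, expanded there
echo "outer=$outer inner=$inner"
```

Under srun the same difference decides whether every task reports the submitting shell's PID or its own, so the escaping matters for the binding test.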
Doug, is it OK to mark this bug as a duplicate of your other bug 7029? Thanks

That will be fine. For some reason I thought I was asked to open a separate ticket...

Marking as dup.

*** This ticket has been marked as a duplicate of ticket 7029 ***