| Summary: | hyper threading | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | surendra <surendra.sunkari> |
| Component: | Configuration | Assignee: | Marshall Garey <marshall> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | tim |
| Version: | 17.11.1 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | NREL | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
I think you're interested in the --hint=[no]multithread option for srun and sbatch. Leave the nodes with hyperthreading enabled, and users can control how their tasks are bound themselves (or you as an admin can use a job submit plugin to do it for them).

A quick example with this node config:

```
NodeName=DEFAULT RealMemory=3000 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2
```

```
$ srun -n8 whereami
0001 v1 - Cpus_allowed: 11 Cpus_allowed_list: 0,4
0006 v1 - Cpus_allowed: 88 Cpus_allowed_list: 3,7
0000 v1 - Cpus_allowed: 11 Cpus_allowed_list: 0,4
0005 v1 - Cpus_allowed: 44 Cpus_allowed_list: 2,6
0007 v1 - Cpus_allowed: 88 Cpus_allowed_list: 3,7
0003 v1 - Cpus_allowed: 22 Cpus_allowed_list: 1,5
0004 v1 - Cpus_allowed: 44 Cpus_allowed_list: 2,6
0002 v1 - Cpus_allowed: 22 Cpus_allowed_list: 1,5
```

```
$ srun -n8 --hint=multithread whereami
0007 v1 - Cpus_allowed: 80 Cpus_allowed_list: 7
0001 v1 - Cpus_allowed: 10 Cpus_allowed_list: 4
0005 v1 - Cpus_allowed: 40 Cpus_allowed_list: 6
0003 v1 - Cpus_allowed: 20 Cpus_allowed_list: 5
0004 v1 - Cpus_allowed: 04 Cpus_allowed_list: 2
0006 v1 - Cpus_allowed: 08 Cpus_allowed_list: 3
0000 v1 - Cpus_allowed: 01 Cpus_allowed_list: 0
0002 v1 - Cpus_allowed: 02 Cpus_allowed_list: 1
```

```
$ srun -n8 --hint=nomultithread whereami
0005 v2 - Cpus_allowed: 10 Cpus_allowed_list: 4
0006 v2 - Cpus_allowed: 02 Cpus_allowed_list: 1
0007 v2 - Cpus_allowed: 20 Cpus_allowed_list: 5
0000 v1 - Cpus_allowed: 01 Cpus_allowed_list: 0
0004 v2 - Cpus_allowed: 01 Cpus_allowed_list: 0
0002 v1 - Cpus_allowed: 02 Cpus_allowed_list: 1
0001 v1 - Cpus_allowed: 10 Cpus_allowed_list: 4
0003 v1 - Cpus_allowed: 20 Cpus_allowed_list: 5
```

Is this an acceptable solution?

A note on the output from comment 1: notice that without --hint=[no]multithread, the tasks are bound to the whole core, but there can be as many tasks on a core as there are hyperthreads on the core. With --hint=multithread, each task is bound to a single hyperthread. With --hint=nomultithread, each task is bound to a single core, and only a single task per core is permitted.

Also, I'm dropping the severity to sev-4. I was told that your site joined support a little early (before training in October), but that all your bugs should be sev-4 for now. I haven't read the contract or email, so I don't know exactly when you'll be able to submit severity 1-3 bugs - just keep in contact with whoever is managing the contract with you (I assume Jacob or Jess).

I'm closing this as resolved/infogiven. I found out that your "official" support contract period starts on Oct 1 (when you can submit any severity ticket), and we're supporting sev-4 tickets until then.

(Forgot to close the ticket - actually closing.) Feel free to reopen if you have further questions regarding hyperthreading.
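The same hint works from sbatch, so a user who wants one binding or the other can set it once in the batch script rather than on every srun. A minimal sketch (the task count matches the example above; `./my_app` is a placeholder application):

```shell
#!/bin/bash
#SBATCH -n 8                     # eight tasks
#SBATCH --hint=nomultithread     # bind one task per physical core

# Job steps inherit the hint from sbatch, so this srun binds each
# task to its own core. Submitting with --hint=multithread instead
# would pack one task per hyperthread, as shown in comment 1.
srun ./my_app
```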
We have some users that want hyperthreads on and some users that want hyperthreads off. We added a prolog that allows the user to request that their job run with or without hyperthreads by setting a comment in the batch file. It appears, though, that Slurm checks the number of cores at some point and downs nodes with the wrong number of cores. Is there a way to configure Slurm to handle this situation? We basically need Slurm to allow us to use either all of the cores or half of the cores.
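On the node-downing symptom: when a node comes up with hyperthreading toggled, the hardware slurmd detects no longer matches that node's NodeName line in slurm.conf, and the controller marks the node down for a core-count mismatch. `slurmd -C` prints what slurmd detects, in slurm.conf format, which makes the mismatch easy to see. A sketch (the hostname and counts below are illustrative, for a 1-socket, 4-core, 2-thread node):

```shell
# Print the node configuration slurmd detects on this host and
# compare it with the node's NodeName definition in slurm.conf.
slurmd -C
# With hyperthreading enabled, illustrative output:
#   NodeName=node01 CPUs=8 Boards=1 SocketsPerBoard=1 \
#     CoresPerSocket=4 ThreadsPerCore=2 RealMemory=3000
# Booting with hyperthreading disabled halves CPUs and drops
# ThreadsPerCore to 1, which no longer matches slurm.conf -
# hence the node being set down.
```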