Ticket 5727

Summary: hyper threading
Product: Slurm
Reporter: surendra <surendra.sunkari>
Component: Configuration
Assignee: Marshall Garey <marshall>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue
Priority: ---
CC: tim
Version: 17.11.1
Hardware: Linux
OS: Linux
Site: NREL

Description surendra 2018-09-12 16:39:41 MDT
We have some users who want hyperthreads on and some who want them off. We added a prolog that lets a user request that their job run with or without hyperthreads by setting a comment in the batch file. It appears, though, that Slurm checks the number of cores at some point and downs nodes with the wrong number of cores.

Is there a way to configure Slurm to handle this situation? We basically need Slurm to allow us to use either all of the cores or half of the cores.
Comment 1 Marshall Garey 2018-09-12 16:47:43 MDT
I think you're interested in the --hint=[no]multithread option for srun and sbatch. Leave hyperthreading enabled on the nodes, and users can control how their own tasks are bound (or you, as an admin, can use a job submit plugin to set it for them).

A quick example with this node config:

NodeName=DEFAULT RealMemory=3000 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2


$ srun -n8 whereami
0001 v1 - Cpus_allowed: 11      Cpus_allowed_list:      0,4
0006 v1 - Cpus_allowed: 88      Cpus_allowed_list:      3,7
0000 v1 - Cpus_allowed: 11      Cpus_allowed_list:      0,4
0005 v1 - Cpus_allowed: 44      Cpus_allowed_list:      2,6
0007 v1 - Cpus_allowed: 88      Cpus_allowed_list:      3,7
0003 v1 - Cpus_allowed: 22      Cpus_allowed_list:      1,5
0004 v1 - Cpus_allowed: 44      Cpus_allowed_list:      2,6
0002 v1 - Cpus_allowed: 22      Cpus_allowed_list:      1,5

$ srun -n8 --hint=multithread whereami
0007 v1 - Cpus_allowed: 80      Cpus_allowed_list:      7
0001 v1 - Cpus_allowed: 10      Cpus_allowed_list:      4
0005 v1 - Cpus_allowed: 40      Cpus_allowed_list:      6
0003 v1 - Cpus_allowed: 20      Cpus_allowed_list:      5
0004 v1 - Cpus_allowed: 04      Cpus_allowed_list:      2
0006 v1 - Cpus_allowed: 08      Cpus_allowed_list:      3
0000 v1 - Cpus_allowed: 01      Cpus_allowed_list:      0
0002 v1 - Cpus_allowed: 02      Cpus_allowed_list:      1

$ srun -n8 --hint=nomultithread whereami
0005 v2 - Cpus_allowed: 10      Cpus_allowed_list:      4
0006 v2 - Cpus_allowed: 02      Cpus_allowed_list:      1
0007 v2 - Cpus_allowed: 20      Cpus_allowed_list:      5
0000 v1 - Cpus_allowed: 01      Cpus_allowed_list:      0
0004 v2 - Cpus_allowed: 01      Cpus_allowed_list:      0
0002 v1 - Cpus_allowed: 02      Cpus_allowed_list:      1
0001 v1 - Cpus_allowed: 10      Cpus_allowed_list:      4
0003 v1 - Cpus_allowed: 20      Cpus_allowed_list:      5
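
For batch jobs, the same option can be given as an #SBATCH directive instead of on the srun command line. The script below is only a minimal sketch: the task count and the show_binding helper are assumptions made for illustration, and the helper simply reads the Linux /proc interface to report which CPUs the process may use. The hint is repeated on the srun line so the job step does not rely on inheriting it from the batch options.

```shell
#!/bin/bash
# Illustrative task count; adjust to the job.
#SBATCH --ntasks=4
# One task per core. Use --hint=multithread for one task per hardware thread.
#SBATCH --hint=nomultithread

# Hypothetical helper: print the CPUs the calling process may run on.
show_binding() {
    grep Cpus_allowed_list /proc/self/status
}

if command -v srun >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
    # Inside a Slurm allocation: launch one copy per task and report each binding.
    srun --hint=nomultithread grep Cpus_allowed_list /proc/self/status
else
    # Outside Slurm (for a quick local check): report this shell's own affinity.
    show_binding
fi
```

Submitted with sbatch, each task should report a single CPU, much like the srun output above.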

Is this an acceptable solution?
Comment 3 Marshall Garey 2018-09-12 16:55:19 MDT
A note on the output from comment 1: notice that without --hint=[no]multithread, the tasks are bound to the whole core, but there could be as many tasks on a core as there are hyperthreads on a core. With --hint=multithread, each task is bound to a single hyperthread. With --hint=nomultithread, each task is bound to a single core, and only a single task per core is permitted. Since each node has four cores, that appears to be why the eight tasks in the last example are spread over two nodes (v1 and v2), four per node.
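
To read the output, it helps to know that Cpus_allowed is a hexadecimal bitmask in which bit N set means CPU N is allowed, and that Cpus_allowed_list is the same information as a list. The function below is a small sketch of that conversion. mask_to_list is a name made up for this example, and it assumes the mask fits in a 64-bit shell integer and contains no comma separators, which holds for the 8-CPU nodes shown here.

```shell
#!/bin/bash
# Convert a hexadecimal Cpus_allowed mask into a comma-separated CPU list.
mask_to_list() {
    local mask=$((16#$1))   # parse the argument as a base-16 number
    local cpu=0 list=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            # Lowest bit set: this CPU is allowed; append it to the list.
            list="${list:+$list,}$cpu"
        fi
        mask=$((mask >> 1))  # move on to the next CPU's bit
        cpu=$((cpu + 1))
    done
    echo "$list"
}

mask_to_list 11   # prints 0,4
mask_to_list 88   # prints 3,7
mask_to_list 04   # prints 2
```

So a mask of 11 in the first example covers CPUs 0 and 4, the two hyperthreads of one core, while the masks in the --hint=multithread example each have a single bit set.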


Also, I'm dropping the severity to sev-4. I was told that your site joined support a little early (before training in October) but that all your bugs should be sev-4 for now. This is just a friendly reminder to keep things at sev-4 for now.

I haven't read the contract or email, so I don't know exactly when you'll be able to submit severity 1-3 bugs - just keep in contact with whoever is managing the contract with you (I assume Jacob or Jess).
Comment 6 Marshall Garey 2018-09-17 08:59:11 MDT
I'm closing this as resolved/infogiven.

I found out that your "official" support contract period starts on Oct 1 (when you can submit any severity ticket), and we're supporting sev-4 tickets until then.
Comment 7 Marshall Garey 2018-09-17 09:01:02 MDT
(Forgot to close the ticket - actually closing)

Feel free to reopen if you have further questions regarding hyperthreading.