Hi Mohsin. Sorry for the delay on this; I'm currently on a two-week on-site training and so have intermittent time to address bugs. I or another support engineer will be looking into this and come back to you as soon as possible. Thanks.

Hi,
Is there a reason for setting ThreadsPerCore=1 for nodes in slurm.conf.TDS? Could you send me slurmd.log with the debug log level enabled?
Dominik

Mohsin,
Can you please provide your cgroup.conf for production and TDS.
Can you run your test job on the TDS with slurmd in debug mode. This can be done by calling slurmd with "-vvvvv" or setting this in slurm.conf:
> SlurmdDebug=debug5
Slurmd will need to be restarted if slurm.conf is changed. Please revert this setting after the test to avoid filling log partitions (or syslog).
Can you run your test job with the following arguments:
> srun -vvvvv --mem-bind=verbose --cpu-bind=verbose $TESTJOB
Please dump the output of 'env' or the job environment from your srun.
Please attach all the logs to this ticket, preferably as a compressed tarball.
Thanks,
--Nate

Created attachment 9500 [details]
slurmd logs and cgroup.conf files from tds and production systems
Hi Nate,
I have run the tests you requested. The logs are in the attached tarball slurmd_logs_bug6552.tar
(In reply to Mohsin Ahmed from comment #7)
> Created attachment 9500 [details]
> slurmd logs and cgroup.conf files from tds and production systems
>
> Hi Nate,
> I have run the tests you requested. The logs are in the attached tarball
> slurmd_logs_bug6552.tar

We are reviewing your logs now.

(In reply to Nate Rini from comment #8)
> We are reviewing your logs now.

The logs appear to be for a single-task job:
> SLURM_STEP_NUM_NODES=1
> SLURM_STEP_NUM_TASKS=1
> SLURM_STEP_TASKS_PER_NODE=1

Can we get a log for "-n 24" as in the original comment?

Created attachment 9501 [details]
srun env dump on 24 numtasks.
Hi Nate,
Added the env dump with -n 24 from TDS.
Regards
Mohsin
The slurmd_logs.tds also shows a different job? The core assignment, however, looks like it's getting rotated around:

> Rank 0 thread 0 on nid00036 core 0
> Rank 4 thread 0 on nid00036 core 1
> Rank 8 thread 0 on nid00036 core 2
> Rank 12 thread 0 on nid00036 core 3
> Rank 16 thread 0 on nid00036 core 4
> Rank 20 thread 0 on nid00036 core 5
> Rank 1 thread 0 on nid00036 core 6
> Rank 5 thread 0 on nid00036 core 7
> Rank 9 thread 0 on nid00036 core 8
> Rank 13 thread 0 on nid00036 core 9
> Rank 17 thread 0 on nid00036 core 10
> Rank 21 thread 0 on nid00036 core 11
> Rank 2 thread 0 on nid00036 core 24
> Rank 6 thread 0 on nid00036 core 25
> Rank 10 thread 0 on nid00036 core 26
> Rank 14 thread 0 on nid00036 core 27
> Rank 18 thread 0 on nid00036 core 28
> Rank 22 thread 0 on nid00036 core 29
> Rank 3 thread 0 on nid00036 core 30
> Rank 7 thread 0 on nid00036 core 31
> Rank 11 thread 0 on nid00036 core 32
> Rank 15 thread 0 on nid00036 core 33
> Rank 19 thread 0 on nid00036 core 34
> Rank 23 thread 0 on nid00036 core 35

Breaking up the SLURM_CPU_BIND_LIST variable, we see that the tasks are getting handed out to single threads (without repeats):

> 0x000000000001
> 0x000000000002
> 0x000000000004
> 0x000000000008
> 0x000000000010
> 0x000000000020
> 0x000000000040
> 0x000000000080
> 0x000000000100
> 0x000000000200
> 0x000000000400
> 0x000000000800
> 0x000001000000
> 0x000002000000
> 0x000004000000
> 0x000008000000
> 0x000010000000
> 0x000020000000
> 0x000040000000
> 0x000080000000
> 0x000100000000
> 0x000200000000
> 0x000400000000
> 0x000800000000

Looks like it ran on nid00036, which has MaxCPUsPerNode=48 from your slurm.conf, which explains why the CPUs are not all in order.
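[Editor's note] The hex masks above can be decoded mechanically. A minimal sketch, not part of the original ticket (`mask_to_cpus` is a hypothetical helper, not a Slurm utility): each mask is a bitmap in which bit N set means logical CPU N is part of the binding.

```python
# Decode Slurm CPU-bind hex masks into the logical CPU ids they select.
# Bit N set in the mask means logical CPU N is in the binding.
def mask_to_cpus(mask_hex):
    value = int(mask_hex, 16)
    return [cpu for cpu in range(value.bit_length()) if value & (1 << cpu)]

# A few masks from the SLURM_CPU_BIND_LIST above:
for mask in ("0x000000000001", "0x000000000800",
             "0x000001000000", "0x000800000000"):
    print(mask, "->", mask_to_cpus(mask))
# prints [0], [11], [24], [35] respectively; the full 24-mask list
# decodes to CPUs 0-11 and 24-35, one CPU per task with no repeats,
# matching the "without repeats" observation above.
```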
Can you please run this job again and then call the following using srun:
> cat /proc/self/status
> lstopo --of console --taskset
> numactl -s
> lscpu

The slurm.conf has this setting for Magnus:
> SelectTypeParameters=CR_ONE_TASK_PER_CORE,CR_CORE_Memory,other_cons_res

But the Chaos slurm.conf has this setting:
> SelectTypeParameters=CR_CORE_Memory,other_cons_res

Slurm should only be configured to schedule by cores with either CR_Core_Memory or CR_ONE_TASK_PER_CORE, depending on whether you want users to be able to choose to schedule by threads. It is interesting that this broke on Chaos with 18.08.

I also note the TDS slurmd log (which looks like the srun log) has this:
> srun: threads-per-core : 1
Was this set manually, or was CR_ONE_TASK_PER_CORE set when this job was run?

Hi,
I can't find the slurmd log, only logs from srun. Have you intentionally set ThreadsPerCore=1 for nodes in slurm.conf.TDS?
Dominik

So when you say,
> Looks like it ran on nid00036 which has MaxCPUsPerNode=48 from your
> slurm.conf which explains why the cpus are not all in order.
should we take it as saying that, because the "acceptance" queue
doesn't explicitly specify a MaxCPUsPerNode value, which the workq
and gpuq do (MaxCPUsPerNode=24 and MaxCPUsPerNode=8 respectively),
the "acceptance" queue will get whatever "default" that
SLURM gets back from interrogating the OS?
As in, the job IS NOT getting MaxCPUsPerNode=48 from our slurm.conf ?
Kevin M. Buckley
--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre
> Have you intentionally set ThreadsPerCore=1 for nodes in slurm.conf.TDS?
Yes.
Here's what happens.
When we do the automated slurm.conf generation, as part of the install,
we get the following on the TDS, where ThreadsPerCore=2 because Cray
seem unable to turn off hyperthreading in their BIOS.
#NodeName=nid000[32-35] Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 Gres=craynetwork:4 # RealMemory=65536
#NodeName=nid000[36-39] Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=craynetwork:4 # RealMemory=65536
#NodeName=nid000[13-15] Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 Gres=craynetwork:4 # RealMemory=65536
#NodeName=nid000[16-19] Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=craynetwork:4 # RealMemory=65536
#NodeName=nid000[24-27] Sockets=1 CoresPerSocket=8 ThreadsPerCore=2 Gres=craynetwork:4,gpu # RealMemory=32768
As you will see from the TDS slurm.conf, we comment those definitions out, and
even though no-one here believes it ever did anything, we explicitly put in
ThreadsPerCore=1
as a default in the line
NodeName=DEFAULT Sockets=2 ThreadsPerCore=1 Gres=craynetwork:4
before going on to define other node-specific values, viz:
NodeName=nid000[32-35] CoresPerSocket=10 Feature=ivybridge RealMemory=64394
NodeName=nid000[36-39] CoresPerSocket=12 Feature=haswell RealMemory=64298
NodeName=nid000[13-15] CoresPerSocket=10 Feature=ivybridge RealMemory=64394
NodeName=nid000[16-19] CoresPerSocket=12 Feature=haswell RealMemory=64298
NodeName=nid000[24-27] CoresPerSocket=8 Feature=sandybridge,gpu,tesla RealMemory=32154 Sockets=1 Gres=craynetwork:4,gpu
that DON'T, however, override the Default.
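[Editor's note] The effect of that default can be sketched numerically. This sketch is not part of the original ticket (`node_cpus` is a hypothetical helper); it rests on the fact that, when CPUs is not set explicitly, slurm.conf derives a node's CPU count as the product Sockets x CoresPerSocket x ThreadsPerCore:

```python
# Slurm's derived CPU count for a node definition:
# CPUs = Sockets * CoresPerSocket * ThreadsPerCore (when CPUs is unset).
def node_cpus(sockets, cores_per_socket, threads_per_core):
    return sockets * cores_per_socket * threads_per_core

# nid000[36-39] as the hardware reports it
# (Sockets=2 CoresPerSocket=12 ThreadsPerCore=2):
print(node_cpus(2, 12, 2))  # prints 48 logical CPUs

# ...and as defined with the ThreadsPerCore=1 default override:
print(node_cpus(2, 12, 1))  # prints 24
```

So the override halves the logical CPU count Slurm believes each node has, which is why the hardware-mismatched definition interacts badly with the scheduler's CPU-to-task mapping.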
Hope the provenance of that setting is useful.
Kevin M. Buckley
--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre
Created attachment 9502 [details]
revised_logs run on same node with slurmd logs included.
Hello Dominik,
I have added a revised log tarball containing the srun outputs, the slurmd logs, the node environment, and the cgroup.conf from both the TDS and the production system.
The slurmd logs are trimmed to capture the logs for the jobID of interest.
Regards
Mohsin
Hi,
Could you generate slurmd.log with a higher debug level? See Nate's comment 6.
NodeName=DEFAULT works. Could you set a configuration in slurm.conf that matches the node hardware, and then run another test? I think the proper configuration should work fine.
Dominik

Created attachment 9503 [details]
srun with slurm_debug set to level 6
Hi Dominik,
Apologies for not setting the required slurmd debug level.
I have repeated the run with the following command line this time (instead of making changes to slurm.conf and restarting the daemon):
srun -n 24 --export=all --slurmd-debug=-vvvvv --mem-bind=verbose --cpu-bind=verbose ./xthi | sort -k 2 -n
regards
Mohsin
Hi,
Sorry for bugging you about slurmd.log, but this time you have only attached the slurmstepd log. Could you send me both the slurmd and slurmstepd logs? Did you have a chance to test the config with ThreadsPerCore=2?
Dominik

Created attachment 9531 [details]
nid00036.log

On 2019/03/12 17:03, bugs@schedmd.com wrote:

> Sorry for bugging you about slurmd.log, but this time you have only attached
> slurmstepd log, could you send me both slurmd and slurmstepd log?

Mohsin is away today (Tue 12th), hence the lack of updates from him on this.

I have just seen your latest comment as I was about to leave, so, to try and progress this ...

... find attached the slurmd log from the job for which Mohsin has already supplied the slurmstepd log. Note that this is the full log, not just the section from Mohsin's last run, as I don't have time to cut it down.

> Did you have a chance to test config with ThreadsPerCore=2?

Just of interest though, what's the thinking there?

Surely, if that parameter actually does anything at all, then having it set to 1, as we have on the TDS, should limit the Threads Per Core to 1?

The suggestion that we set ThreadsPerCore=2, so that we can get SLURM to only use 1 Thread Per Core, seems a bit obtuse, but maybe it is supposed to be?

Kevin

(In reply to Kevin Buckley from comment #21)
> ... find attached the slurmd log from the job for which Mohsin
> has already supplied the slurmstepd log. Note that this is the
> full log, not just the section from Mohsin's last run, as I
> don't have time to cut it down.
Hi,
This is still not what we asked for; we need the slurmd log with debug enabled. To check how slurmd translates abstract cores from slurmctld into physical CPUs on the node, we need at least debug2; debug4 should show us a full mapping between abstract and physical CPUs. See comment 6 and comment 17.

> > Did you have a chance to test config with ThreadsPerCore=2?
>
> Just of interest though, what's the thinking there?
>
> Surely, if that parameter actually does anything at all, then
> having it set to 1, as we have on the TDS, should limit the
> Threads Per Core to 1 ?
>
> The suggestion that we set ThreadsPerCore=2, so that we can
> get SLURM to only use 1 Thread Per Core seems a bit obtuse,
> but maybe it is supposed to be?

This is just the node definition, and it should match the hardware on the nodes; otherwise, Slurm has problems assigning the right CPUs to jobs. Slurm provides mechanisms for allocating jobs with only one task per core: CR_ONE_TASK_PER_CORE, or the task/affinity plugin option --hint/SLURM_HINT.
Dominik

On 2019/03/12 19:13, bugs@schedmd.com wrote:

> This is still not what we asked for, we need slurmd log with enabled debug.
> To check how slurmd translates abstract core from slurmctld to physical CPUs
> on the node we need at least debug2.
> debug4 should show us a full mapping between abstract and physical CPUs.
> Check: comment 6, comment 17

So, just to be clear, Mohsin's invocation of the job

>>> srun -n 24 --export=all --slurmd-debug=-vvvvv --mem-bind=verbose --cpu-bind=verbose ./xthi | sort -k 2 -n

hasn't altered the debug level: you need the slurmd started with those settings in the SLURM config?

> This is just nodes definition and it should match to hardware on nodes.
> otherwise, slurm has a problem with right assigning cpus to job.
> Slurm provides mechanisms for allocating job with only one task per core
> CR_ONE_TASK_PER_CORE, or task/affinity plugin option --hint/SLURM_HINT.

I'm still none the wiser there but, you're the expert, so I guess we can set it up as you suggest.

(In reply to Kevin Buckley from comment #23)
> So, just to be clear, Mohsin's invocation of the job
>
> >>> srun -n 24 --export=all --slurmd-debug=-vvvvv --mem-bind=verbose --cpu-bind=verbose ./xthi | sort -k 2 -n
>
> hasn't altered the debug level: you need the slurmd started with those
> settings in the SLURM config ?

Yes, this argument only takes effect for job steps. To get the logs we need, we either need slurmd to be started with "-vvvvv" or for "SlurmdDebug=debug5" to be set in the slurm.conf and a SIGHUP sent to the slurmd daemon. The slurm.conf change can be limited to the single node and should then be reversed once the logs have been retrieved.

(In reply to Kevin Buckley from comment #23)
> I'm still none the wiser there but, you're the expert, so I guess
> we can set it up as you suggest.

Slurm can be configured to use the core count in the CPUs field with CR_Core_Memory, or the thread count in the CPUs field with CR_ONE_TASK_PER_CORE. CR_ONE_TASK_PER_CORE lets users choose to schedule by threads if they request it, but accounting will be done by threads. Either option should work; please decide which one best fits your site's needs. The ThreadsPerCore field should always reflect the value returned by calling "slurmd -C".
On 2019/03/13 12:01, bugs@schedmd.com wrote:

> Yes, this argument only takes effect for job steps. To get the logs we need, we
> either need slurmd to be started with "-vvvvv" or for "SlurmdDebug=debug5" to
> be set in the slurm.conf and SIGHUP sent to the slurmd daemon. The slurm.conf
> change can be limited to the single node and then should be reversed once the
> logs have been retrieved.

Understood. We are now running with debug5 and ThreadsPerCore=2:

49c49
< SlurmdDebug=info
---
> SlurmdDebug=debug5
51c51
< SlurmdSyslogDebug=info
---
> SlurmdSyslogDebug=debug5
134c134
< NodeName=DEFAULT Sockets=2 ThreadsPerCore=1 Gres=craynetwork:4
---
> NodeName=DEFAULT Sockets=2 ThreadsPerCore=2 Gres=craynetwork:4

Mohsin should be running his job again shortly (now 1245 AWST).

Kevin

Created attachment 9554 [details]
slurmd and slurmstepd logs with very verbose debug levels.

Hi Dominik,
Please see attached, which, to my understanding, should be aligned with what was instructed in Comment#6 by Nate.
Regards
Mohsin

Hi,
Thank you for the slurmd log. Could you tell me the specification of the job present in this log? Does the SLURM_HINT env work correctly with the new configuration? Have you had a chance to test CR_ONE_TASK_PER_CORE?
Dominik

> Have you had a chance to test CR_ONE_TASK_PER_CORE?

We have not, as yet, looked into what, if any, differences we might get in operation, were we to alter the TDS config to have:

SelectTypeParameters=CR_ONE_TASK_PER_CORE

instead of what it currently has, viz:

SelectTypeParameters=CR_CORE_Memory

as this is not considered relevant to the Hyperthreading issue that we reported in this ticket and which SchedMD have now solved for us.

To recap:

We were using our TDS to investigate the effects of upgrading SLURM from 17 to 18, as there is a desire to do so on our production systems. The production system (Magnus) has a SLURM config that does not give rise to Hyperthreading when running under SLURM 17.
The TDS had a SLURM config that did not give rise to Hyperthreading when we were running it under SLURM 17. When we upgraded the TDS SLURM to 18, without changing its config, we saw Hyperthreading.

SchedMD's inspection of our configs pointed out that the TDS config had had the SLURM-determined ThreadsPerCore value overridden in an attempt to control Hyperthreading, whereas Magnus's config had not, although neither value gave rise to Hyperthreading when both were running under SLURM 17.

SchedMD's suggestions have shown us that, once we removed the change we made to the SLURM-determined ThreadsPerCore value, we no longer see any Hyperthreading with the TDS running SLURM 18. This has been confirmed, both in our test program output and by inspection of an increased level of run-time diagnostics.

The Hyperthreading issue is, therefore, closed to our satisfaction.

Thanks for the time spent on this, and for bearing with us as we got our heads around the effects of tweaking SLURM-determined config parameters, whose names imply they are giving the user something to tweak, when, in fact, tweaking them may cause problems for the operation of SLURM.

As regards the concern expressed by SchedMD in comment 12, that our production system's config specifies two values that SchedMD believe to be orthogonal, that is a matter we intend to look into, and we will open a new issue with SchedMD once we have started that investigation.

Hi,
Thanks for the info. I'm going to close this ticket now. If I have misunderstood you, please reopen.
Dominik
Created attachment 9236 [details]
Tarball with slurm.conf files and a test code.

Hi,

On our production system we use SLURM version 17.11.9, and we are now testing version 18.08.5 on our Cray test and development system (TDS) so it can be rolled out to the production environment.

On the production system running 17.11.9, we were managing hyperthreading via the SLURM environment variable SLURM_HINT=nomultithread. This works fine with the attached configuration file slurm.conf.magnus. When tested with a C code that queries the affinity of the CPU set, we get the following output on a compute node of our production system:

mshaikh@nid00011:/group/pawsey0001/mshaikh/Application_testsuite/pawseyapplicationtestsuite/resourcesdir/slurm/xthi> srun -n 24 ./xthi | sort
Rank 0, thread 0, on nid00011. core = 0.
Rank 1, thread 0, on nid00011. core = 12.
Rank 10, thread 0, on nid00011. core = 5.
Rank 11, thread 0, on nid00011. core = 17.
Rank 12, thread 0, on nid00011. core = 6.
Rank 13, thread 0, on nid00011. core = 18.
Rank 14, thread 0, on nid00011. core = 7.
Rank 15, thread 0, on nid00011. core = 19.
Rank 16, thread 0, on nid00011. core = 8.
Rank 17, thread 0, on nid00011. core = 20.
Rank 18, thread 0, on nid00011. core = 9.
Rank 19, thread 0, on nid00011. core = 21.
Rank 2, thread 0, on nid00011. core = 1.
Rank 20, thread 0, on nid00011. core = 10.
Rank 21, thread 0, on nid00011. core = 22.
Rank 22, thread 0, on nid00011. core = 11.
Rank 23, thread 0, on nid00011. core = 23.
Rank 3, thread 0, on nid00011. core = 13.
Rank 4, thread 0, on nid00011. core = 2.
Rank 5, thread 0, on nid00011. core = 14.
Rank 6, thread 0, on nid00011. core = 3.
Rank 7, thread 0, on nid00011. core = 15.
Rank 8, thread 0, on nid00011. core = 4.
Rank 9, thread 0, on nid00011. core = 16.

This is the expected behaviour, where MPI tasks are distributed across sockets in round-robin fashion, as seen from the core IDs of each MPI task.
When testing on the TDS running 18.08.5, we see a different behaviour, where each physical core is running two logical CPUs; hence each core is oversubscribed by two MPI tasks on the same socket. The output from a compute node of the test and development system is as follows:

mshaikh@chaos-int:/group/pawsey0001/mshaikh/Application_testsuite/pawseyapplicationtestsuite/resourcesdir/slurm/xthi> srun -n 24 ./xthi | sort
Rank 0, thread 0, on nid00036. core = 0.
Rank 1, thread 0, on nid00036. core = 6.
Rank 10, thread 0, on nid00036. core = 26.
Rank 11, thread 0, on nid00036. core = 32.
Rank 12, thread 0, on nid00036. core = 3.
Rank 13, thread 0, on nid00036. core = 9.
Rank 14, thread 0, on nid00036. core = 27.
Rank 15, thread 0, on nid00036. core = 33.
Rank 16, thread 0, on nid00036. core = 4.
Rank 17, thread 0, on nid00036. core = 10.
Rank 18, thread 0, on nid00036. core = 28.
Rank 19, thread 0, on nid00036. core = 34.
Rank 2, thread 0, on nid00036. core = 24.
Rank 20, thread 0, on nid00036. core = 5.
Rank 21, thread 0, on nid00036. core = 11.
Rank 22, thread 0, on nid00036. core = 29.
Rank 23, thread 0, on nid00036. core = 35.
Rank 3, thread 0, on nid00036. core = 30.
Rank 4, thread 0, on nid00036. core = 1.
Rank 5, thread 0, on nid00036. core = 7.
Rank 6, thread 0, on nid00036. core = 25.
Rank 7, thread 0, on nid00036. core = 31.
Rank 8, thread 0, on nid00036. core = 2.
Rank 9, thread 0, on nid00036. core = 8.

As is evident from the above output, the MPI tasks on the TDS are running on hyperthreads. This can be confirmed by looking at the CPU IDs, some of which are greater than 23, whereas the node has only 24 physical cores (IDs 0-23). In this case SLURM_HINT did not have any effect, and the same output is seen whether SLURM_HINT is set to nomultithread or multithread.

I have attached the slurm.conf for both systems for reference.
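[Editor's note] The hyperthread check described above can be done mechanically on the xthi output. This is a sketch, not part of the original ticket (`cpu_id` and `on_hyperthread` are hypothetical helpers); it assumes the usual Cray numbering in which logical CPUs 24-47 are the second hardware threads of physical cores 0-23 on this 2-socket, 12-cores-per-socket node.

```python
# Flag xthi lines whose CPU id lands on a hyperthread sibling,
# assuming logical CPUs >= 24 are second hardware threads of cores 0-23.
PHYSICAL_CORES = 24

def cpu_id(xthi_line):
    """Parse the CPU id from a line like 'Rank 2, thread 0, on nid00036. core = 24.'"""
    return int(xthi_line.rstrip(".").split("core = ")[1])

def on_hyperthread(cpu, physical_cores=PHYSICAL_CORES):
    """True if this logical CPU is a second hardware thread under our assumption."""
    return cpu >= physical_cores

for line in ("Rank 2, thread 0, on nid00036. core = 24.",
             "Rank 4, thread 0, on nid00036. core = 1."):
    cpu = cpu_id(line)
    if on_hyperthread(cpu):
        print(f"{line}  <- hyperthread sibling of core {cpu % PHYSICAL_CORES}")
```

On the TDS output above, this flags the twelve ranks bound to CPUs 24-35, which is exactly the oversubscription being reported.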
Prompt guidance on this matter would be highly appreciated, as we need to resolve this issue before we can decide on migrating to the new SLURM version on our production system.