We have some users who want hyperthreading on and some who want it off. We added a prologue that lets a user request that their job run with or without hyperthreads by setting a comment in the batch file. It appears, though, that Slurm checks the number of cores at some point and downs nodes that report the wrong number of cores. Is there a way to configure Slurm to handle this situation? We basically need Slurm to let us use either all of the cores or half of them.
I think you're interested in the --hint=[no]multithread option for srun and sbatch. Leave the nodes with hyperthreading enabled, and users can control how their tasks are bound themselves (or you, as an admin, can use a job submit plugin to do it for them).

A quick example with this node config:

NodeName=DEFAULT RealMemory=3000 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2

$ srun -n8 whereami
0001 v1 - Cpus_allowed: 11 Cpus_allowed_list: 0,4
0006 v1 - Cpus_allowed: 88 Cpus_allowed_list: 3,7
0000 v1 - Cpus_allowed: 11 Cpus_allowed_list: 0,4
0005 v1 - Cpus_allowed: 44 Cpus_allowed_list: 2,6
0007 v1 - Cpus_allowed: 88 Cpus_allowed_list: 3,7
0003 v1 - Cpus_allowed: 22 Cpus_allowed_list: 1,5
0004 v1 - Cpus_allowed: 44 Cpus_allowed_list: 2,6
0002 v1 - Cpus_allowed: 22 Cpus_allowed_list: 1,5

$ srun -n8 --hint=multithread whereami
0007 v1 - Cpus_allowed: 80 Cpus_allowed_list: 7
0001 v1 - Cpus_allowed: 10 Cpus_allowed_list: 4
0005 v1 - Cpus_allowed: 40 Cpus_allowed_list: 6
0003 v1 - Cpus_allowed: 20 Cpus_allowed_list: 5
0004 v1 - Cpus_allowed: 04 Cpus_allowed_list: 2
0006 v1 - Cpus_allowed: 08 Cpus_allowed_list: 3
0000 v1 - Cpus_allowed: 01 Cpus_allowed_list: 0
0002 v1 - Cpus_allowed: 02 Cpus_allowed_list: 1

$ srun -n8 --hint=nomultithread whereami
0005 v2 - Cpus_allowed: 10 Cpus_allowed_list: 4
0006 v2 - Cpus_allowed: 02 Cpus_allowed_list: 1
0007 v2 - Cpus_allowed: 20 Cpus_allowed_list: 5
0000 v1 - Cpus_allowed: 01 Cpus_allowed_list: 0
0004 v2 - Cpus_allowed: 01 Cpus_allowed_list: 0
0002 v1 - Cpus_allowed: 02 Cpus_allowed_list: 1
0001 v1 - Cpus_allowed: 10 Cpus_allowed_list: 4
0003 v1 - Cpus_allowed: 20 Cpus_allowed_list: 5

Is this an acceptable solution?
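To read the output above: whereami appears to print the Cpus_allowed and Cpus_allowed_list fields from /proc/self/status, where Cpus_allowed is a hexadecimal bitmask with bit N set when CPU N is allowed. The POSIX sh sketch below decodes such masks; the function name mask_to_list is hypothetical and not part of Slurm. A two-bit mask such as 11 (CPUs 0 and 4) is the whole-core binding of the default run, while --hint=multithread yields single-bit masks.

```shell
#!/bin/sh
# Hypothetical helper: convert a hexadecimal Cpus_allowed mask into a
# comma-separated CPU list, e.g. "11" -> "0,4". Commas between 32-bit
# groups are removed first; masks wider than 63 bits overflow shell
# arithmetic and are not handled here.
mask_to_list() {
  hex=$(printf '%s' "$1" | tr -d ',')
  mask=$(( 0x$hex ))
  cpu=0
  list=""
  while [ "$mask" -gt 0 ]; do
    # If the lowest bit is set, the current CPU is in the mask.
    if [ $(( mask & 1 )) -eq 1 ]; then
      list="${list:+$list,}$cpu"
    fi
    mask=$(( mask >> 1 ))
    cpu=$(( cpu + 1 ))
  done
  printf '%s\n' "$list"
}

# Masks copied from the srun output above.
for m in 11 22 44 88 01 10; do
  printf '%s -> %s\n' "$m" "$(mask_to_list "$m")"
done
```

Running it prints 11 -> 0,4, 22 -> 1,5, 44 -> 2,6, 88 -> 3,7, 01 -> 0 and 10 -> 4, matching the Cpus_allowed_list column.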
A note on the output in comment 1: without --hint=[no]multithread, each task is bound to a whole core, but a core can hold as many tasks as it has hyperthreads. With --hint=multithread, each task is bound to a single hyperthread. With --hint=nomultithread, each task is bound to a single core, and only one task per core is permitted.

Also, I'm dropping the severity to sev-4. I was told that your site joined support a little early (before training in October) and that all your bugs should be sev-4 until then; this is just a friendly reminder. I haven't read the contract or email, so I don't know exactly when you'll be able to submit severity 1-3 bugs; just keep in contact with whoever is managing the contract with you (I assume Jacob or Jess).
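For the original request, where each job makes the choice in its batch file, the same option can be set with an #SBATCH directive instead of a comment read by a prologue. A minimal sketch, assuming an 8-task job; ./my_app is a hypothetical program name. The hint is repeated on the srun line so the step's binding does not depend on whether srun picks up the allocation's setting.

```shell
#!/bin/bash
#SBATCH --ntasks=8
# Bind each task to its own physical core (hyperthreads unused).
# Change both occurrences to --hint=multithread to bind each task
# to a single hyperthread instead.
#SBATCH --hint=nomultithread

# ./my_app is a placeholder for the user's program.
srun --hint=nomultithread ./my_app
```

On nodes like the one in the example (4 cores, 2 threads per core), one task per core means at most 4 tasks per node, so 8 tasks span two nodes, as the v1/v2 lines in the nomultithread run show.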
I'm closing this as resolved/infogiven. I found out that your "official" support contract period starts on Oct 1 (from then on you can submit tickets of any severity), and we're supporting sev-4 tickets until then.
(Forgot to close the ticket; actually closing it now.) Feel free to reopen if you have further questions about hyperthreading.