| Summary: | Can hyperthreading be configured on all nodes and with "--hint=nomultithread" be made to not impact jobs where hyperthreading is an issue? | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Mike Woodson <maw349> |
| Component: | Other | Assignee: | Carlos Tripiana Montes <tripiana> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | - Unsupported Older Versions | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Cornell ITSG | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Mike Woodson
2022-12-16 12:38:17 MST
Hi Mike,
I think the core (key) question is this one, at the end of the day:
> If I understand it correctly, if we were to have the default be
> "ntasks-per-core=1" and have hyperthreading turned on in the BIOS of each
> server, each core would have two virtual cores. In scheduling a job, it
> looks like the second virtual core on a core would also be allocated to that
> job, but does it mean that the single thread of the task would have full
> access to the complete resources of the core, or still be limited to the
> resources of 1 virtual core? And how would we test that?
Based on my testbed, you should do something like this:
1. Enable HT in bios, restart system.
2. Edit slurm.conf, and move from:
CPUs=X*Y CoresPerSocket=X Sockets=Y ThreadsPerCore=1
to:
CPUs=X*Y*2 CoresPerSocket=X Sockets=Y ThreadsPerCore=2
3. Restart slurm daemons
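As a concrete illustration of step 2 (the hostname and counts here are made up, not from this ticket), a node with 2 sockets and 16 cores per socket would change from:

```
NodeName=node01 CPUs=32 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1
```

to:

```
NodeName=node01 CPUs=64 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2
```

followed by the daemon restart in step 3.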
If anybody asks for cores, and uses pinning to cores, everything will work as before, even though only 1 virtual CPU per core will appear in use if you look at something like the top command.
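One hedged way to verify what actually happens on a node after the change (these are standard commands, but output formats vary by version):

```
# Inspect the core/thread topology on a node:
lscpu --extended=CPU,CORE,SOCKET

# Show how Slurm binds each task; the mask should cover both
# hyperthreads of a core when a whole core is allocated:
srun -n 4 --cpu-bind=verbose true
```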
If anybody wants to use the logical CPUs in their jobs, then they must configure the job accordingly.
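For example (the flags below are standard Slurm options; the job itself is a made-up sketch):

```
# Use both hardware threads of each core (2 tasks per core):
srun --ntasks-per-core=2 ./app

# One task per physical core, ignoring the extra threads
# (this is what --hint=nomultithread from the ticket title requests):
srun --hint=nomultithread ./app
```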
The catch is that users aren't going to set things up correctly at first, and will complain about the change. And sometimes MPI flavours do things differently from srun, so pinning is not always what you might expect... Well, this is the main reason admins most of the time just disable HT: most HPC codes simply don't run as performantly with HT as without it. But I am not saying it's because of HT itself; the bad thing is to put 2 MPI tasks on 1 physical core (1 per logical CPU). To scale properly, most HPC codes need real cores, because logical CPUs share some hardware resources inside each physical core and those codes heavily use everything a core can offer.
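The placement pitfall described above can be modelled with a short sketch (hypothetical helper functions, not Slurm code; the numbering scheme, where the second hyperthread of core c gets id c + total_cores, is a common Linux enumeration assumed here for illustration):

```python
def logical_cpus_per_core(sockets, cores_per_socket, threads_per_core):
    """Map each physical core to its logical CPU ids, assuming the common
    Linux enumeration: thread t of core c gets id c + t * total_cores."""
    total_cores = sockets * cores_per_socket
    return {c: [c + t * total_cores for t in range(threads_per_core)]
            for c in range(total_cores)}

def place_one_task_per_core(cores, ntasks):
    """Sound HPC placement: each MPI rank owns a whole physical core."""
    return {rank: cores[rank] for rank in range(ntasks)}

def place_one_task_per_logical_cpu(cores, ntasks):
    """The anti-pattern: ranks are packed onto logical CPUs, so pairs of
    ranks end up sharing one physical core's execution resources."""
    flat = [cpu for c in sorted(cores) for cpu in cores[c]]
    return {rank: [flat[rank]] for rank in range(ntasks)}

# A node with Sockets=2, CoresPerSocket=4, ThreadsPerCore=2 -> 16 logical CPUs.
cores = logical_cpus_per_core(2, 4, 2)
print(cores[0])                                  # [0, 8]: core 0's two threads
print(place_one_task_per_core(cores, 4))         # ranks 0-3 on cores 0-3
print(place_one_task_per_logical_cpu(cores, 4))  # ranks 0 and 1 both on core 0
```

In the second placement, rank 0 lands on logical CPU 0 and rank 1 on logical CPU 8, i.e. both on physical core 0, which is exactly the 2-tasks-per-core situation that hurts HPC codes.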
I am a bit reluctant to recommend a global change for only one user, even if they own some hardware, when it can potentially impact the job scripts of all the other users and cause noise and trouble for them until they get used to the new configuration. Because, as you see, there's no magic, transparent way of making such a big change.
I'd suggest enabling HT only on this owner's node(s), and allowing the rest of the users to access them at lower priority, but in a partition separate from the ones with ThreadsPerCore=1.
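A possible slurm.conf layout for that suggestion might look like this (node names, group names, and priority values are illustrative assumptions, not from this ticket):

```
NodeName=ht-node[01-02] CPUs=64 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2

# Owner's high-priority access to their HT nodes:
PartitionName=owner Nodes=ht-node[01-02] AllowGroups=owner-group PriorityTier=10

# Everyone else can opt in at lower priority:
PartitionName=hyperthreading Nodes=ht-node[01-02] PriorityTier=1

# The default partition keeps only the ThreadsPerCore=1 nodes:
PartitionName=normal Nodes=std-node[01-10] Default=YES
```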
Regards,
Carlos.
Hi,

So, if I understand what you are suggesting: it can be done, but all users will have to change the way they submit jobs, since no one currently uses pinning to cores. With HT, MPI is an issue: with two MPI processes on the same physical core (2 logical cores), both will want to use all of the resources of the physical core, and since the 2 logical cores share some resources on the physical core, things slow down (unless they pin the cores). If this owner wants HT turned on for their nodes, it sounds like the best solution is to remove the nodes from the default partition and possibly create another low-priority partition (maybe called "hyperthreading") for anyone who wants to use those nodes. The owner can use their higher-priority partition to submit jobs to their nodes, but since no other nodes will have HT turned on, they will not be able to utilize it on other nodes. Is this correct?

Mike

> It can be done, but all users will have to change the way that they submit
> jobs, since no one currently uses pinning to cores.

Slurm does its own default pinning as well. MPI flavours do default pinning too, and sometimes differently from Slurm. And if you fall back to default pinning, it can change when ThreadsPerCore changes. So it depends on what the user is using/doing, but chances are that the pinning will change.

> With HT, mpi is an issue since having two mpi processes on the same physical
> core (2 logical cores), they will both want to use all of the resources of
> the physical core and since the 2 logical cores share some resources on the
> physical core, it slows things down (unless they pin the cores).

If the user confuses the concept of a core with a logical CPU, and tries to use each logical CPU as if it were physical, then exactly what you describe happens.
If, for example, you set 1 task per core and ask for the number of cores you need, Slurm by default (if launched with srun) puts 1 MPI task per physical core. But again, chances are that a user has a combination of flags/MPI type that behaves badly after changing ThreadsPerCore.

> If this owner wants HT turned on for their nodes, it sounds like the best
> solution is to remove the nodes from the default partition and possibly
> create another low priority partition (maybe called hyperthreading) if
> someone wants to use that node. The owner can use their higher priority
> partition to submit jobs to their nodes, but since no other nodes will have
> HT turned on, they will not be able to utilize it on other nodes.

Maybe it's a conservative option, but it's an alternative where *you really know it's not going to potentially affect every single other job in the system*. So you probably want to explore it.

Regards,
Carlos.

Hi,

Please let us know if you need further assistance, or whether this can be marked as resolved/infogiven.

Thanks,
Carlos.

I believe that you have given me what I need. Thanks!

Mike

Closing now.

Thanks,
Carlos