| Summary: | CPU binding with nomultithread and exclusive options | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | IDRIS System Team <gensyshpe> |
| Component: | Scheduling | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | Brian Christiansen <brian> |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | brian, remi.lacroix |
| Version: | 20.02.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=10474 | ||
| Site: | IDRIS | Slinky Site: | --- |
| Attachments: | v2 | ||
Description
IDRIS System Team
2020-08-26 04:08:03 MDT
I was able to reproduce the reported behavior. I have a patch for the wrong binding (the main issue reported here) that is under our internal review. As for memory, the case is at least worth checking; however, to keep the discussion structured, I'll open a separate bug report and add you to CC there.

cheers,
Marcin

Hi! Any news on the patch?

Created attachment 16061 (v2)

The issue is resolved by other changes on the master branch (slurm-20.11 to be). We're still discussing how best to address it on slurm-20.02. Could you please apply the attached patch and confirm that it solves the issue for you?

cheers,
Marcin
The patch fixes this issue, but we noticed another binding problem (see #10019).

Focusing on this case: are you OK with closing it as "information given", with just the local fix delivered? As I mentioned, this is fixed in 20.11 by other work that substantially improved the handling of --threads-per-core for managing steps inside an allocation. We're close to the release of 20.11, and the attached patch, while a fix, is also a change in behavior that would be specific only to late releases of 20.02, which may end up being more confusing than helpful for the wide range of users. Let me know your thoughts.

cheers,
Marcin

Can you share your thoughts on the closure suggestion from comment 13? In case of no reply I'll close the case as "information given".

cheers,
Marcin

(In reply to Marcin Stolarek from comment #14)
Hi! Can the patch also be applied on versions >20.02.4? If so, we agree to close this case.

The patch should be easy to apply locally: it's very simple and I don't expect any code changes in this area in upcoming minor releases of 20.02. If it doesn't apply, you can always reopen the bug and I'll prepare an appropriate patch for you.

We just don't want to make any changes on 20.02, since the code is subject to a larger rewrite in 20.11 and we want to avoid frequent changes in the same area.

Does that make sense to you?

cheers,
Marcin

(In reply to Marcin Stolarek from comment #17)
Ok! The case can be closed.
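
For anyone who wants to confirm a binding fix like the one discussed in this thread, the following is a minimal sketch of one way to check, from inside a job step, whether the process is bound to more than one hardware thread of the same core. It is not part of the attached patch. The helper names (`parse_cpu_list`, `shared_cores`, `check_current_process`) are hypothetical, and it assumes a Linux node that exposes `thread_siblings_list` under `/sys`. It also assumes the intended result of a no-multithread binding is at most one bound thread per core.

```python
import os
from collections import defaultdict


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,40' or '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def shared_cores(affinity, siblings_of):
    """Return groups of bound CPUs that are hardware threads of one core.

    `siblings_of(cpu)` must return every hardware thread on that CPU's core.
    An empty result means at most one thread per core is bound.
    """
    groups = defaultdict(set)
    for cpu in affinity:
        groups[frozenset(siblings_of(cpu))].add(cpu)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def read_siblings(cpu):
    """Read the sibling threads of `cpu` from sysfs (Linux only)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return parse_cpu_list(f.read())


def check_current_process():
    """Report sibling groups in the calling process's CPU affinity mask."""
    return shared_cores(os.sched_getaffinity(0), read_siblings)
```

Calling `check_current_process()` in each task of a step and printing the result gives a quick per-task view: an empty list suggests no core has two of its threads in the task's affinity mask.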