| Summary: | Remap hwloc's l3cache as a Slurm socket | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Tim Wickberg <tim> |
| Component: | slurmd | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | CC: | ezellma |
| Version: | 21.08.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | ORNL-OLCF | Slinky Site: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | 21.08.0rc2 |
| Target Release: | 21.08 | DevPrio: | --- |

Description
Tim Wickberg
2021-08-09 12:34:49 MDT

Matt -

As has been discussed extensively elsewhere, we're adding support for mapping hwloc's l3cache as a Slurm socket in 21.08. Support for this has just been pushed and will be in 21.08.0rc2, which should be out this week.

To enable it, SlurmdParameters=l3cache_as_socket must be set. Note that the config file is explicitly ignored by 'slurmd -C', so you'll need to work out the appropriate NodeName definitions by hand and get them into the config.

If you notice any problems with this, please let me know and I can look at making some changes. Unfortunately, I don't have access to a test system at the moment with l3cache != socket, but I believe this should work as intended and is functionally equivalent to patches we'd provided in the past (albeit those lacked a configuration option to change the behavior).

- Tim

Thanks. It might make sense to coalesce the 3 options into a single parameter to avoid confusion, something like:

SlurmdParameters=socket=package (to match hwloc 2.x nomenclature; functionally equivalent to and replaces ignore_numa)
SlurmdParameters=socket=numa (current default on systems with multiple numa per package)
SlurmdParameters=socket=l3cache (the new mode introduced in this ticket)

I could also imagine heterogeneous clusters where this needs to be set per node type instead of globally, but that is certainly outside the scope of this ticket.
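
Since 'slurmd -C' ignores the config file, the NodeName line has to be derived by hand. The following is a minimal sketch of that arithmetic, not anything shipped with Slurm. It assumes that with l3cache_as_socket each L3 cache counts as one Slurm socket, so Sockets equals the total number of L3 caches on the node. The helper name `nodename_line`, the node name, and the topology counts are all hypothetical; the real counts would need to be read from the node's hwloc topology (for example with lstopo).

```python
def nodename_line(name, packages, l3_per_package, cores_per_l3, threads_per_core):
    """Build a slurm.conf NodeName line that treats each L3 cache as a socket.

    Hypothetical helper for illustration; all counts must come from the
    node's actual hardware topology.
    """
    # Assumption: one Slurm socket per L3 cache across all packages.
    sockets = packages * l3_per_package
    # Total logical CPUs on the node.
    cpus = sockets * cores_per_l3 * threads_per_core
    return (
        f"NodeName={name} CPUs={cpus} Sockets={sockets} "
        f"CoresPerSocket={cores_per_l3} ThreadsPerCore={threads_per_core}"
    )


if __name__ == "__main__":
    # Hypothetical node: 1 package, 8 L3 caches, 8 cores per L3, 2 threads per core.
    print("SlurmdParameters=l3cache_as_socket")
    print(nodename_line("node001", 1, 8, 8, 2))
```

The sketch leaves out other NodeName attributes such as RealMemory, which would still need to be filled in for a real configuration.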