| Summary: | Using AMD chiplets in heterogeneous cluster | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Richard Lefebvre <richard.lefebvre> |
| Component: | Configuration | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | cinek |
| Version: | 21.08.2 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=10679 | ||
| Site: | Calcul Quebec McGill | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: | Output of lscpu, lstopo-no-graphics, and version of hwloc | ||
Description
Richard Lefebvre
2021-10-12 13:10:54 MDT
Richard,

From Slurm's perspective, chiplets are not a visible entity, since the hardware topology is mapped to hwloc objects[1] by the operating system. Could you please elaborate on what you mean in the reference to the results of `lstopo-no-graphics`? Please share the output of that command from the compute node of interest. What is your hwloc version?

cheers,
Marcin

[1] https://www.open-mpi.org/projects/hwloc/doc/v2.3.0/a00165.php

Created attachment 21779 [details]
Output of lscpu, lstopo-no-graphics, and version of hwloc
Attached is the output of the commands you asked for. The hwloc version is hwloc-2.4.1-3.el8.x86_64.

Since the new AMD CPUs with chiplets are going to become more popular, some support for chiplets will be requested in the future. The question can be rephrased like this: Can we replicate Slurm's socket affinity for chiplets? Or can there be NUMA-node awareness at the scheduler's affinity level? How does SchedMD suggest handling the new level of chunking introduced with these AMD chiplets? Currently, test jobs are being allocated across chiplets. Does SchedMD have evidence on whether the performance impact of distributing a job's CPUs within versus across chiplets is significant or minimal?

We normally run a large mix of jobs: many single-core jobs, 2-core jobs, 4-core jobs, and so on. Say I submit a 4-core job and it gets scheduled on a node that has lots of single-core jobs; we would like the scheduler to place the job on a single chiplet rather than have it run across chiplets, or even to select another node (if available) that has 4 free cores on the same chiplet. Note that the chiplets seem to follow the structure of the NUMA nodes (see the output of lscpu).

Richard

I have read bug 10679. If we use l3cache_as_socket, does the number of sockets need to be changed in the node definitions to reflect the increase in sockets?

Richard

Richard,

>I have read bug 10679. If we use l3cache_as_socket

That's the place I was going to mention as a starting point. It's good that you're on hwloc2, so you can use `l3cache_as_socket`, and you can also try the patch attached there - it introduces a similar option called "numa_node_as_socket", which makes the meaning of a binding depend on the platform configuration. It would be great if you could give the patch (attachment 21486 [details]) a try and share your feedback with us.

>in the definition of the nodes to reflect that increase if sockets?

Yep - it still has to be adjusted.
It's part of bigger changes we're considering for the future to make the nodes' data structures more dynamic.

cheers,
Marcin

Are all the changes inside a specific Git branch? Also, after realizing that using l3cache_as_socket would make the system look like it has 32 sockets, which I think would be too granular, with the NUMA-node option it would be 8 sockets. We will probably try that first.

Richard

While compiling the patch under 21.08.2 we get the following compile error:
xcpuinfo.c: In function 'slurmd_parameter_as_socket':
xcpuinfo.c:271:3: error: 'obj' undeclared (first use in this function)
obj = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_NODE,
^~~
xcpuinfo.c:271:3: note: each undeclared identifier is reported only once for each function it appears in
xcpuinfo.c:271:36: error: 'topology' undeclared (first use in this function); did you mean 'openlog'?
obj = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_NODE,
^~~~~~~~
openlog
make[4]: *** [Makefile:614: xcpuinfo.lo] Error 1
Richard
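For reference, the `l3cache_as_socket` route discussed above is set via SlurmdParameters, and (as Marcin notes) the node definition has to be adjusted to match the presented socket count. A slurm.conf sketch, using the 32-L3-complex / 8-NUMA-node figures mentioned in this ticket; the node name, core counts, and thread counts are illustrative assumptions and must be verified against the actual lscpu/lstopo output:

```
# slurm.conf sketch - counts are assumptions, verify with lscpu/lstopo

# Option A: present every L3 cache complex as a socket
# (32 "sockets" on this hardware, which Richard considers too granular)
SlurmdParameters=l3cache_as_socket
NodeName=cnode[001-010] Sockets=32 CoresPerSocket=4 ThreadsPerCore=1

# Option B: the patch from bug 10679 presents every NUMA node as a socket
# (8 "sockets" here, matching the chiplet/NUMA structure seen in lscpu)
#SlurmdParameters=numa_node_as_socket
#NodeName=cnode[001-010] Sockets=8 CoresPerSocket=16 ThreadsPerCore=1
```

With either option in place, socket-level affinity and distribution flags then operate on the chiplet or NUMA-node boundaries instead of the physical package.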
Richard,
Sorry for that. Please try attachment 21810 [details] where the issue should be fixed.
cheers,
Marcin
Hi,

The new patch compiles, thank you. We will try it later today. Will this patch be part of future versions of 21.08.x?

Richard

Our standard approach for new features is to include them only in major releases; however, because of the importance of both new architectures and hwloc2 support, we agreed that we'll do our best to include at least basic support in 21.08. We're now looking forward to feedback on the approach from sites testing it.

cheers,
Marcin

A question about the patch: in what order should the servers and clients be restarted with the patch? Does the DB server need to be patched too?

Richard

The patch only affects slurmd - there is no need to restart slurmctld/slurmdbd.

Richard,

I'll go ahead and mark the case as a duplicate. You'll be added to the CC list of the original bug, so you'll get notifications if anything changes there.

cheers,
Marcin

*** This ticket has been marked as a duplicate of ticket 10679 ***
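The placement behavior Richard asks for (keeping a small job's cores on one chiplet instead of scattering them) can be illustrated with a toy first-fit sketch. This is not Slurm's actual selection algorithm, and the 8-core chiplet size is an assumption for illustration only; the real L3/NUMA group sizes come from lscpu/lstopo:

```python
# Toy sketch of chiplet-aware placement: prefer a single L3/NUMA domain
# with enough free cores over scattering a job across domains.
# CORES_PER_CHIPLET is an assumed value, not taken from this site's hardware.

CORES_PER_CHIPLET = 8

def pick_chiplet(free_cores_by_chiplet, ncores):
    """Return the index of the first chiplet with ncores free, else None."""
    for idx, free in enumerate(free_cores_by_chiplet):
        if free >= ncores:
            return idx
    return None  # no single chiplet fits; a scheduler could try another node

# A node with 8 chiplets; chiplet 0 already runs six single-core jobs.
free = [CORES_PER_CHIPLET] * 8
free[0] = 2

job = 4  # the 4-core job from Richard's example
target = pick_chiplet(free, job)
if target is not None:
    free[target] -= job

print(target, free[:2])  # prints: 1 [2, 4]
```

The sketch shows the policy, not the mechanism: with `l3cache_as_socket` or `numa_node_as_socket` in place, Slurm's existing socket-level distribution logic plays the role of `pick_chiplet` here.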