Hi,

We are looking to optimize our use of the new AMD EPYC 7532 on our new cluster. We would like to have an affinity to the chiplets inside the EPYC CPU. We are running non-homogeneous jobs on the cluster, which can vary wildly from 1 core to 1000s of cores. We would like jobs with small numbers of cores to have an affinity to run on the same chiplet to maximize performance. Do you have any suggestions? A preferred configuration? This is using the latest version of Slurm (21.08.2).

Richard
Richard,

From Slurm's perspective, chiplets are not a visible entity, since the hardware topology is mapped to hwloc objects[1] by the operating system. Could you please elaborate on what you mean with reference to the output of `lstopo-no-graphics`? Please share the result of that command from the compute node of interest. What is your hwloc version?

cheers,
Marcin

[1] https://www.open-mpi.org/projects/hwloc/doc/v2.3.0/a00165.php
Created attachment 21779 [details] Output of lscpu, lstopo-no-graphics, and version of hwloc
Attached is the output of the commands you asked for. The hwloc version is hwloc-2.4.1-3.el8.x86_64.

Since the new AMD CPUs with chiplets are going to become more popular, support for chiplets will likely be requested in the future. The question can be rephrased like this: Can we replicate the Slurm socket affinity for chiplets? Or: can there be NUMA-node awareness at the scheduler affinity level? Or: how does SchedMD suggest handling the new level of chunking that's introduced with these AMD chiplets?

Currently, test jobs are being allocated across chiplets. Does SchedMD have evidence whether the performance impact of distributing a job's CPUs within versus across chiplets is significant or minimal?

We normally run a large set of different jobs: lots of single-core jobs, 2 cores, 4 cores, and so on. Say I submit a 4-core job and it gets scheduled on a node that has lots of single-core jobs; we would like the scheduler to put the job on a single chiplet rather than have it run across chiplets, or even select another node (if available) that has 4 free cores on the same chiplet. Note that the chiplets seem to follow the structure of the NUMA nodes (see the output of lscpu).

Richard
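For reference, a per-job binding along those NUMA boundaries can already be requested via srun's documented `--cpu-bind=ldoms` (locality domains) option. This is only a sketch of a per-job hint, not the scheduler-level affinity asked about, and the application name is a placeholder:

```
# Sketch: bind the job's tasks to NUMA locality domains, which on this
# CPU line up with the chiplet/NUMA grouping visible in lscpu.
srun --ntasks=4 --cpu-bind=ldoms ./app
```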
I have read bug 10679. If we use l3cache_as_socket, does the number of sockets need to be changed in the node definitions to reflect that increase in sockets?

Richard
Richard,

>I have read bug 10679. If we use l3cache_as_socket

That's the place I was going to mention as a starting point. It's good that you're on hwloc2, so you can use `l3cache_as_socket`, and you can also give the patch attached there a try - it introduces a similar option called "numa_node_as_socket", which makes the meaning of the binding dependent on the platform configuration. It would be great if you could give the patch (attachment 21486 [details]) a try and share your feedback with us.

>in the definition of the nodes to reflect that increase if sockets?

Yep - it still has to be adjusted. This is part of bigger changes we're considering for the future to make the nodes' data structure more dynamic.

cheers,
Marcin
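For clarity, a minimal slurm.conf sketch of the two variants discussed above (option names as in this thread; `numa_node_as_socket` exists only with the patch applied):

```
# Variant A: expose each L3 cache domain as a socket (stock option)
SlurmdParameters=l3cache_as_socket

# Variant B: expose each NUMA node as a socket (patched slurmd only)
#SlurmdParameters=numa_node_as_socket
```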
Are all the changes in a specific Git branch?

Also, after realizing that using l3cache_as_socket would make the system look like it has 32 sockets, which I think would be too granular, we will probably try the NUMA-node option first; with it, a node would have 8 sockets.

Richard
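For illustration, the adjusted node definition might look like the sketch below. The node name is hypothetical, and the layout assumes the dual-socket EPYC 7532 node discussed in this thread (8 NUMA nodes and 64 cores total, SMT enabled), exposed as 8 "sockets" of 8 cores:

```
# slurm.conf sketch with numa_node_as_socket in effect (8 NUMA nodes total)
NodeName=node001 Sockets=8 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
```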
While compiling the patch under 21.08.2 we get the following compile error:

xcpuinfo.c: In function 'slurmd_parameter_as_socket':
xcpuinfo.c:271:3: error: 'obj' undeclared (first use in this function)
   obj = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_NODE,
   ^~~
xcpuinfo.c:271:3: note: each undeclared identifier is reported only once for each function it appears in
xcpuinfo.c:271:36: error: 'topology' undeclared (first use in this function); did you mean 'openlog'?
   obj = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_NODE,
                                    ^~~~~~~~
                                    openlog
make[4]: *** [Makefile:614: xcpuinfo.lo] Error 1

Richard
Richard, Sorry for that. Please try attachment 21810 [details] where the issue should be fixed. cheers, Marcin
Hi,

The new patch compiles, thank you. We will try it later today. Will this patch be part of future versions of 21.08.x?

Richard
Our standard approach for new features is to include them only in major releases; however, because of the importance of both new architectures and hwloc2 support, we agreed that we'll do our best to include at least basic support in 21.08. We're now looking forward to feedback on the approach from sites testing it.

cheers,
Marcin
A question about the patch: in which order should the servers and clients be restarted after applying it? Does the DB server need to be patched too?

Richard
The patch has an impact only on slurmd - no need to restart slurmctld/slurmdbd.
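So the rollout can be a simple rolling restart of the compute-node daemons (a sketch, assuming a typical systemd-managed installation):

```
# On each compute node running the patched build; slurmctld and slurmdbd
# can stay up, since only slurmd is affected by the patch.
systemctl restart slurmd
```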
Richard,

I'll go ahead and mark the case as a duplicate. You'll be added to the CC list of the original bug, so you'll get notifications if anything changes there.

cheers,
Marcin

*** This ticket has been marked as a duplicate of ticket 10679 ***