Hi!

We're probably doing something wrong here, so bear with me, but it looks like some job allocations can span disjoint networks defined in topology.conf.

The documentation at https://slurm.schedmd.com/topology.html states that:
"""
compute nodes on switches that lack a common parent switch can be used, but no job will span leaf switches without a common parent.
"""

We have a configuration in topology.conf that basically looks like this:

SwitchName=core1 Switches=sw1,sw2
SwitchName=s1 Nodes=n11,n12
SwitchName=s2 Nodes=n21,n22
SwitchName=core2 Switches=sw3,sw4
SwitchName=s3 Nodes=n31,n32
SwitchName=s4 Nodes=n41,n42

And this in slurm.conf:

TopologyPlugin=topology/tree
TopologyParam=TopoOptional

I have noticed some large job allocations (not using any --switches option) spanning core1 and core2 (for instance, a single job allocating n11 and n41).

Is that expected?

Thanks!
--
Kilian
Hi

Yes, that is expected when TopoOptional is used, unless the job requests a switch count (--switches option).

Dominik
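For anyone who wants to check whether a given allocation crosses disjoint trees, here is a minimal sketch in Python. It is not part of Slurm: the helper names (parse_topology, root_of, spans_disjoint_trees) are hypothetical, it only handles plain comma-separated names (no bracketed hostlist expressions), and it assumes the leaf switches are named sw1 to sw4 as in the Switches= lines of the report.

```python
# Simplified topology, assuming leaf switches are named sw1..sw4 (assumption).
TOPOLOGY = """
SwitchName=core1 Switches=sw1,sw2
SwitchName=sw1 Nodes=n11,n12
SwitchName=sw2 Nodes=n21,n22
SwitchName=core2 Switches=sw3,sw4
SwitchName=sw3 Nodes=n31,n32
SwitchName=sw4 Nodes=n41,n42
"""


def parse_topology(text):
    """Map every node or switch to its parent switch.

    Only handles 'Key=value' tokens with comma-separated names;
    hostlist ranges such as n[11-12] are not expanded here.
    """
    parent = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = dict(token.split("=", 1) for token in line.split())
        switch = fields["SwitchName"]
        children = fields.get("Switches", "") + "," + fields.get("Nodes", "")
        for child in filter(None, children.split(",")):
            parent[child] = switch
    return parent


def root_of(name, parent):
    """Walk up the tree until reaching a switch that has no parent."""
    while name in parent:
        name = parent[name]
    return name


def spans_disjoint_trees(nodes, parent):
    """True if the nodes do not all share the same top-level switch."""
    return len({root_of(node, parent) for node in nodes}) > 1


parent = parse_topology(TOPOLOGY)
print(spans_disjoint_trees(["n11", "n41"], parent))  # allocation from the report
print(spans_disjoint_trees(["n11", "n22"], parent))  # both nodes under core1
```

With the topology above, n11 resolves to core1 and n41 to core2, so the first allocation is flagged as spanning disjoint trees, while n11 and n22 share core1 and are not.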
Hi Dominik,

(In reply to Dominik Bartkiewicz from comment #1)
> Yes, that is expected when TopoOptional is used,
> unless jobs request for some switches.

Ah I see, thanks!

Do you think it would be useful to add some clarification to the documentation? The TopoOptional description in the slurm.conf man page doesn't mention anything about disjoint networks; it would probably be worth mentioning that with this option, jobs can span disjoint networks.

Same thing for the topology.conf documentation, something like: "no job will span leaf switches without a common parent (unless the TopologyParam=TopoOptional option is used)."

Thanks!
--
Kilian
Hi

As you suggested, we added this info to the docs:
https://github.com/SchedMD/slurm/commit/2d09a777443ded4b1
It is in 17.11.5 and up.

Dominik
(In reply to Dominik Bartkiewicz from comment #5)
> Hi
>
> As you suggested we added this info to doc
> https://github.com/SchedMD/slurm/commit/2d09a777443ded4b1
> It is in 17.11.5 and up.

Great, thanks!

Cheers,
--
Kilian