Ticket 7466

Summary:	Warning in logs: "TOPOLOGY: warning -- no switch can reach all nodes through its descendants.Do not use route/topology" -- any actual impact?
Product:	Slurm	Reporter:	Ryan Novosielski <novosirj>
Component:	Configuration	Assignee:	Marshall Garey <marshall>
Status:	RESOLVED FIXED	QA Contact:
Severity:	4 - Minor Issue
Priority:	---	CC:	kevina
Version:	- Unsupported Older Versions
Hardware:	Linux
OS:	Linux
See Also:	https://bugs.schedmd.com/show_bug.cgi?id=6824 https://bugs.schedmd.com/show_bug.cgi?id=8395 https://bugs.schedmd.com/show_bug.cgi?id=8514 https://bugs.schedmd.com/show_bug.cgi?id=8549 https://bugs.schedmd.com/show_bug.cgi?id=8847 https://bugs.schedmd.com/show_bug.cgi?id=8769
Site:	Rutgers	Slinky Site:	---
Linux Distro:	CentOS	Machine Name:	amarel
CLE Version:		Version Fixed:	19.05.3 20.02.0pre1
Attachments:	SLURM topology.conf file for amarel

Description Ryan Novosielski 2019-07-24 10:25:36 MDT

We intentionally have our topology configured such that there are no switches that can reach all nodes. We do this to keep jobs from running across two fabrics that are connected via narrower links (higher oversubscription across these links). 

Does this error message mean A) your topology is misconfigured and you should not use the topology plugin (a "please don't"), or B) your topology is misconfigured and we have automatically disabled the use of the topology plugin?

I'm assuming it's A but can't be sure. A change in the wording might be a good idea (also, the spacing after the period in the message just for appearances).

Can you also comment on our practice? I'll attach the topology file so you've got an idea of what we're doing. Basically, we list the EDR and FDR fabrics separately, so jobs can run on either fabric, but no single job can run on both.
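For readers without the attachment, here is a minimal Python sketch of the situation described above. The switch and node names are hypothetical (they are not taken from the amarel file), the topology text is a simplified topology.conf-style layout with comma-separated names instead of bracketed host ranges, and the check is only an illustration of what "no switch can reach all nodes through its descendants" means, not Slurm's own implementation.

```python
# Simplified topology.conf-style text. All switch and node names are
# hypothetical; two separate trees stand in for the EDR and FDR fabrics.
TOPOLOGY = """
SwitchName=edr-leaf1 Nodes=edr001,edr002
SwitchName=edr-leaf2 Nodes=edr003,edr004
SwitchName=edr-spine Switches=edr-leaf1,edr-leaf2
SwitchName=fdr-leaf1 Nodes=fdr001,fdr002
SwitchName=fdr-spine Switches=fdr-leaf1
"""


def parse_topology(text):
    """Return {switch: {"Nodes": [...], "Switches": [...]}} for the text above."""
    switches = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = dict(item.split("=", 1) for item in line.split())
        switches[fields["SwitchName"]] = {
            "Nodes": [n for n in fields.get("Nodes", "").split(",") if n],
            "Switches": [s for s in fields.get("Switches", "").split(",") if s],
        }
    return switches


def reachable_nodes(switches, name, seen=None):
    """Collect every node under a switch, following child switches recursively."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against accidental loops in the file
        return set()
    seen.add(name)
    entry = switches[name]
    nodes = set(entry["Nodes"])
    for child in entry["Switches"]:
        nodes |= reachable_nodes(switches, child, seen)
    return nodes


def switch_reaching_all(switches):
    """Return a switch whose descendants cover every node, or None."""
    reach = {name: reachable_nodes(switches, name) for name in switches}
    all_nodes = set().union(*reach.values())
    for name, nodes in reach.items():
        if nodes == all_nodes:
            return name
    return None


topology = parse_topology(TOPOLOGY)
print(switch_reaching_all(topology))
```

With two separate trees like this, no switch covers both fabrics, which is the condition the log message reports. Adding a line that lists both spines under one root switch would make the check return that root.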

Comment 1 Ryan Novosielski 2019-07-24 10:28:23 MDT

Created attachment 10998
SLURM topology.conf file for amarel

Comment 2 Ryan Novosielski 2019-07-24 10:29:30 MDT

Running 17.11.7, BTW. Planning to upgrade to 18.08.x in the near future.

Comment 4 Marshall Garey 2019-07-24 11:56:29 MDT

(A) is correct - this is just a warning/recommendation, and does not silently change your route plugin.

This warning is just telling you that a job can't span nodes that no common switch connects. If you try to run a job that requests nodes across disjointed switches, the client (sbatch/srun/salloc) gets an error:

(For sbatch)
sbatch: error: Batch job submission failed: Requested node configuration is not available

(For srun)
srun: error: Unable to allocate resources: Requested node configuration is not available

(For salloc)
salloc: error: Job submit/allocate failed: Requested node configuration is not available
salloc: Job allocation 10243 has been revoked.


Since you're doing this intentionally, don't worry about the warning. If your users are confused about the "Requested node configuration is not available" message, make sure they check the number of nodes or nodelist they requested and make sure it's in line with your topology.
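As a rough illustration of that user-side check, here is a small Python sketch. The node names and the node-to-fabric map are hypothetical (in practice the map would be derived from the site's topology.conf), and the helper names are made up for this example; none of this is part of Slurm.

```python
from collections import Counter

# Hypothetical node-to-fabric map, for illustration only.
FABRIC_OF = {
    "edr001": "EDR", "edr002": "EDR", "edr003": "EDR",
    "fdr001": "FDR", "fdr002": "FDR",
}


def fabrics_for(nodelist):
    """Return the set of fabrics touched by a comma-separated node list."""
    names = [n.strip() for n in nodelist.split(",") if n.strip()]
    unknown = [n for n in names if n not in FABRIC_OF]
    if unknown:
        raise ValueError(f"unknown nodes: {', '.join(unknown)}")
    return {FABRIC_OF[n] for n in names}


def fits_one_fabric(nodelist):
    """True when every requested node sits on the same fabric."""
    return len(fabrics_for(nodelist)) == 1


def largest_single_job():
    """Largest node count one job could request, i.e. the biggest fabric."""
    return max(Counter(FABRIC_OF.values()).values())


print(fits_one_fabric("edr001,edr002"))
print(fits_one_fabric("edr001,fdr001"))
print(largest_single_job())
```

A request that fails `fits_one_fabric`, or that asks for more nodes than `largest_single_job()` returns, is the kind of request that would be rejected with the "Requested node configuration is not available" message shown above.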

Another thing you could do is make sure a single partition doesn't have nodes that are in different fabrics and therefore can never be part of the same job.
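One way to audit that is sketched below in Python. The fabric membership and partition definitions are hypothetical placeholders, not taken from any real slurm.conf, and the function name is made up for this example.

```python
# Hypothetical fabric membership and partition definitions, for illustration only.
FABRICS = {
    "EDR": {"edr001", "edr002"},
    "FDR": {"fdr001", "fdr002"},
}
PARTITIONS = {
    "main": {"edr001", "edr002", "fdr001", "fdr002"},
    "edr": {"edr001", "edr002"},
}


def partitions_spanning_fabrics(partitions, fabrics):
    """Map each partition that touches more than one fabric to those fabrics."""
    spanning = {}
    for part, nodes in partitions.items():
        touched = sorted(name for name, members in fabrics.items() if nodes & members)
        if len(touched) > 1:
            spanning[part] = touched
    return spanning


print(partitions_spanning_fabrics(PARTITIONS, FABRICS))
```

Any partition reported here contains nodes that can never be allocated to the same job under a disjointed topology.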

There's no problem with your topology configuration. I don't have any actual data for how common it is, but the impression at SchedMD is that it is common.

I'll look into softening the language in that log message and fix the spacing.

Comment 7 Ryan Novosielski 2019-07-24 12:22:58 MDT

It's a thought. It would require the users knowing about/submitting to two different partitions in a job. We also use federation in the partition in question that will allow jobs to federate to other clusters, so this would seem to be the better path.

Comment 8 Marshall Garey 2019-07-24 12:29:46 MDT

(In reply to Ryan Novosielski from comment #7)
> It's a thought. It would require the users knowing about/submitting to two
> different partitions in a job. We also use federation in the partition in
> question that will allow jobs to federate to other clusters, so this would
> seem to be the better path.

Whatever works best for you. If you have questions about how to handle the multi-cluster environment, feel free to submit a ticket about that.

I've submitted a patch to be reviewed that softens the language in the log message.

Comment 10 Ryan Novosielski 2019-07-24 13:00:07 MDT

Thanks -- is it possible to have a look at it? Would be happy to mention whether or not the rewrite would have helped me. 

Thanks again.

Comment 13 Marshall Garey 2019-07-24 15:16:44 MDT

(In reply to Ryan Novosielski from comment #10)
> Thanks -- is it possible to have a look at it? Would be happy to mention
> whether or not the rewrite would have helped me. 

I'm further investigating why that specific log message was put there in the first place and why it indicated that route/topology shouldn't be used. So, I'm still iterating on the log message. I'll let you know when I've got something more solid and see what you think.

Comment 14 Ryan Novosielski 2019-07-24 15:20:12 MDT

Thanks. Would be interested to know as well.

Comment 17 Marshall Garey 2019-08-13 17:06:31 MDT

Hi Ryan,

Here's my current conclusion:

This warning has been in there since the beginning of the route plugin. It was added in commit 0dde0a71c10 by Bull:

commit 0dde0a71c10c67ca815b2e70504ce463a9a7b95b
Author: Rod Schultz <Rod.Schultz@bull.com>
Date:   Fri Jul 11 10:02:22 2014 -0700

    Initial addition of the routing plugins



I can't think of how route/topology could be bad with a disjointed switch topology. It might be better to use route/topology instead of route/none with a disjointed topology (though I'm not sure).

Anyway, I suspect that Bull threw this error/info-warning in there because they thought it might be important, not because of any sure-fire testing that they did.

Here's the language I've been toying around with (still subject to change, in particular potentially removing the word "warning"):

info("TOPOLOGY: warning -- no switch can reach all nodes through its descendants. If this is not intentional, fix the topology.conf file.");

What do you think about that sentence? Would it have helped you more than the previous warning? Suggestions are welcome.

Comment 20 Ryan Novosielski 2019-08-20 16:21:25 MDT

Sorry for the delay. Will talk it over with the coworker that raised this as a concern to me and see if there are any more suggestions. That's a better message than the existing one. I probably would have suggested something along the lines of: 

info("TOPOLOGY: warning -- no switch can reach all nodes through its descendants. Check topology.conf for errors.");

...if I'd not read the original, but yours is good too.

Comment 22 Marshall Garey 2019-08-21 16:30:38 MDT

Sounds good. Let us know if there are any concerns for the error message; if you or your coworker would like it changed, we can do that. Otherwise, if it's good with you, we'll commit what I've given you.

Comment 24 Ryan Novosielski 2019-08-21 16:36:00 MDT

He had an interesting suggestion, I thought, which was a link to the documentation which could be more verbose. Of course, that would require whatever it was not to move around/to be a virtual URL. Solaris did something like this, if you see the below:

https://docs.oracle.com/cd/E36784_01/html/E48546/fmasvcs.html

They give you a short message and then for more information, see such and such. Maybe more work than this warrants, but a thought. He didn't have any particular comments on the suggested change, and I do agree that it's better than the existing.

Comment 25 Marshall Garey 2019-08-23 10:34:50 MDT

That's an interesting idea. I proposed it internally, and we decided that the effort isn't worth it, especially with the risk of dead URLs like you pointed out. Our goal is to try to make source comments/logs stand alone and give enough information for someone to know what to search for.

I'll let you know when we've committed the change.

Comment 28 Marshall Garey 2019-08-26 10:05:02 MDT

We've pushed the fix in commit cad50250c58e. It will be in the next 19.05 tag (19.05.3). Thanks for the report and discussion. Let us know if you have any more concerns or questions. I'm closing this as resolved/fixed.