Ticket 15322

Summary:	Would Slurm be happy with a hostname with multiple IP addresses?
Product:	Slurm	Reporter:	Chris Samuel (NERSC) <csamuel>
Component:	Configuration	Assignee:	Tim McMullan <mcmullan>
Status:	RESOLVED INFOGIVEN	QA Contact:
Severity:	4 - Minor Issue
Priority:	---	CC:	dmjacobsen
Version:	22.05.5
Hardware:	Linux
OS:	Linux
Site:	NERSC

Description Chris Samuel (NERSC) 2022-10-31 21:43:01 MDT

Hi there,

On our new systems the default hostname for a node can have 1, 2 or 4 IP addresses, depending on the number of HSN NICs it has. Since we started I've been using the NodeAddr setting to direct communications to the node management network interface instead, but because of the issues I mention in #15315 I'm now looking to use the HSN.

If we were to drop the NodeAddr setting altogether (along with the NoAddrCache setting we currently have to have) would Slurm cope with the nodes having multiple functioning IPs?

All the best,
Chris

Comment 1 Tim McMullan 2022-11-07 06:21:09 MST

Hi Chris,

How does name resolution work for that network? Is it just a DNS round-robin for all the active ports, or is something else going on?

I'm doing some digging and potentially some experimenting to see how this would get handled.

Thanks!
--Tim

Comment 2 Chris Samuel (NERSC) 2022-11-08 00:07:00 MST

Hi Tim,

It just returns all the IPs for the node, for example (on a test system):

muller:nid001088:~ # host nid001088
nid001088 has address 10.250.1.49
nid001088 has address 10.250.1.50
nid001088 has address 10.250.1.65
nid001088 has address 10.250.1.66
nid001088 has address 128.55.173.59

All addresses should be reachable from inside the system - including that public IP address which lives on hsn0:

muller:nid001088:~ # ip -4 addr show dev hsn0
3: hsn0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    altname enp194s0
    inet 10.250.1.65/16 brd 10.250.255.255 scope global hsn0
       valid_lft forever preferred_lft forever
    inet 128.55.173.59/24 brd 128.55.173.255 scope global hsn0
       valid_lft forever preferred_lft forever

All the best,
Chris
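
A minimal sketch, assuming Python's standard `socket` module, of how a client could list every IPv4 address a name resolves to, much as `host` does above. The helper name `resolve_all_ipv4` is made up for illustration, and this is not how Slurm itself resolves node names.

```python
import socket


def resolve_all_ipv4(hostname):
    """Return each distinct IPv4 address the resolver reports, in resolver order."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port) for AF_INET
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Called as `resolve_all_ipv4("nid001088")` on a system like the one above, this would presumably return all five addresses; a client that only uses the first entry sees just one of them.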

Comment 4 Tim McMullan 2023-03-13 09:29:12 MDT

Hey Chris,

First off, I'm sorry about the delay on this one...

It looks like from the other ticket you figured out a different way around this, but after looking at the code and asking around it looks like this *should* work.  It's not very commonly done, but there isn't any specific handling for requests that came in from a specific IP, so as long as the slurmd is listening on the interface it should accept those connections and process them normally.

Thanks,
--Tim
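
The general socket behaviour behind that answer can be sketched in Python: a listener bound to the wildcard address accepts a connection on whichever local IP the client targeted. This is a generic illustration, not slurmd's code, and it assumes the daemon listens on all interfaces; the function name is made up.

```python
import socket


def local_addresses_accepted(addresses, timeout=2.0):
    """Bind one listener to the wildcard address, connect to it via each
    address in `addresses`, and map each target to the local IP on which
    the connection actually arrived."""
    seen = {}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("0.0.0.0", 0))  # wildcard address, OS-chosen port
        listener.listen(max(len(addresses), 1))
        listener.settimeout(timeout)   # do not block forever in accept()
        port = listener.getsockname()[1]
        for addr in addresses:
            with socket.create_connection((addr, port), timeout=timeout):
                conn, _peer = listener.accept()
                with conn:
                    # Local end of the accepted connection: the IP the client dialled.
                    seen[addr] = conn.getsockname()[0]
    return seen
```

On a node with several addresses, passing each of them in should yield one entry per address, all served by the same listening socket.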

Comment 5 Chris Samuel (NERSC) 2023-03-13 12:06:47 MDT

(In reply to Tim McMullan from comment #4)

> Hey Chris,

Hi Tim,

> First off, I'm sorry about the delay on this one...

No problem, I'm constantly swamped in email (got over 6000 unread in my inbox and that's after some aggressive filtering out of noise, very depressing). 

> It looks like from the other ticket you figured out a different way around
> this, but after looking at the code and asking around it looks like this
> *should* work.  It's not very commonly done, but there isn't any specific
> handling for requests that came in from a specific IP, so as long as the
> slurmd is listening on the interface it should accept those connections and
> process them normally.

Thanks! Yeah I figured out how I could populate /etc/hosts in the container with entries for a single IP.

What I'm thinking of as an RFE/NRE topic though is being able to specify a single NodeAddr suffix for all compute nodes so we could have one piece of config that said something like:

NodeAddrSuffix=.chn.muller.nersc.gov

and then any lookups that happen are done by adding that suffix on to the hostname. So for instance, nid001000 would be looked up as nid001000.chn.muller.nersc.gov, but another site could set it to be "NodeAddrSuffix=-mgmt" and that would look up their compute01 as compute01-mgmt instead.

That would allow us to shrink the size of our config map for the slurmctld deployment considerably.

All the best,
Chris
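
A minimal Python sketch of the lookup rule being proposed. `NodeAddrSuffix` is only a suggested option here, not an existing Slurm parameter, and the helper names are made up for illustration.

```python
def node_addr(node_name, suffix):
    """Name to look up for a node under the proposed suffix rule."""
    return node_name + suffix


def node_addrs(node_names, suffix):
    """Map every node name to the name that would be looked up for it."""
    return {name: node_addr(name, suffix) for name in node_names}
```

For example, `node_addr("nid001000", ".chn.muller.nersc.gov")` gives `nid001000.chn.muller.nersc.gov`, and `node_addr("compute01", "-mgmt")` gives `compute01-mgmt`, matching the two cases described above.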

Comment 6 Tim McMullan 2023-03-13 14:08:10 MDT

(In reply to Chris Samuel (NERSC) from comment #5)
> (In reply to Tim McMullan from comment #4)
> 
> > Hey Chris,
> 
> Hi Tim,
> 
> > First off, I'm sorry about the delay on this one...
> 
> No problem, I'm constantly swamped in email (got over 6000 unread in my
> inbox and that's after some aggressive filtering out of noise, very
> depressing). 

That is a truly horrifying inbox :(

> What I'm thinking of as an RFE/NRE topic though is being able to specify a
> single NodeAddr suffix for all compute nodes so we could have one piece of
> config that said something like:
> 
> NodeAddrSuffix=.chn.muller.nersc.gov
> 
> and then any lookups that happen are done by adding that suffix on to the
> hostname. So for instance, nid001000 would be looked up as
> nid001000.chn.muller.nersc.gov, but another site could set it to be
> "NodeAddrSuffix=-mgmt" and that would look up their compute01 as
> compute01-mgmt instead.
> 
> That would allow us to shrink the size of our config map for the slurmctld
> deployment considerably.

I thought this sounded reasonable and chatted with Tim (Wickberg) about it out of band.  Instead of an additional parameter, what if we were to change the parser so that something like this was valid:

> NodeName=nid[004860-004871] NodeAddr=nid[004860-004871]-nmn CPUs=256 Boards=1 ...

The thought being that supporting that kind of parsing might be more generically useful and accomplishes the same goal of making your config significantly smaller/easier to read.

If that works for you we can open an enhancement request for it, and if you decide you'd like to sponsor it you can always reach out as well!

Let me know what you think!
--Tim

Comment 7 Chris Samuel (NERSC) 2023-03-13 22:54:29 MDT

Hi Tim,

(In reply to Tim McMullan from comment #6)

> I thought this sounded reasonable and chatted with Tim (Wickberg) about it
> out of band.  Instead of an additional parameter, what if we were to change
> the parser so that something like this was valid:
> 
> > NodeName=nid[004860-004871] NodeAddr=nid[004860-004871]-nmn CPUs=256 Boards=1 ...
> 
> The thought being that supporting that kind of parsing might be more
> generically useful and accomplishes the same goal of making your config
> significantly smaller/easier to read.

I think that sounds good, as long as it can support non-contiguous ranges like the NodeName list can do (with this system a blade can hold 4 nodes, but if, for instance, you add GPUs then you lose a node for that, and that causes gaps in the numbering).

> If that works for you we can open an enhancement request for it, and if you
> decide you'd like to sponsor it you can always reach out as well!

I think that sounds like a good plan. I'll close this and open an RFE now.

We can look at NRE plans out of band.

All the best,
Chris
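
A rough Python sketch of the parsing discussed in comments 6 and 7: expanding a bracketed range expression with a trailing suffix, including non-contiguous ranges, and pairing the result with the NodeName list. This is not Slurm's hostlist parser; it handles a single bracket group only, and the function names are made up for illustration.

```python
import re

# prefix[ranges]suffix, with exactly one bracket group (an assumption of this sketch)
_BRACKETED = re.compile(r"^([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)$")


def expand(expr):
    """Expand e.g. 'nid[004860-004871]-nmn' into a list of individual names."""
    match = _BRACKETED.match(expr)
    if match is None:
        return [expr]  # no bracket group: treat as one literal name
    prefix, ranges, suffix = match.groups()
    names = []
    for part in ranges.split(","):       # comma-separated pieces allow gaps
        low, _, high = part.partition("-")
        high = high or low                # a single number is a range of one
        width = len(low)                  # preserve zero padding
        for number in range(int(low), int(high) + 1):
            names.append(f"{prefix}{number:0{width}d}{suffix}")
    return names


def pair_names_with_addrs(node_name_expr, node_addr_expr):
    """Pair each expanded NodeName with the NodeAddr at the same position."""
    names = expand(node_name_expr)
    addrs = expand(node_addr_expr)
    if len(names) != len(addrs):
        raise ValueError("NodeName and NodeAddr expand to different counts")
    return dict(zip(names, addrs))
```

With gaps in the numbering, an expression such as `nid[001000-001002,001008]-nmn` (hypothetical node numbers) expands to four names, so the NodeAddr list can follow the same non-contiguous shape as the NodeName list.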