Ticket 9349

Summary: Excessive dns queries
Product: Slurm Reporter: lhuang
Component: slurmctld Assignee: Director of Support <support>
Status: RESOLVED TIMEDOUT QA Contact:
Severity: 3 - Medium Impact    
Priority: ---    
Version: 19.05.3   
Hardware: Linux   
OS: Linux   
Site: NY Genome Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---
Attachments: slurm.conf

Description lhuang 2020-07-07 09:26:25 MDT
Our Slurm controllers are sending up to 32 million DNS queries a day to the DNS servers. Do you have any idea why this is occurring? We have two Slurm clusters, and the other one does not have this issue. The one doing the excessive queries is on a subdomain. Perhaps that is causing some kind of problem?

I'm unsure whether this is a username caching issue or a node list issue. Shouldn't Slurm be caching these so it does not need to query the DNS servers?
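
For a sense of scale, the daily total above can be converted to an average rate. This is only a back-of-the-envelope sketch; the function name is made up for illustration, and it assumes the queries are spread evenly over the day, which real traffic likely is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def queries_per_second(queries_per_day: float) -> float:
    """Average query rate implied by a daily total (hypothetical helper)."""
    return queries_per_day / SECONDS_PER_DAY


rate = queries_per_second(32_000_000)
print(f"{rate:.0f} queries/s on average")  # prints "370 queries/s on average"
```

A sustained average in the hundreds per second suggests the lookups are happening continuously rather than once at startup.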
Comment 1 lhuang 2020-07-07 13:07:16 MDT
Correction: both of our Slurm clusters generate over 30 million DNS queries per day. I'm unsure whether this is normal.
Comment 2 Jeff DeGraw 2020-07-07 13:59:39 MDT
Thanks for reaching out. I suspect this is a configuration problem. Can you attach your slurm.conf to this bug?

- Jeff
Comment 3 lhuang 2020-07-07 14:20:02 MDT
Created attachment 14941 [details]
slurm.conf

Here is the attachment.


Comment 4 Jeff DeGraw 2020-07-07 14:44:16 MDT
In your compute node configuration, NodeHostname isn't configured. From the slurm.conf manual page:

NodeHostname:
Typically this would be the string that "/bin/hostname -s" returns. It may also be the fully qualified domain name as returned by "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid domain name associated with the host through the host database (/etc/hosts) or DNS, depending on the resolver settings. Note that if the short form of the hostname is not used, it may prevent use of hostlist expressions (the numeric portion in brackets must be at the end of the string). A node range expression can be used to specify a set of nodes. If an expression is used, the number of nodes identified by NodeHostname on a line in the configuration file must be identical to the number of nodes identified by NodeName. By default, the NodeHostname will be identical in value to NodeName.

You should set NodeHostname to the hostname of the machine that the nodes run on. For example, if I were running 10 nodes on a computer called "testcomputer", my config line should begin with:
> NodeName=node[0-9] NodeHostname=testcomputer ...

You might also need to give the fully qualified domain name, since you mentioned they are on a subdomain.
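
One way to check which form of a name the local resolver accepts (short name versus fully qualified) is a small lookup script run on the controller. The following is a minimal sketch, not part of Slurm: the function name and the injectable `resolver` parameter are assumptions made for illustration, and it only uses the standard `socket.gethostbyname` call, which consults /etc/hosts or DNS according to the resolver settings.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve_names(
    names: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each name to its IPv4 address, or None if the lookup fails.

    Hypothetical helper for illustration; `resolver` can be swapped out
    so the function is usable without network access.
    """
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results
```

For example, calling `resolve_names(["testcomputer", "testcomputer.sub.example.com"])` (both names are placeholders) would show whether the short name, the fully qualified name, or both resolve on that machine.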

Let me know if that helps!

- Jeff
Comment 5 Jeff DeGraw 2020-07-09 09:56:56 MDT
Hi again. I just wanted to follow up with you about this, now that there have been a few days to test the impact of what I suggested. Did configuring NodeHostname resolve the issue?

- Jeff
Comment 6 Jeff DeGraw 2020-07-10 09:02:40 MDT
I haven't heard back from you in a few days, so I'm going to go ahead and close this ticket, but feel free to open it back up if you need to. If you have any other questions or problems, don't hesitate to reach out.

- Jeff