Ticket 11274 - slurmctld exits when using "systemctl reload slurmctld" and a node change has been made in slurm.conf
Summary: slurmctld exits when using "systemctl reload slurmctld" and a node change has ...
Status: RESOLVED DUPLICATE of ticket 10597
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 20.11.4
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Tim McMullan
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2021-04-01 14:34 MDT by Nick Ihli
Modified: 2021-05-03 14:36 MDT

See Also:
Site: SchedMD
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments

Description Nick Ihli 2021-04-01 14:34:49 MDT
I am working with Jump Trading, and they reported that running "systemctl reload slurmctld" after adding, removing, or modifying a node in slurm.conf causes slurmctld to exit. The reload sends a SIGHUP, and slurmctld terminates. I have reproduced this as well. If the reload is done when no node changes have been made, slurmctld does not exit.

They are now aware that they need to do a restart, but having slurmctld terminate itself after a SIGHUP was not the expected behavior.


[2021-04-01T12:36:20.299] Reconfigure signal (SIGHUP) received
[2021-04-01T12:36:20.300] debug:  Reading slurm.conf file: /opt/slurm/fpip1_testing_default/etc/slurm.conf
[2021-04-01T12:36:20.300] debug:  NodeNames=fpip1-compute0002 setting Sockets=35 based on CPUs(35)/(CoresPerSocket(1)/ThreadsPerCore(1))
[2021-04-01T12:36:20.300] debug:  NodeNames=fpip1-compute0004 setting Sockets=35 based on CPUs(35)/(CoresPerSocket(1)/ThreadsPerCore(1))
[2021-04-01T12:36:20.300] debug:  NodeNames=fpip1-compute0005 setting Sockets=35 based on CPUs(35)/(CoresPerSocket(1)/ThreadsPerCore(1))
[2021-04-01T12:36:20.300] debug:  NodeNames=fpip1-compute0006 setting Sockets=35 based on CPUs(35)/(CoresPerSocket(1)/ThreadsPerCore(1))
[2021-04-01T12:36:20.300] debug:  NodeNames=fpip1-login0001 setting Sockets=35 based on CPUs(35)/(CoresPerSocket(1)/ThreadsPerCore(1))
[2021-04-01T12:36:20.300] debug:  Reading cgroup.conf file /opt/slurm/fpip1_testing_default/etc/cgroup.conf
[2021-04-01T12:36:20.301] error: _compare_hostnames: node count has changed before reconfiguration from 4 to 5. You have to restart slurmctld.
[2021-04-01T12:36:20.301] fatal: read_slurm_conf: hostnames inconsistency detected
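The fatal message above comes from slurmctld comparing the node count of the running configuration against the reread slurm.conf on SIGHUP. The gist of that check can be sketched in shell; the config fragments below are hypothetical, made up only to illustrate the 4-to-5 node count change in the log:

```shell
# Hypothetical slurm.conf node definitions before the edit (4 nodes):
old_conf=$(cat <<'EOF'
NodeName=fpip1-compute0002 CPUs=35
NodeName=fpip1-compute0004 CPUs=35
NodeName=fpip1-compute0005 CPUs=35
NodeName=fpip1-compute0006 CPUs=35
EOF
)

# After the edit, one node has been added (5 nodes):
new_conf="$old_conf
NodeName=fpip1-login0001 CPUs=35"

# Count NodeName= lines in each version, as a stand-in for the
# node-count comparison slurmctld performs during reconfiguration.
old_count=$(printf '%s\n' "$old_conf" | grep -c '^NodeName=')
new_count=$(printf '%s\n' "$new_conf" | grep -c '^NodeName=')

if [ "$old_count" -ne "$new_count" ]; then
  echo "node count has changed before reconfiguration from $old_count to $new_count. You have to restart slurmctld."
fi
```

In this sketch, a mismatch prints a message mirroring the error in the log; in slurmctld itself the mismatch is fatal, which is why "systemctl restart slurmctld" (a full restart) is required after node changes rather than "systemctl reload slurmctld" (which only delivers SIGHUP).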
Comment 4 Tim Wickberg 2021-05-03 14:36:39 MDT
Marking as a duplicate of bug 10597.

*** This ticket has been marked as a duplicate of ticket 10597 ***