Ticket 7355 - excessive slurmctld starting time
Summary: excessive slurmctld starting time
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 18.08.7
Hardware: Linux Linux
Severity: 3 - Medium Impact
Assignee: Marcin Stolarek
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2019-07-04 04:08 MDT by IDRIS System Team
Modified: 2019-07-08 05:35 MDT
2 users

See Also:
Site: IDRIS
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---



Description IDRIS System Team 2019-07-04 04:08:40 MDT
Hi,
We are installing Slurm 18.08.7 on a large configuration (about 1800 compute nodes), and we noticed that slurmctld takes a long time (around 30 minutes) to start or to re-read its configuration file, slurm.conf.
Is this the startup time we should expect, or could it be the result of a problem?
What can we do to make slurmctld start faster?

Regards,

Philippe Collinet
Comment 1 Marcin Stolarek 2019-07-04 07:48:29 MDT
Could you please share the slurmctld log, starting from startup, with the debug level set to debug2?

cheers,
Marcin
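With debug2 logging enabled (for example via `SlurmctldDebug=debug2` in slurm.conf), a slow startup step may show up as a long pause between two consecutive log lines. The Python sketch below is one possible way to locate such pauses. It assumes each log line begins with a bracketed ISO 8601 timestamp such as `[2019-07-04T10:15:02.123]`; the script name, the log path in the usage comment, and the 5-second threshold are hypothetical choices, not Slurm defaults.

```python
import re
import sys
from datetime import datetime

# Assumed log line prefix: "[2019-07-04T10:15:02.123] ..." (ISO 8601, optional milliseconds).
TIMESTAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{3}))?\]")


def parse_timestamp(line):
    """Return the datetime at the start of a log line, or None if there is none."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    millis = int(match.group(2) or 0)
    return stamp.replace(microsecond=millis * 1000)


def find_gaps(lines, threshold_seconds=5.0):
    """Yield (gap_seconds, previous_line, next_line) for pauses >= threshold."""
    previous_time = None
    previous_line = None
    for line in lines:
        current_time = parse_timestamp(line)
        if current_time is None:
            continue  # skip lines without a leading timestamp
        if previous_time is not None:
            gap = (current_time - previous_time).total_seconds()
            if gap >= threshold_seconds:
                yield gap, previous_line.rstrip("\n"), line.rstrip("\n")
        previous_time, previous_line = current_time, line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_gaps.py /path/to/slurmctld.log
    with open(sys.argv[1]) as log_file:
        for gap, before, after in find_gaps(log_file):
            print(f"{gap:8.1f}s gap\n  before: {before}\n  after:  {after}")
```

Each reported pair brackets the step during which slurmctld was waiting, which narrows down where to look in the debug output.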
Comment 4 IDRIS System Team 2019-07-08 05:24:13 MDT
Hello,

We enabled debug logging and it helped us.
We found that address resolution for the node running slurmctld was not working correctly: it had to wait for the second address before resolution succeeded.

Once this issue was corrected, the startup time was back to normal.

Thanks for your help. You can close the bug.

Best regards,

Philippe Collinet
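The comment above does not say how the resolution problem was fixed, so the following is only a sketch for measuring lookup latency, not a description of the actual fix. It times `socket.getaddrinfo` for each host name given on the command line (for example the name configured as `SlurmctldHost`; which names to pass is an assumption) and prints the addresses returned. A lookup that takes seconds rather than milliseconds would point at the kind of resolver delay described above.

```python
import socket
import sys
import time


def time_resolution(hostname):
    """Resolve hostname via getaddrinfo; return (elapsed_seconds, addresses or error text)."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, None)
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
        result = sorted({info[4][0] for info in infos})
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result = f"failed: {exc}"
    return time.monotonic() - start, result


if __name__ == "__main__":
    # Hypothetical usage: python check_resolution.py <controller-hostname> ...
    for name in sys.argv[1:]:
        elapsed, result = time_resolution(name)
        print(f"{name}: {elapsed:.3f}s -> {result}")
```

Running it on the controller node itself exercises the same system resolver configuration (hosts file, name servers) that the daemon would use there.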
Comment 5 Marcin Stolarek 2019-07-08 05:35:30 MDT
Thanks for the update. I'm closing this bug report as invalid.

cheers,
Marcin