Hi,

We are in the process of upgrading Slurm from 20.11.7 to 21.08.6. The slurmdbd is up, but slurmctld gives:

[root@bigpurple-hn1 slurm]# systemctl status slurmctld
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2022-03-27 12:49:12 EDT; 15min ago
  Process: 187525 ExecStart=/cm/shared/apps/slurm/current/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 187527 (slurmctld)
    Tasks: 20
   Memory: 10.8M
   CGroup: /system.slice/slurmctld.service
           ├─187527 /cm/shared/apps/slurm/current/sbin/slurmctld
           └─187529 slurmctld: slurmscriptd

Mar 27 12:49:12 bigpurple-hn1 systemd[1]: Started Slurm controller daemon.
Mar 27 12:49:12 bigpurple-hn1 slurmctld[187527]: slurmctld version 21.08.6 started on cluster slurm_cluster
Mar 27 12:49:12 bigpurple-hn1 slurmctld[187527]: error: _shutdown_bu_thread:send/recv bigpurple-hn2: Connection refused
Mar 27 12:49:12 bigpurple-hn1 slurmctld[187527]: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:bigpurple-hn1:7920: ...on refused
Mar 27 12:49:12 bigpurple-hn1 slurmctld[187527]: error: Sending PersistInit msg: Connection refused
Mar 27 12:49:12 bigpurple-hn1 slurmctld[187527]: accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
Mar 27 12:49:12 bigpurple-hn1 slurmctld[187527]: error: Sending PersistInit msg: Connection refused
Mar 27 12:49:12 bigpurple-hn1 slurmctld[187527]: error: Sending PersistInit msg: Connection refused
Mar 27 12:49:12 bigpurple-hn1 slurmctld[187527]: error: Association database appears down, reading from state file.
Mar 27 12:49:22 bigpurple-hn1 slurmctld[187527]: error: Sending PersistInit msg: Connection refused
Hint: Some lines were ellipsized, use -l to show in full.
Please help, it is urgent.
Please upload a copy of your slurmdbd.log and the output of sdiag. Also, please let us know if you see any errors when you start the slurmdbd process.
Hi Jason,

The slurmdbd took quite a long time to convert the database and start listening on its port, which is why slurmctld was delayed in coming up cleanly. Our database is about 10 GB, which explains the long conversion.
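For anyone hitting the same delay, here is a minimal sketch of one way to hold off starting slurmctld until slurmdbd accepts TCP connections. It is not part of Slurm: the helper name `wait_for_port` and its parameters are hypothetical, and it assumes, as the report above suggests, that slurmdbd only begins listening once the database conversion has finished.

```python
import socket
import time


def wait_for_port(host, port, timeout=3600.0, interval=10.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    Hypothetical helper for illustration; not a Slurm tool.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=5.0):
                return True
        except OSError:
            # Connection refused or host unreachable: not listening yet.
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            # Sleep between attempts, but never past the deadline.
            time.sleep(min(interval, remaining))
```

With the host and port shown in the log above, a call such as `wait_for_port("bigpurple-hn1", 7920)` would block until the port opens or an hour has passed. The generous default timeout reflects that a large accounting database can take a long time to convert.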
I am lowering the severity based on your last reply. Do you require any further assistance?
I don't think so. Thank you.

Ali Siavosh-Haghighi, PhD
Sr. HPC System Administrator, High-Performance Computing
NYU Langone Health
Resolved