Ticket 17205 - slurmctld segfault while trying to set an external dbd for a second cluster
Summary: slurmctld segfault while trying to set an external dbd for a second cluster
Status: RESOLVED DUPLICATE of ticket 16669
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 23.02.3
Hardware: Linux Linux
Severity: 3 - Medium Impact
Assignee: Nate Rini
Reported: 2023-07-13 14:03 MDT by Richard Johnson
Modified: 2023-07-14 09:28 MDT
Site: HudsonAlpha Biotechnology

Attachments
slurm.conf (2.32 KB, text/plain)
2023-07-13 14:05 MDT, Richard Johnson
back trace from slurmctld (5.52 KB, text/plain)
2023-07-13 16:57 MDT, Richard Johnson

Description Richard Johnson 2023-07-13 14:03:24 MDT
I'm trying to set up a scenario in which I have 2 clusters, each with its own slurmctld/dbd, in separate physical locations connected by a WAN.

Both clusters run fine on their own, but when I add AccountingStorageExternalHost for the remote cluster to the local slurm.conf, slurmctld crashes with a segmentation fault on startup.

What am I doing wrong?  I will attach a slurm.conf for the local cluster.
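
For a startup crash like this, it can help to confirm exactly which accounting storage settings slurmctld will read. The following is only a minimal sketch: the function name `show_accounting_settings` and the default path `/etc/slurm/slurm.conf` are assumptions, not something taken from the attached configuration. Per the slurm.conf man page, AccountingStorageExternalHost takes a comma-separated list of the form `<host/ip>[:port]`.

```shell
# Print the active (uncommented) accounting storage settings from a slurm.conf.
# Assumption: the config lives at /etc/slurm/slurm.conf unless a path is given.
show_accounting_settings() {
    local conf="${1:-/etc/slurm/slurm.conf}"
    # slurm.conf keys are case-insensitive, hence -i; commented lines are skipped
    # because the pattern only allows whitespace before the key name.
    grep -iE '^[[:space:]]*AccountingStorage(Type|Host|Port|ExternalHost)[[:space:]]*=' "$conf"
}
```

Calling `show_accounting_settings /path/to/slurm.conf` on the local controller should list the local dbd host alongside the external one, which makes a typo or a missing port easy to spot.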
Comment 1 Richard Johnson 2023-07-13 14:05:22 MDT
Created attachment 31229
slurm.conf
Comment 2 Jason Booth 2023-07-13 16:25:31 MDT
Would you please gather a backtrace from the segfault and attach that here?

gdb slurmctld

inside gdb:
> set print pretty
> r -D
> thread apply all bt full

Or simply

> bt full
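
The same gdb session can also be run non-interactively, which makes it easier to capture the output in a file for attaching. This is a sketch under assumptions: gdb is installed, `slurmctld` resolves on PATH, and the script runs as a user allowed to start the daemon. The function name `collect_backtrace`, the `GDB` override variable, and the output file name are hypothetical.

```shell
# Run slurmctld in the foreground under gdb and write a full backtrace
# of all threads to a file once the process stops (e.g. on SIGSEGV).
collect_backtrace() {
    local binary="${1:-slurmctld}"
    local out="${2:-slurmctld-backtrace.txt}"
    # -batch: exit after the -ex commands have run
    # --args: everything after the binary (-D here) is passed to the program
    "${GDB:-gdb}" -batch \
        -ex 'set print pretty on' \
        -ex 'run' \
        -ex 'thread apply all bt full' \
        --args "$binary" -D > "$out" 2>&1
}
```

Running `collect_backtrace` with no arguments would leave the trace in `slurmctld-backtrace.txt` in the current directory. If slurmctld does not crash, it keeps running in the foreground and has to be stopped by hand.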
Comment 4 Richard Johnson 2023-07-13 16:57:14 MDT
Created attachment 31232
back trace from slurmctld
Comment 5 Nate Rini 2023-07-13 17:31:45 MDT
This should have already been resolved for the upcoming Slurm-23.02.4 patch release:
> https://github.com/SchedMD/slurm/commit/833ca8dd2121a2c980736c05821608324c7ae97a

You can cherry-pick the commit and re-compile if waiting for the next release is not fast enough. Please reply if more detailed instructions are needed.
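
A cherry-pick along these lines might look like the sketch below. It assumes a local git clone of the SchedMD/slurm repository, checked out at the source that matches the installed 23.02.3 build, with the fix commit already fetched. The function name `cherry_pick_fix` is hypothetical, and the rebuild itself is deliberately left out because it depends on how the site originally built and installed Slurm.

```shell
# Apply the fix commit on top of the currently checked-out Slurm source.
# Usage: cherry_pick_fix <path-to-slurm-clone> [commit]
cherry_pick_fix() {
    local src_dir="$1"
    local commit="${2:-833ca8dd2121a2c980736c05821608324c7ae97a}"
    # Stop early if the commit object is not present in the local clone.
    if ! git -C "$src_dir" cat-file -e "${commit}^{commit}" 2>/dev/null; then
        echo "commit $commit not found in $src_dir; fetch it first" >&2
        return 1
    fi
    # -x records the original commit hash in the new commit message.
    git -C "$src_dir" cherry-pick -x "$commit"
}
```

After a successful cherry-pick, rebuild and reinstall the same way the existing installation was produced, then restart slurmctld.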
Comment 7 Richard Johnson 2023-07-14 08:39:12 MDT
Thanks Nate.  Do you have an estimate on when 23.02.4 will be released?

Thanks,
Rich

Comment 8 Nate Rini 2023-07-14 09:24:11 MDT
(In reply to Richard Johnson from comment #7)
> Thanks Nate.  Do you have an estimate on when 23.02.4 will be released?

I don't currently have an ETA, but we usually publish a patch release about every 2 months for the latest major release.
Comment 9 Nate Rini 2023-07-14 09:24:28 MDT
Are there any more questions?
Comment 10 Richard Johnson 2023-07-14 09:26:44 MDT
No.  Thank you.  I was able to apply that commit and it did fix my issue.

Thanks,
Rich

Comment 11 Nate Rini 2023-07-14 09:28:14 MDT
(In reply to Richard Johnson from comment #10)
> No.  Thank you.  I was able to apply that commit and it did fix my issue.

Understood. Closing out ticket as a duplicate. Please respond if any new related questions should arise.

*** This ticket has been marked as a duplicate of ticket 16669 ***