Ticket 11134

Summary: slurmrestd unable to authenticate
Product: Slurm Reporter: Jeff Avila <geoffrey_avila>
Component: slurmrestd Assignee: Nate Rini <nate>
Status: RESOLVED TIMEDOUT QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: cinek, nate
Version: 20.02.6   
Hardware: Linux   
OS: Linux   
Site: Brown Univ Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---
Attachments: config.log from build
nm output
./configure output
latest config.log
latest config.log output
config.log build attempt
slurm.conf file

Description Jeff Avila 2021-03-18 12:56:30 MDT
Hi Folks,

I've installed an RPM-packaged slurmrestd on a node that we're going to have host the endpoint. I can start slurmrestd ok, give it a hostname to listen to, and a port to listen on, but it sends these errors to stdout:

slurmrestd: error: cannot find auth plugin for auth/munge
slurmrestd: error: cannot create auth context for auth/munge
slurmrestd: error: Couldn't find the specified plugin name for auth/munge looking at all files
slurmrestd: error: cannot find auth plugin for auth/munge
slurmrestd: error: cannot create auth context for auth/munge

If I poke at that port via curl, I get this:

[gavila1@login006 ~]$ curl 'http://172.20.3.205:8080/slurm/v1/ping'
Authentication failure
[gavila1@login006 ~]$

I fear that I am missing some component. We didn't build slurmctld when we put together this current slurm installation; hence my install from third-party RPMs.
Can someone tell me the filename that slurmrestd is looking for, and how it is usually packaged? A more in-depth discussion of how the different forms of authentication are implemented for slurmrestd would also be much appreciated; the manpage is a little light in that regard.

Thanks as always, 

-Jeff
Comment 1 Nate Rini 2021-03-18 12:58:54 MDT
(In reply to Jeff Avila from comment #0)
> slurmrestd: error: cannot find auth plugin for auth/munge

Is slurmrestd installed alongside the full Slurm stack and Munge?
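
For reference, Slurm derives a plugin's file name from its type string by replacing the slash with an underscore and appending .so, so auth/munge corresponds to auth_munge.so in the configured plugin directory. A minimal sketch of that check follows; the helper names and the example directory are illustrative assumptions, not part of Slurm.

```shell
# Hypothetical helpers (not part of Slurm): map a plugin type to its file
# name and check whether that file exists in a given plugin directory.
plugin_filename() {
    # auth/munge -> auth_munge.so
    printf '%s.so\n' "$(printf '%s' "$1" | tr '/' '_')"
}

plugin_present() {
    # $1 = plugin directory, $2 = plugin type (e.g. auth/munge)
    [ -f "$1/$(plugin_filename "$2")" ]
}

# Example (the directory is an assumption; use the PluginDir your build reports):
#   scontrol show config | grep -i PluginDir
#   plugin_present /usr/lib64/slurm auth/munge && echo found || echo missing
```

If the file is missing from the plugin directory that slurmrestd was built to use, the "cannot find auth plugin" error would be expected.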
Comment 2 Nate Rini 2021-03-18 13:00:17 MDT
Please also provide:
> systemctl status slurmrestd
Comment 3 Jeff Avila 2021-03-18 14:32:44 MDT
Hi Folks,

This is only a submit host; the munge binaries live in an nfs-mounted
/usr/local/sbin, as does slurmrestd.
munge is running; viz:
[root@pslurmctlapicit sbin]# systemctl status munge
● munge.service - MUNGE authentication service
   Loaded: loaded (/usr/local/lib/systemd/system/munge.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-03-16 18:03:31 EDT; 1 day 22h ago
     Docs: man:munged(8)
  Process: 112180 ExecStart=/usr/local/sbin/munged (code=exited, status=0/SUCCESS)
 Main PID: 112182 (munged)
   CGroup: /system.slice/munge.service
           └─112182 /usr/local/sbin/munged

Mar 16 18:03:31 pslurmctlapicit systemd[1]: Starting MUNGE authentication service...
Mar 16 18:03:31 pslurmctlapicit systemd[1]: Started MUNGE authentication service.

slurmrestd, otoh, is being run directly from the command-line for testing;
I don't have a systemd unit for it.

Thanks,

-Jeff

Comment 4 Nate Rini 2021-03-18 14:39:26 MDT
(In reply to Jeff Avila from comment #3)
> This is only a submit host; the munge binaries live in an nfs-mounted
> /usr/local/sbin, as does slurmrestd.
I generally advise against sites running Slurm (binaries and libraries) in NFS as NFS issues can cause the site to appear down.

> slurmrestd, otoh, is being run directly from the command-line for testing;
Please call this and attach the output:
> echo -e 'GET invalid\r\n\r\n'| LD_DEBUG=all slurmrestd -vvvvv

Using screen/tmux/script is suggested since it might get very verbose.

> I don't have a systemd unit for it.
Okay, the example one was added in slurm-20.11.
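
Since LD_DEBUG=all output is very verbose, it can help to filter it down to the lines where the dynamic loader opens or tries a file whose path mentions the library of interest. A minimal sketch, assuming the usual glibc LD_DEBUG line format (fields such as "file=/path/to/lib.so"); the function name is hypothetical.

```shell
# Hypothetical filter: reads LD_DEBUG output on stdin and prints only lines
# that contain a "file=..." field whose value includes the given substring.
ld_debug_files() {
    # $1 = substring to look for in the file path, e.g. "munge"
    awk -v pat="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, "file=") == 1 && index($i, pat)) { print; next }
    }'
}

# Example (LD_DEBUG writes to stderr, hence 2>&1):
#   echo -e 'GET invalid\r\n\r\n' | LD_DEBUG=all slurmrestd -vvvvv 2>&1 | ld_debug_files munge
```

No output from the filter would suggest that the loader never attempted to open a munge-related file.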
Comment 5 Jeff Avila 2021-03-18 15:09:22 MDT
Created attachment 18538 [details]
slurmrestd.txt

Here you go:

Comment 6 Nate Rini 2021-03-18 15:22:12 MDT
(In reply to Jeff Avila from comment #5)
> Created attachment 18538 [details]
> slurmrestd.txt

It doesn't even attempt to load munge which is unexpected. Are you setting SLURM_JWT in the environment before calling slurmrestd?

Is it possible to get the config.log from the slurm build? Since this is from an RPM, you will need to pass this to rpmbuild to avoid it deleting the build directory:
> rpmbuild -D 'noclean 1'  -D 'rel 1' $@
Comment 7 Jeff Avila 2021-03-18 19:13:25 MDT
I am not setting  SLURM_JWT to anything before calling slurmrestd...

As I was trying to say; I didn't know which slurm package srcrpm contained
slurmrestd, so I found a prebuilt slurmrestd rpm (for version 20.02-6) on
the Scientific Linux repository, and installed it that way. I don't have
access to the config.log.

Thanks,

-Jeff


Comment 8 Nate Rini 2021-03-18 19:25:36 MDT
(In reply to Jeff Avila from comment #7)
> As I was trying to say; I didn't know which slurm package srcrpm contained
> slurmrestd, so I found a prebuilt slurmrestd rpm (for version 20.02-6) on
> the Scientific Linux repository, and installed it that way. I don't have
> access to the config.log.

Please attach the slurm.conf you're using with the Slurm cluster and with slurmrestd.
Comment 10 Nate Rini 2021-03-19 10:55:26 MDT
(In reply to Jeff Avila from comment #9)
> Here you go:
> AuthType=auth/munge

Looks like it should be trying to load munge.

(In reply to Jeff Avila from comment #7)
> As I was trying to say; I didn't know which slurm package srcrpm contained
> slurmrestd, so I found a prebuilt slurmrestd rpm (for version 20.02-6) on
> the Scientific Linux repository, and installed it that way. I don't have
> access to the config.log.

Based on what has been provided, the Slurm package from the repo was not built correctly.

We have zero control over the Slurm packages included in EPEL or in Scientific Linux, and we strongly advise supported sites against using them. I would be happy to assist with instructions on how to compile Slurm for your cluster.

I would first suggest trying our general 'building and installing slurm' instructions here:
> https://slurm.schedmd.com/quickstart_admin.html

Please note that the instructions also include how to build the RPMs for RHEL clones.
Comment 11 Jeff Avila 2021-03-19 13:27:12 MDT
Hi Nate,

So, in light of yr. advice, I went back to our original source tarball and tried to rebuild the whole thing in order to get slurmrestd/libslurmfull

libtool: link: gcc -DNUMA_VERSION1_COMPATIBILITY -g -O2 -std=gnu99 -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -Wl,-rpath -Wl,/usr/lib64 -Wl,--no-as-needed -o .libs/slurmd slurmd.o req.o get_mach_stat.o ../common/libslurmd_common.o -Wl,-rpath=/home/gba/slurm20/lib/slurm -Wl,--export-dynamic  -L/usr/lib64 ../../../src/common/.libs/libdaemonize.a ../../../src/bcast/.libs/libfile_bcast.a -L/usr/lib -lz -llz4 ../common/.libs/libslurmd_reverse_tree_math.a -L../../../src/api/.libs /home/gba/slurm-20.02.6/src/api/.libs/libslurmfull.so -ldl -lnuma -lhwloc -lpam -lpam_misc -lutil -lresolv -pthread -Wl,-rpath -Wl,/home/gba/slurm20/lib/slurm
../common/libslurmd_common.o: In function `xcpuinfo_hwloc_topo_load':
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:224: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:226: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:228: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:230: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:232: undefined reference to `hwloc_topology_set_type_filter'
../common/libslurmd_common.o:/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:234: more undefined references to `hwloc_topology_set_type_filter' follow
collect2: error: ld returned 1 exit status
make[4]: *** [slurmd] Error 1
make[4]: Leaving directory `/home/gba/slurm-20.02.6/src/slurmd/slurmd'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/home/gba/slurm-20.02.6/src/slurmd'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/gba/slurm-20.02.6/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/gba/slurm-20.02.6'
make: *** [all] Error 2
[root@slurmctld slurm-20.02.6]#

Any ideas?

Thanks,

-Jeff
Comment 12 Nate Rini 2021-03-19 13:31:04 MDT
(In reply to Jeff Avila from comment #11)
> So, in light of yr. advice, I went back to our original source tarball and
> tried to rebuild the whole thing in order to get slurmrestd/libslurmfull
Yes, in general it is not possible to build a single component of Slurm (although it is possible to exclude components).

> xcpuinfo.c:234: more undefined references to
> `hwloc_topology_set_type_filter' follow
The hwloc devel package needs to be installed.

Here is an example of how to install it (and all of Slurm) from source:
https://gitlab.com/nate20/slurm-docker-scaleout/-/blob/master/scaleout/Dockerfile#L46-48

If you do install hwloc from source, make sure to pass this to configure to tell Slurm where it is:
> --with-hwloc=/usr/local/
Comment 13 Jeff Avila 2021-03-23 11:52:18 MDT
Hi Nate,

hwloc-devel is indeed installed:
[root@slurmctld ~]# rpm -ql hwloc-devel-1.11.2-1.el7.x86_64
/usr/include/hwloc
/usr/include/hwloc.h
/usr/include/hwloc/autogen
/usr/include/hwloc/autogen/config.h
/usr/include/hwloc/bitmap.h
/usr/include/hwloc/cuda.h
/usr/include/hwloc/cudart.h
/usr/include/hwloc/deprecated.h
/usr/include/hwloc/diff.h
/usr/include/hwloc/gl.h
/usr/include/hwloc/glibc-sched.h
/usr/include/hwloc/helper.h
/usr/include/hwloc/inlines.h
/usr/include/hwloc/intel-mic.h
/usr/include/hwloc/linux-libnuma.h
/usr/include/hwloc/linux.h
/usr/include/hwloc/myriexpress.h
/usr/include/hwloc/nvml.h
/usr/include/hwloc/opencl.h
/usr/include/hwloc/openfabrics-verbs.h
/usr/include/hwloc/plugins.h
/usr/include/hwloc/rename.h
/usr/lib64/libhwloc.so
/usr/lib64/pkgconfig/hwloc.pc

..do I need to pass a specific path to the include dir to the configure
script?

Thanks,

-Jeff

Comment 14 Jeff Avila 2021-03-23 21:31:01 MDT
...just a followup, if I look in the config.log for this build, hwloc is found just fine, i.e.

# grep hwloc config.log
configure:21271: checking for hwloc installation
configure:21305: gcc -o conftest -DNUMA_VERSION1_COMPATIBILITY -g -O2 -std=gnu99 -pthread -I/usr/include    conftest.c -L/usr/lib64 -lhwloc  -lresolv  >&5
x_ac_cv_hwloc_dir=/usr
HWLOC_LIBS='-lhwloc'
#
Comment 15 Nate Rini 2021-03-24 13:04:05 MDT
Please attach your config.log.
Comment 16 Jeff Avila 2021-03-24 13:22:52 MDT
Created attachment 18635 [details]
config.log from build
Comment 17 Nate Rini 2021-03-24 13:44:10 MDT
(In reply to Jeff Avila from comment #16)
> Created attachment 18635 [details]
> config.log from build

Yes, it looks like it found it correctly:
> HWLOC_CPPFLAGS='-I/usr/include'
> HWLOC_LDFLAGS='-Wl,-rpath -Wl,/usr/lib64 -L/usr/lib64'
> HWLOC_LIBS='-lhwloc'
> #define HAVE_HWLOC 1

Is the new compile working?
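
The hwloc values that configure recorded can be pulled out of config.log directly. A minimal sketch; the function name is hypothetical. Note that these values only show that configure's test program linked; on their own they do not show that the headers used at compile time and the library used at link time come from the same hwloc version.

```shell
# Hypothetical helper: reads config.log on stdin and prints the HWLOC_*
# assignments and the HAVE_HWLOC define recorded by configure.
hwloc_settings() {
    grep -E '^(HWLOC_[A-Z]+=|#define HAVE_HWLOC )'
}

# Example:
#   hwloc_settings < config.log
```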
Comment 18 Jeff Avila 2021-03-24 14:15:00 MDT
No, same error:

../common/libslurmd_common.o: In function `xcpuinfo_hwloc_topo_load':
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:224: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:226: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:228: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:230: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:232: undefined reference to `hwloc_topology_set_type_filter'
../common/libslurmd_common.o:/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:234: more undefined references to `hwloc_topology_set_type_filter' follow
collect2: error: ld returned 1 exit status
gmake[4]: *** [slurmd] Error 1
gmake[4]: Leaving directory `/home/gba/slurm-20.02.6/src/slurmd/slurmd'
gmake[3]: *** [all-recursive] Error 1
gmake[3]: Leaving directory `/home/gba/slurm-20.02.6/src/slurmd'
gmake[2]: *** [all-recursive] Error 1
gmake[2]: Leaving directory `/home/gba/slurm-20.02.6/src'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/home/gba/slurm-20.02.6'
gmake: *** [all] Error 2

Comment 19 Nate Rini 2021-03-24 14:26:40 MDT
Looks like hwloc may be too old:

Please call this:
> hwloc-info  --version
> hwloc-info 

Was EPEL the source for hwloc-devel-1.11.2-1.el7.x86_64 ?
Comment 20 Jeff Avila 2021-03-24 14:54:24 MDT
Looks like RH7 extras:

[root@slurmctld slurm-20.02.6]# yum info hwloc-devel.x86_64
Loaded plugins: enabled_repos_upload, langpacks, package_upload, product-id, search-disabled-repos, subscription-manager
Installed Packages
Name        : hwloc-devel
Arch        : x86_64
Version     : 1.11.2
Release     : 1.el7
Size        : 470 k
Repo        : installed
Summary     : Headers and shared development libraries for hwloc
URL         : http://www.open-mpi.org/projects/hwloc/
License     : BSD
Description : Headers and shared object symbolic links for the hwloc.

Available Packages
Name        : hwloc-devel
Arch        : x86_64
Version     : 1.11.8
Release     : 4.el7
Size        : 208 k
Repo        : rhel-7-server-optional-rpms/7Server/x86_64
Summary     : Headers and shared development libraries for hwloc
URL         : http://www.open-mpi.org/projects/hwloc/
License     : BSD
Description : Headers and shared object symbolic links for the hwloc.

Uploading Enabled Repositories Report
Loaded plugins: langpacks, product-id
[root@slurmctld slurm-20.02.6]#
[root@slurmctld slurm-20.02.6]# hwloc-info --version
hwloc-info 2.3.0
[root@slurmctld slurm-20.02.6]# hwloc-info
depth 0:           1 Machine (type #0)
 depth 1:          8 Package (type #1)
  depth 2:         8 L2Cache (type #5)
   depth 3:        8 L1dCache (type #4)
    depth 4:       8 L1iCache (type #9)
     depth 5:      8 Core (type #2)
      depth 6:     8 PU (type #3)
Special depth -3:  1 NUMANode (type #13)
Special depth -4:  1 Bridge (type #14)
Special depth -5:  4 PCIDev (type #15)
Special depth -6:  3 OSDev (type #16)
[root@slurmctld slurm-20.02.6]#

Comment 21 Nate Rini 2021-03-24 15:16:21 MDT
(In reply to Jeff Avila from comment #20)
> [root@slurmctld slurm-20.02.6]# yum info hwloc-devel.x86_64
> Version     : 1.11.2
>
> [root@slurmctld slurm-20.02.6]# hwloc-info --version
> hwloc-info 2.3.0

Looks like there are 2 different hwloc installs (2.3.0 and 1.11.2) that are competing with each other.

Please call this:
>  ldd $(which slurmd) 
Please make sure it points to the newly compiled slurmd.
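
To see at a glance which hwloc a binary resolves to, the ldd output can be reduced to the matching entry. A minimal sketch, assuming the standard "name => path (address)" ldd line format; the function name is hypothetical.

```shell
# Hypothetical helper: reads ldd output on stdin and prints "soname path"
# for every entry whose name starts with the given prefix. Unresolved
# entries are reported as "not found".
ldd_resolve() {
    # $1 = library name prefix, e.g. libhwloc
    awk -v lib="$1" 'index($1, lib) == 1 && $2 == "=>" {
        print $1, ($3 == "not" ? "not found" : $3)
    }'
}

# Example:
#   ldd "$(which slurmd)" | ldd_resolve libhwloc
```

A path under /usr/local would point at a locally built hwloc rather than the one installed from the RPM.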
Comment 22 Jeff Avila 2021-03-24 16:51:36 MDT
Here's the ldd for the latest successful compilation of slurmd, the one we have in production:

[root@slurmctld slurm-20.02.6]# ldd /usr/local/sbin/slurmd
	linux-vdso.so.1 =>  (0x00007ffc14df5000)
	libz.so.1 => /lib64/libz.so.1 (0x00007fcacca28000)
	liblz4.so.1 => /lib64/liblz4.so.1 (0x00007fcacc813000)
	libslurmfull.so => /usr/local/lib64/libslurmfull.so (0x00007fcacc404000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fcacc200000)
	libnuma.so.1 => /lib64/libnuma.so.1 (0x00007fcacbff4000)
	libhwloc.so.15 => /usr/local/lib64/libhwloc.so.15 (0x00007fcacbda1000)
	libpam.so.0 => /lib64/libpam.so.0 (0x00007fcacbb92000)
	libpam_misc.so.0 => /lib64/libpam_misc.so.0 (0x00007fcacb98e000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007fcacb78b000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fcacb571000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcacb355000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fcacaf94000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fcaccc3e000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fcacad7e000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fcacaa7c000)
	libaudit.so.1 => /lib64/libaudit.so.1 (0x00007fcaca854000)
	libcap-ng.so.0 => /lib64/libcap-ng.so.0 (0x00007fcac
Comment 23 Nate Rini 2021-03-24 22:02:18 MDT
Please call the following:
> nm /usr/local/lib64/libhwloc.so.15
> rpm -q --whatprovides /usr/local/lib64/libhwloc.so.15
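
The nm output can be checked for the specific symbol the linker complained about. As far as I know, hwloc_topology_set_type_filter belongs to the hwloc 2.x API, so a 1.11.x library would not be expected to define it; treat that as an assumption to verify. A minimal sketch with a hypothetical function name:

```shell
# Hypothetical helper: reads `nm` output on stdin and succeeds only if the
# given symbol is listed with a defined (non-"U") type.
nm_defines() {
    # $1 = symbol name, e.g. hwloc_topology_set_type_filter
    awk -v sym="$1" '
        NF == 3 { split($3, a, "@"); if (a[1] == sym && $2 != "U") found = 1 }
        END { exit !found }'
}

# Example:
#   nm -D /usr/lib64/libhwloc.so | nm_defines hwloc_topology_set_type_filter \
#       && echo defined || echo missing
```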
Comment 24 Jeff Avila 2021-03-25 09:40:40 MDT
Interesting....

# rpm -q --whatprovides /usr/local/lib64/libhwloc.so.15
file /usr/local/lib64/libhwloc.so.15 is not owned by any package

(nm output is attached)
Comment 25 Jeff Avila 2021-03-25 09:41:10 MDT
Created attachment 18651 [details]
nm output
Comment 26 Nate Rini 2021-03-25 09:43:55 MDT
(In reply to Jeff Avila from comment #24)
> Interesting....
> 
> # rpm -q --whatprovides /usr/local/lib64/libhwloc.so.15
> file /usr/local/lib64/libhwloc.so.15 is not owned by any package

Please rename that file and try the ldd test from comment#22 again:
> mv /usr/local/lib64/libhwloc.so.15 /usr/local/lib64/.DISABLED.libhwloc.so.15
> ldd /usr/local/sbin/slurmd
Comment 27 Jeff Avila 2021-03-25 09:54:51 MDT
Here we go:

[root@slurmctld slurm-20.02.6]# ldd /usr/local/sbin/slurmd
linux-vdso.so.1 =>  (0x00007ffdc75f8000)
libz.so.1 => /usr/lib64/libz.so.1 (0x00007f5695d80000)
liblz4.so.1 => /usr/lib64/liblz4.so.1 (0x00007f5695b6b000)
libslurmfull.so => /usr/local/lib64/libslurmfull.so (0x00007f569575c000)
libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007f5695558000)
libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f569534c000)
libhwloc.so.15 => not found
libpam.so.0 => /usr/lib64/libpam.so.0 (0x00007f569513d000)
libpam_misc.so.0 => /usr/lib64/libpam_misc.so.0 (0x00007f5694f39000)
libutil.so.1 => /usr/lib64/libutil.so.1 (0x00007f5694d36000)
libresolv.so.2 => /usr/lib64/libresolv.so.2 (0x00007f5694b1c000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007f5694900000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00007f569453f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5695f96000)
libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007f5694329000)
libaudit.so.1 => /usr/lib64/libaudit.so.1 (0x00007f5694101000)
libcap-ng.so.0 => /usr/lib64/libcap-ng.so.0 (0x00007f5693efb000)
[root@slurmctld slurm-20.02.6]#

Comment 28 Nate Rini 2021-03-25 10:24:02 MDT
(In reply to Jeff Avila from comment #27)
> libhwloc.so.15 => not found

Please reconfigure and 'make install' Slurm (from source) now and try again.
Comment 29 Jeff Avila 2021-03-25 10:40:42 MDT
./configure now fails, have attached the output....
Comment 30 Jeff Avila 2021-03-25 10:41:07 MDT
Created attachment 18655 [details]
./configure output
Comment 31 Nate Rini 2021-03-25 10:43:51 MDT
Please attach the config.log that is generated too.
Comment 32 Jeff Avila 2021-03-25 12:15:07 MDT
config.log for latest attempt is attached.

Nate, if you think it would be useful, I can easily set up a Zoom call so we can work through this interactively and expedite a solution.
Comment 33 Jeff Avila 2021-03-25 12:15:33 MDT
Created attachment 18667 [details]
latest config.log
Comment 34 Nate Rini 2021-03-25 12:52:13 MDT
(In reply to Jeff Avila from comment #33)
> Created attachment 18667 [details]
> latest config.log
>
> /usr/bin/ld: warning: libhwloc.so.15, needed by /usr/local/lib/libpmix.so, not found (try using -rpath or -rpath-link)
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_topology_set_flags'
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_topology_set_xml'
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_topology_export_xmlbuffer'
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_topology_set_xmlbuffer'
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_shmem_topology_write'
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_topology_load'
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_topology_destroy'
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_topology_set_io_types_filter'
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_topology_init'
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_shmem_topology_get_length'
> /usr/local/lib/libpmix.so: undefined reference to `hwloc_free_xmlbuffer'

It is still looking for the wrong one. Let's see if we're missing something:
> ls -la /usr/local/lib64/libhwloc*
Comment 35 Jeff Avila 2021-03-25 13:46:30 MDT
[root@slurmctld slurm-20.02.6]# ls -la /usr/local/lib64/libhwloc*
-rwxr-xr-x 1 root root     921 Nov 16 10:27 /usr/local/lib64/libhwloc.la
lrwxrwxrwx 1 root root      18 Nov 16 10:27 /usr/local/lib64/libhwloc.so -> libhwloc.so.15.3.0
-rwxr-xr-x 1 root root 1589480 Nov 16 10:27 /usr/local/lib64/libhwloc.so.15.3.0

Comment 36 Nate Rini 2021-03-25 13:49:27 MDT
(In reply to Jeff Avila from comment #35)
> [root@slurmctld slurm-20.02.6]# ls -la /usr/local/lib64/libhwloc*
> -rwxr-xr-x 1 root root     921 Nov 16 10:27 /usr/local/lib64/libhwloc.la
> lrwxrwxrwx 1 root root      18 Nov 16 10:27 /usr/local/lib64/libhwloc.so -> libhwloc.so.15.3.0
> -rwxr-xr-x 1 root root 1589480 Nov 16 10:27 /usr/local/lib64/libhwloc.so.15.3.0

Please move all of those out of the way and recompile again.
Comment 37 Jeff Avila 2021-03-25 14:13:10 MDT
Created attachment 18668 [details]
latest config.log output
Comment 38 Jeff Avila 2021-03-25 14:13:50 MDT
./configure didn't complete; config.log.latest2 is attached.
Comment 39 Nate Rini 2021-03-26 09:49:31 MDT
Please note that the severity levels are strictly defined here:
> https://www.schedmd.com/support.php
I'm going to change this to SEV4 as this is a question about installing a new feature and not an existing service that is degraded. Please note that increasing the SEV levels will not automatically result in a faster response time.

> Severity 4 — Minor Issues
> A Severity 4 issue is a minor issue with limited or no loss in functionality within the customer environment. Severity 4 issues may also be used for recommendations for future product enhancements or modifications.

Also, I got your email, and if your site has consulting time, I would be happy to work with Jess to get a call set up.

(In reply to Jeff Avila from comment #38)
> ./configure didn't complete; config.log.latest2 is attached.
>
> /usr/bin/ld: warning: libhwloc.so.15, needed by /usr/local/lib/libpmix.so, not found (try using -rpath or -rpath-link)

Let's change the configure command to point to the other hwloc:
> ./configure --prefix=/home/gba/slurm20
to
> export PKG_CONFIG_PATH=/usr/lib64/pkgconfig/:$PKG_CONFIG_PATH
> ./configure --prefix=/home/gba/slurm20

I also suggest re-installing the hwloc-devel-1.11.2-1.el7.x86_64 rpm before attempting a recompile.
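
When PKG_CONFIG_PATH starts out empty, the export shown above leaves a trailing colon in the value. Whether that matters here is not established, but it is easy to avoid. A minimal sketch with a hypothetical helper name:

```shell
# Hypothetical helper: put a directory first in a colon-separated search
# path, without leaving a stray colon when the existing value is empty.
prepend_path() {
    # $1 = directory to put first, $2 = existing value (may be empty)
    printf '%s\n' "$1${2:+:$2}"
}

# Example:
#   PKG_CONFIG_PATH=$(prepend_path /usr/lib64/pkgconfig "${PKG_CONFIG_PATH:-}")
#   export PKG_CONFIG_PATH
#   pkg-config --modversion hwloc   # shows which hwloc.pc is picked up
```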
Comment 40 Jeff Avila 2021-03-26 10:03:24 MDT
Hi Nate,

I'm ready to have a zoom call at your earliest convenience.

I've set the PKG_CONFIG_PATH variable:

[root@slurmctld slurm-20.02.6]# echo $PKG_CONFIG_PATH
/usr/lib64/pkgconfig/:
[root@slurmctld slurm-20.02.6]#

...but the configure fails at the same place as before.

I'll upload the config.log presently.

Thanks!

-Jeff

Comment 41 Jeff Avila 2021-03-26 10:04:27 MDT
Created attachment 18695 [details]
config.log build attempt
Comment 42 Nate Rini 2021-03-26 10:30:59 MDT
(In reply to Nate Rini from comment #39)
> I also suggest re-installing the hwloc-devel-1.11.2-1.el7.x86_64 rpm before
> attempting a recompile.

Was this done?
Comment 43 Jeff Avila 2021-03-26 10:31:35 MDT
Yes, I did.

Comment 45 Nate Rini 2021-03-26 10:38:05 MDT
(In reply to Jeff Avila from comment #43)
> Yes, I did.

Looks like hwloc is now correct but pmix (install) is now the issue:
> HWLOC_CPPFLAGS='-I/usr/include'
> HWLOC_LDFLAGS='-Wl,-rpath -Wl,/usr/lib64 -L/usr/lib64'
> HWLOC_LIBS='-lhwloc'
>
> /usr/bin/ld: warning: libhwloc.so.15, needed by /usr/local/lib64/libpmix.so, not found (try using -rpath or -rpath-link)
You will now need to recompile pmix against the correct hwloc.
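
One way to confirm which shared-library dependencies are unresolved before rebuilding is to scan `ldd` output for `not found` entries. The following is only a sketch: the helper name `missing_libs` is hypothetical, and the sample text is made up to mirror the libhwloc.so.15 warning above rather than copied from this system.

```python
def missing_libs(ldd_output):
    """Return the sonames that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Hypothetical sample, shaped like `ldd /usr/local/lib64/libpmix.so` output.
sample = (
    "\tlinux-vdso.so.1 =>  (0x00007ffd2a5f0000)\n"
    "\tlibhwloc.so.15 => not found\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f1c2a000000)\n"
)
print(missing_libs(sample))
```

Feeding it the real output of `ldd` on the installed libpmix.so would show whether the library still expects a hwloc that is not present on the host.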
Comment 46 Jeff Avila 2021-03-26 11:26:30 MDT
Neither pmix-3.1.5 nor pmix-3.2.1 builds successfully; both configure
properly and then the build ends the same way:

make[2]: Entering directory `/home/gba/pmix-3.1.5/src/tools/pevent'
  CC       pevent.o
  CCLD     pevent
../../../src/.libs/libpmix.so: undefined reference to `hwloc_shmem_topology_write'
../../../src/.libs/libpmix.so: undefined reference to `hwloc_shmem_topology_get_length'
../../../src/.libs/libpmix.so: undefined reference to `hwloc_topology_set_io_types_filter'
collect2: error: ld returned 1 exit status
make[2]: *** [pevent] Error 1
make[2]: Leaving directory `/home/gba/pmix-3.1.5/src/tools/pevent'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/gba/pmix-3.1.5/src'
make: *** [all-recursive] Error 1
[root@slurmctld pmix-3.1.5]#



Comment 47 Jeff Avila 2021-03-26 12:14:01 MDT
More info:

After neither pmix-3.2.1 nor pmix-3.1.5 built from the tarball I had, I followed the instructions at https://slurm.schedmd.com/mpi_guide.html#pmix and built pmix-2.1 from the git repo. This version of pmix built! Unfortunately, slurm-20.02.6 still fails to build using the following configure cli:

[root@slurmctld slurm-20.02.6]# ./configure --prefix=/home/gba/slurm20 --with-pmix=/home/user/gba/pmix/install/2.1

This configures properly, and some time after running "make", I get the following:

../common/libslurmd_common.o: In function `xcpuinfo_hwloc_topo_load':
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:224: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:226: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:228: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:230: undefined reference to `hwloc_topology_set_type_filter'
/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:232: undefined reference to `hwloc_topology_set_type_filter'
../common/libslurmd_common.o:/home/gba/slurm-20.02.6/src/slurmd/common/xcpuinfo.c:234: more undefined references to `hwloc_topology_set_type_filter' follow
collect2: error: ld returned 1 exit status
make[4]: *** [slurmd] Error 1
make[4]: Leaving directory `/home/gba/slurm-20.02.6/src/slurmd/slurmd'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/home/gba/slurm-20.02.6/src/slurmd'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/gba/slurm-20.02.6/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/gba/slurm-20.02.6'
make: *** [all] Error 2
Comment 48 Jeff Avila 2021-03-26 12:49:39 MDT
Ok, after building hwloc-2.3 from source, pointing the slurm config at both that and the new pmix-2.1, I got slurm to build. At this point, I can run slurmrestd out of the new build directory like so:

[root@slurmctld sbin]# ./slurmrestd -vvv -f /usr/local/etc/slurm.conf localhost:10071
slurmrestd: debug2: _establish_config_source: using config_file=/usr/local/etc/slurm.conf (provided)
slurmrestd: debug:  slurm_conf_init: using config_file=/usr/local/etc/slurm.conf
slurmrestd: debug:  Reading slurm.conf file: /usr/local/etc/slurm.conf
slurmrestd: debug:  Ignoring obsolete CacheGroups option.
slurmrestd: debug:  Ignoring obsolete SchedulerPort option.
slurmrestd: debug:  Interactive mode activated (TTY detected on STDIN)
slurmrestd: debug:  main: server listen mode activated
slurmrestd: debug:  Munge authentication plugin loaded
slurmrestd: debug:  parse_http: [localhost:39284] Accepted HTTP connection
slurmrestd: error: parse_http: [localhost:39284] unexpected HTTP error HPE_INVALID_METHOD: invalid HTTP method
slurmrestd: error: _wrap_on_data: [localhost:39284] on_data returned rc: Unexpected message received
slurmrestd: debug:  parse_http: [localhost:39838] Accepted HTTP connection
slurmrestd: error: parse_http: [localhost:39838] unexpected HTTP error HPE_INVALID_METHOD: invalid HTTP method
slurmrestd: error: _wrap_on_data: [localhost:39838] on_data returned rc: Unexpected message received

...I've tried poking at it with curl, but I am not a web developer, so I'm at a loss to see how to check functionality here...any ideas?

Thanks,

-Jeff
Comment 49 Nate Rini 2021-03-26 13:30:15 MDT
(In reply to Jeff Avila from comment #48)
> Ok, after building hwloc-2.3 from source, pointing the slurm config at both
> that and the new pmix-2.1, I got slurm to build. At this point, I can run
> slurmrestd out of the new build directory like so:

Great, I was just about to send the meeting invite as I just finished another meeting.

> ...I've tried poking at it with curl, but I am not a web developer, so I'm
> at a loss to see how to check functionality here...any ideas?
Please take a look at this presentation for examples:
> https://slurm.schedmd.com/SLUG20/REST_API.pdf

Note that if you're using munge authentication for slurmrestd, you will need to use a UNIX socket (denoted with unix:) instead of a TCP socket.
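
For reference, a client can speak HTTP over a UNIX socket with nothing but the Python standard library. This is a minimal sketch, not an official client: the class and function names are hypothetical, the socket path in the comment is made up, and the URL is simply the one tried with curl at the top of this ticket (the correct path may differ by Slurm version).

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        # The host name is only used to fill in the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def get(socket_path, url_path):
    """Send a GET request over the socket; return (status, body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", url_path)
        resp = conn.getresponse()
        return resp.status, resp.read().decode()
    finally:
        conn.close()


# Example call (hypothetical socket path, assumes slurmrestd listens there):
#   get("/tmp/slurmrestd.sock", "/slurm/v1/ping")
```

This only works for clients on the same host as slurmrestd, since a UNIX socket is not reachable over the network.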
Comment 50 Jeff Avila 2021-03-29 10:43:58 MDT
Thanks Nate,

So, if we're using a unix domain socket, how do we connect web clients to
slurmrestd? Is there some way to do that via inetd?

-Jeff

Comment 51 Nate Rini 2021-03-29 11:05:57 MDT
(In reply to Jeff Avila from comment #50)
> So, if we're using a unix domain socket, how do we connect web clients to
> slurmrestd?
You can't with local authentication. The Linux kernel can only vouch for the identity of the peer on a UNIX socket, not on a TCP connection, so another authentication method will be required.

> Is there some way to do that via inetd?

You will need to activate "JSON Web Token (JWT) Authentication":
> https://slurm.schedmd.com/rest.html
> https://slurm.schedmd.com/jwt.html

Please follow the docs above and comment if we missed anything.
Comment 52 Jeff Avila 2021-03-29 12:27:09 MDT
Ok,

I've rebuilt slurmrestd with YAML and JWT support according to the
instructions. I can run an ldd on the executable:
[root@slurmctld sbin]# ldd slurmrestd
linux-vdso.so.1 =>  (0x00007ffe0c7a9000)
libslurmfull.so => /home/gba/slurm20/lib/slurm/libslurmfull.so (0x00007fdb62ade000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fdb628da000)
libhttp_parser.so.2 => /lib64/libhttp_parser.so.2 (0x00007fdb626d2000)
libyaml-0.so.2 => /lib64/libyaml-0.so.2 (0x00007fdb624b2000)
libjson-c.so.2 => /lib64/libjson-c.so.2 (0x00007fdb622a7000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fdb6208d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdb61e71000)
libc.so.6 => /lib64/libc.so.6 (0x00007fdb61ab0000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdb62ed7000)
[root@slurmctld sbin]#

I don't see a jwt library there; is that being loaded dynamically at
startup? How do I verify JWT support is working?

According to the instructions at :

 https://slurm.schedmd.com/jwt.html

...we need to put a system-wide key in the state-save space, have it
owned by the slurm user, and then manually create tokens for each
user.

How do we get the tokens to the users for their user-agents?

Is this communicated to the users out-of-band? I guess I'm not sure
how this JWT method is supposed to work...




Comment 53 Nate Rini 2021-03-29 12:40:29 MDT
(In reply to Jeff Avila from comment #52)
> I've rebuilt slurmrestd with YAML and JWT support according to the
> instructions. I can run an ldd on the executable:
> I don't see a jwt library there,is that being loaded dynamically at
> startup? How do I verify JWT support is working?
ldd will not find it as libjwt is loaded at runtime. 

Try this instead:
> pgrep slurmrestd | xargs -i grep -i jwt /proc/{}/maps
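
The same check can be scripted. The sketch below does roughly what the grep above does, except that it matches on the mapped file name only and de-duplicates the results. The function names are hypothetical, and it assumes a Linux /proc filesystem and permission to read the target process's maps file.

```python
import os


def mapped_libs(maps_text, needle):
    """Return the distinct mapped file paths whose name contains `needle`."""
    found = []
    for line in maps_text.splitlines():
        # Line format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            name = os.path.basename(path).lower()
            if needle.lower() in name and path not in found:
                found.append(path)
    return found


def process_maps(pid):
    """Read /proc/<pid>/maps (Linux only)."""
    with open(f"/proc/{pid}/maps") as fh:
        return fh.read()


# Example call (the PID is whatever `pgrep slurmrestd` reports):
#   mapped_libs(process_maps(pid), "jwt")
```

An empty list only says that nothing matching is mapped in that process at the moment; it does not say why.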
 

> How do we get the tokens to the users for their user-agents?
> Is this communicated to the users out-of-band? I guess I'm not sure
> how this JWT method is supposed to work...

Users can call 'scontrol token' directly if they have access to the cluster. If users do not have direct access, a separate mechanism outside of Slurm will be required.
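
As an illustration of what a user-agent then does with the token: `scontrol token` prints a `SLURM_JWT=<token>` line, and the REST documentation linked earlier describes passing the user name and token in the `X-SLURM-USER-NAME` and `X-SLURM-USER-TOKEN` request headers. The helper names and the user name below are hypothetical, and the header names should be checked against the documentation for the installed release.

```python
def parse_scontrol_token(output):
    """Extract the token from `scontrol token` output (SLURM_JWT=<token>)."""
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "SLURM_JWT" and value:
            return value
    raise ValueError("no SLURM_JWT=<token> line found")


def auth_headers(user, token):
    """Build the HTTP headers slurmrestd is documented to read for JWT auth."""
    return {"X-SLURM-USER-NAME": user, "X-SLURM-USER-TOKEN": token}


# Hypothetical token text and user name, for illustration only.
token = parse_scontrol_token("SLURM_JWT=aaa.bbb.ccc\n")
print(auth_headers("alice", token))
```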

Please note that an authenticating proxy is also an option (on top of auth/jwt). It lets a site use its existing single sign-on system, so users do not need to be given a JWT out of band.

A (trivial) example is provided here:
> https://gitlab.com/SchedMD/training/docker-scale-out/-/tree/master/proxy

Sites can also generate JWTs directly, as they are based on RFC 7519. We provide an example here, which will be included in the normal JWT documentation as of the next release:
> https://github.com/SchedMD/slurm/commit/c9e5ed775c2b5c1428f51844583fe77bd7aae3e7
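
To show what generating a token directly involves, here is a standard-library sketch of an HS256-signed JWT as defined by RFC 7519. It is not the code from the commit above. The function names are hypothetical, the `sun` (Slurm user name) claim is an assumption that should be confirmed against the linked example for the release in use, and `key` stands for the bytes of the site-wide signing key described in jwt.html.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data):
    """Base64url-encode bytes without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(key, user, lifespan=1800, now=None):
    """Build a compact HS256-signed JWT for `user`, valid for `lifespan` seconds."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    # Assumption: "sun" carries the Slurm user name; verify for your release.
    claims = {"exp": issued + lifespan, "iat": issued, "sun": user}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


# Example call with a made-up key and user name:
#   make_token(b"not-a-real-key", "alice", lifespan=600)
```

Anyone holding the signing key can mint a token for any user, so the key needs the same protection as the munge key.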
Comment 54 Jeff Avila 2021-03-29 12:55:01 MDT
This does not seem to be loading; does it need a separate cli flag?

[root@slurmctld sbin]# ps -aef | grep slurmrestd
root      2570 17994  0 14:50 pts/11   00:00:00 ./slurmrestd -f /usr/local/etc/slurm.conf localhost:10011
root      4912 17994  0 14:53 pts/11   00:00:00 grep --color=auto slurmrestd
[root@slurmctld sbin]# cat /proc/2570/maps | grep jwt
[root@slurmctld sbin]#



Comment 55 Nate Rini 2021-03-29 12:57:01 MDT
(In reply to Jeff Avila from comment #54)
> This does not seem to be loading; does it need a separate cli flag?

Call this:
> scontrol show config

Try this:
> slurmrestd -a jwt
Comment 56 Jeff Avila 2021-03-29 12:59:25 MDT
I should add that I haven't made the changes in slurm.conf and restarted
slurmctld yet; if that makes a difference...here's the output:

[root@slurmctld sbin]# scontrol show config
Configuration data as of 2021-03-29T14:57:24
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations,limits,qos
AccountingStorageHost   = slurmctld
AccountingStorageLoc    = N/A
AccountingStoragePort   = 6819
AccountingStorageTRES   = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu
AccountingStorageType   = accounting_storage/slurmdbd
AccountingStorageUser   = N/A
AccountingStoreJobComment = Yes
AcctGatherEnergyType    = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInterconnectType = acct_gather_interconnect/none
AcctGatherNodeFreq      = 0 sec
AcctGatherProfileType   = acct_gather_profile/none
AllowSpecResourcesUsage = No
AuthAltTypes            = (null)
AuthInfo                = (null)
AuthType                = auth/munge
BatchStartTimeout       = 120 sec
BOOT_TIME               = 2021-03-29T11:37:02
BurstBufferType         = (null)
CliFilterPlugins        = (null)
ClusterName             = slurmctld
CommunicationParameters = (null)
CompleteWait            = 0 sec
CoreSpecPlugin          = core_spec/none
CpuFreqDef              = Unknown
CpuFreqGovernors        = Performance,OnDemand,UserSpace
CredType                = cred/munge
DebugFlags              = NO_CONF_HASH
DefMemPerCPU            = 2800
DependencyParameters    = (null)
DisableRootJobs         = No
EioTimeout              = 60
EnforcePartLimits       = NO
Epilog                  = /usr/local/etc/slurm/epilog
EpilogMsgTime           = 2000 usec
EpilogSlurmctld         = (null)
ExtSensorsType          = ext_sensors/none
ExtSensorsFreq          = 0 sec
FairShareDampeningFactor = 1
FederationParameters    = (null)
FirstJobId              = 1
GetEnvTimeout           = 2 sec
GresTypes               = gpu
GpuFreqDef              = high,memory=high
GroupUpdateForce        = 1
GroupUpdateTime         = 600 sec
HASH_VAL                = Match
HealthCheckInterval     = 0 sec
HealthCheckNodeState    = ANY
HealthCheckProgram      = (null)
InactiveLimit           = 0 sec
JobAcctGatherFrequency  = 30
JobAcctGatherType       = jobacct_gather/linux
JobAcctGatherParams     = (null)
JobCompHost             = slurmctld
JobCompLoc              = /var/log/slurm_jobcomp.log
JobCompPort             = 0
JobCompType             = jobcomp/none
JobCompUser             = root
JobContainerType        = job_container/none
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobDefaults             = (null)
JobFileAppend           = 0
JobRequeue              = 0
JobSubmitPlugins        = (null)
KeepAliveTime           = SYSTEM_DEFAULT
KillOnBadExit           = 0
KillWait                = 60 sec
LaunchParameters        = (null)
LaunchType              = launch/slurm
Layouts                 =
Licenses                = (null)
LogTimeFormat           = iso8601_ms
MailDomain              = (null)
MailProg                = /bin/mail
MaxArraySize            = 10001
MaxDBDMsgs              = 132842
MaxJobCount             = 65535
MaxJobId                = 67043328
MaxMemPerNode           = UNLIMITED
MaxStepCount            = 40000
MaxTasksPerNode         = 512
MCSPlugin               = mcs/none
MCSParameters           = (null)
MessageTimeout          = 60 sec
MinJobAge               = 10 sec
MpiDefault              = none
MpiParams               = (null)
MsgAggregationParams    = (null)
NEXT_JOB_ID             = 948917
NodeFeaturesPlugins     = (null)
OverTimeLimit           = 0 min
PluginDir               = /usr/local/lib/slurm
PlugStackConfig         = (null)
PowerParameters         = (null)
PowerPlugin             =
PreemptMode             = OFF
PreemptType             = preempt/none
PreemptExemptTime       = 00:00:00
PrEpParameters          = (null)
PrEpPlugins             = prep/script
PriorityParameters      = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityDecayHalfLife   = 7-00:00:00
PriorityCalcPeriod      = 00:05:00
PriorityFavorSmall      = No
PriorityFlags           =
PriorityMaxAge          = 3-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType            = priority/multifactor
PriorityWeightAge       = 1
PriorityWeightAssoc     = 0
PriorityWeightFairShare = 8000
PriorityWeightJobSize   = 1
PriorityWeightPartition = 1000
PriorityWeightQOS       = 40000
PriorityWeightTRES      = (null)
PrivateData             = none
ProctrackType           = proctrack/cgroup
Prolog                  = /usr/local/etc/slurm/prolog
PrologEpilogTimeout     = 65534
PrologSlurmctld         = /usr/local/etc/slurm/controller_prolog
PrologFlags             = Alloc,Contain,X11
PropagatePrioProcess    = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram           = /sbin/reboot
ReconfigFlags           = (null)
RequeueExit             = (null)
RequeueExitHold         = (null)
ResumeFailProgram       = (null)
ResumeProgram           = (null)
ResumeRate              = 300 nodes/min
ResumeTimeout           = 60 sec
ResvEpilog              = (null)
ResvOverRun             = 0 min
ResvProlog              = (null)
ReturnToService         = 2
RoutePlugin             = route/default
SallocDefaultCommand    = (null)
SbcastParameters        = (null)
SchedulerParameters     = defer,bf_max_job_assoc=10,bf_max_job_test=100,bf_continue,max_array_tasks=10001,sched_min_interval=1000,bf_interval=120,bf_max_job_array_resv=2
SchedulerTimeSlice      = 30 sec
SchedulerType           = sched/backfill
SelectType              = select/cons_res
SelectTypeParameters    = CR_CORE_MEMORY,CR_ONE_TASK_PER_CORE
SlurmUser               = slurm(508)
SlurmctldAddr           = (null)
SlurmctldDebug          = error
SlurmctldHost[0]        = slurmctld
SlurmctldLogFile        = /var/log/slurm/slurmctld
SlurmctldPort           = 6810-6817
SlurmctldSyslogDebug    = unknown
SlurmctldPrimaryOffProg = (null)
SlurmctldPrimaryOnProg  = (null)
SlurmctldTimeout        = 420 sec
SlurmctldParameters     = (null)
SlurmdDebug             = error
SlurmdLogFile           = /var/log/slurm/slurmd
SlurmdParameters        = (null)
SlurmdPidFile           = /var/run/slurmd.pid
SlurmdPort              = 6818
SlurmdSpoolDir          = /var/spool/slurmd
SlurmdSyslogDebug       = unknown
SlurmdTimeout           = 600 sec
SlurmdUser              = root(0)
SlurmSchedLogFile       = (null)
SlurmSchedLogLevel      = 0
SlurmctldPidFile        = /var/run/slurmctld.pid
SlurmctldPlugstack      = (null)
SLURM_CONF              = /usr/local/etc/slurm.conf
SLURM_VERSION           = 20.02.6
SrunEpilog              = (null)
SrunPortRange           = 0-0
SrunProlog              = (null)
StateSaveLocation       = /var/spool/slurmctld
SuspendExcNodes         = (null)
SuspendExcParts         = (null)
SuspendProgram          = (null)
SuspendRate             = 60 nodes/min
SuspendTime             = NONE
SuspendTimeout          = 30 sec
SwitchType              = switch/none
TaskEpilog              = (null)
TaskPlugin              = task/cgroup
TaskPluginParam         = (null type)
TaskProlog              = /usr/local/etc/slurm/task_prolog
TCPTimeout              = 2 sec
TmpFS                   = /tmp
TopologyParam           = (null)
TopologyPlugin          = topology/tree
TrackWCKey              = No
TreeWidth               = 50
UsePam                  = No
UnkillableStepProgram   = (null)
UnkillableStepTimeout   = 300 sec
VSizeFactor             = 0 percent
WaitTime                = 0 sec
X11Parameters           = (null)

Cgroup Support Configuration:
AllowedDevicesFile      = /usr/local/etc/cgroup_allowed_devices_file.conf
AllowedKmemSpace        = (null)
AllowedRAMSpace         = 100.0%
AllowedSwapSpace        = 0.0%
CgroupAutomount         = yes
CgroupMountpoint        = /sys/fs/cgroup
ConstrainCores          = yes
ConstrainDevices        = yes
ConstrainKmemSpace      = no
ConstrainRAMSpace       = yes
ConstrainSwapSpace      = no
MaxKmemPercent          = 100.0%
MaxRAMPercent           = 100.0%
MaxSwapPercent          = 100.0%
MemorySwappiness        = (null)
MinKmemSpace            = 30 MB
MinRAMSpace             = 30 MB
TaskAffinity            = no

Slurmctld(primary) at slurmctld is UP
[root@slurmctld sbin]# ./slurmrestd -a jwt
Usage: slurmrestd [OPTIONS] [host:port]...
-f file
Use specified file for slurmctld configuration
-h
Print this help message.
-t <thread count>
Number of threads to use for processing.
-u <user>
setuid() to user after opening sockets.
-v
Verbose mode. Multiple -v's increase verbosity.
-V
Print version information and exit.
[root@slurmctld sbin]#

Comment 57 Nate Rini 2021-03-29 13:02:26 MDT
(In reply to Jeff Avila from comment #56)
> I should add that I haven't made the changes in slurm.conf and restarted
> slurmctld yet; if that makes a difference...here's the output:
Yes, it does. Note that JWT is being added as a secondary auth and does not replace munge.

> [root@slurmctld sbin]# ./slurmrestd -a jwt
The argument to require JWT auth is '-a jwt', but it still needs the previous arguments to get it running.
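
The `scontrol show config` output in comment 56 shows `AuthAltTypes = (null)`, i.e. no secondary auth type is configured yet. A small sketch for checking this programmatically follows; the function names are hypothetical and the parsing assumes the `Key = value` layout seen in that output.

```python
def parse_config(output):
    """Turn 'Key = value' lines from `scontrol show config` into a dict."""
    config = {}
    for line in output.splitlines():
        # Split on the first '=' only; values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


def jwt_enabled(config):
    """True if auth/jwt is listed in AuthAltTypes."""
    alt = config.get("AuthAltTypes", "")
    return "auth/jwt" in [item.strip() for item in alt.split(",")]


# Two lines copied from the output in comment 56.
current = parse_config(
    "AuthAltTypes            = (null)\n"
    "AuthType                = auth/munge\n"
)
print(jwt_enabled(current))
```

AuthType stays at auth/munge in either case, since JWT is added alongside munge rather than replacing it.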
Comment 58 Jeff Avila 2021-03-29 13:18:42 MDT
I'm still confused here.

I have been trying to build slurmrestd properly. Our current slurmctld, the
one currently controlling our cluster, isn't the same binary as the
slurmctld that I just built in the process of building a slurmrestd with
JWT and YAML support.
Do we have to use the newly-built slurmctld binary in concert with the new
slurmrestd, or can we continue to use our old slurmctld binary in concert
with the new slurmrestd?

Likewise, is it necessary to pass the "-a jwt" string on the cli to
slurmrestd, before giving it the ipaddr:port to bind to?

Thanks!



Comment 59 Nate Rini 2021-03-29 13:24:35 MDT
(In reply to Jeff Avila from comment #58)
> I have been trying to build slurmrestd properly. Our current slurmctld, the
> one currently controlling our cluster, isn't the same binary as the
> slurmctld that I just built in the process of building a slurmrestd with
> JWT and YAML support.
Does the current slurmctld have libjwt compiled in? Please attach current slurm.conf.

> Do we have to use the newly-built slurmctld binary in concert with the new
> slurmrestd, or can we continue to use our old slurmctld binary in concert
> with the new slurmrestd?
I would suggest using the new binary, but if your current binary is at the same major version and has the JWT auth plugin compiled, then it should work.
 
> Likewise, is it necessary to pass the "-a jwt" string on the cli to
> slurmrestd, before giving it the ipaddr:port to bind to?
It is suggested if you're going to set it up as an HTTP server. Please remember, we do not suggest that slurmrestd be directly exposed to the internet.
Comment 60 Jeff Avila 2021-03-29 13:42:31 MDT
Created attachment 18723 [details]
slurm.conf file
Comment 61 Jeff Avila 2021-03-29 13:47:05 MDT
I've attached our slurm.conf to the ticket.
Looking at our existing, running slurmctld binary, it doesn't seem to have
any mention of a jwt library loaded, but again, I haven't made that
modification to slurm.conf...

[root@slurmctld ~]# ps -aef | grep slurmctld
root      8776 17994  0 15:43 pts/11   00:00:00 grep --color=auto slurmctld
root     15286     1  0 Jan05 ?        00:01:39 tail -f slurmctld
slurm    29344     1  1 11:37 ?        00:03:35 /usr/local/sbin/slurmctld
[root@slurmctld ~]# cat /proc/29344/maps  | grep jwt
[root@slurmctld ~]#



Comment 62 Nate Rini 2021-03-29 13:53:30 MDT
(In reply to Jeff Avila from comment #61)
> I've attached our slurm.conf to the ticket.
> Looking at our existing, running slurmctld binary, it doesn't seem to have
> any mention of a jwt library loaded, but again, I haven't made that
> modification to slurm.conf...

Please apply the instructions here:
> https://slurm.schedmd.com/jwt.html
Comment 63 Jeff Avila 2021-03-29 15:33:47 MDT
Hi Nate,

Adding AuthAltTypes=auth/jwt to our current slurm.conf causes slurmctld to
not restart successfully. The error is " fatal: failed to initialize
authentication plugin".

I suppose this means that we have to rebuild *everything* and put the
newly-configured slurmrestd into production along with the corresponding
slurmctld, slurmd, slurmdbd etc.

Can you see any way around that?

Thanks,

-Jeff

Comment 64 Nate Rini 2021-03-29 15:38:14 MDT
(In reply to Jeff Avila from comment #63)
> Adding AuthAltTypes=auth/jwt to our current slurm.conf causes slurmctld to
> not restart successfully. The error is " fatal: failed to initialize
> authentication plugin".
call 'slurmctld -Dvvvvvv' and post the log.
 
> I suppose this means that we have to rebuild *everything* and put the
> newly-configured slurmrestd into production along with the corresponding
> slurmctld, slurmd, slurmdbd etc.
> 
> Can you see any way around that?
Probably not. I assume slurmctld was compiled for one of the previously mentioned RPMs?
Comment 65 Jeff Avila 2021-03-29 16:14:48 MDT
Er...is

'slurmctld -Dvvvvvv'

safe to execute on the same host that's already running our production
slurmctld?


Comment 66 Nate Rini 2021-03-29 16:24:18 MDT
(In reply to Jeff Avila from comment #65)
> safe to execute on the same host that's already running our production
> slurmctld?

Generally no. On startup it will ask the currently running slurmctld to shut down and then take over (unless it errors).
Comment 67 Nate Rini 2021-04-05 13:22:49 MDT
Jeff,

I'm going to time this ticket out while we wait for an outage window. Please reply and the ticket will automatically re-open.

Thanks,
--Nate