Ticket 17301 - auth/jwt: Could not load key file
Summary: auth/jwt: Could not load key file
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 23.02.3
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Nate Rini
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-07-27 07:36 MDT by GSK-ONYX-SLURM
Modified: 2023-08-11 09:15 MDT
CC List: 1 user

See Also:
Site: GSK
Linux Distro: RHEL


Attachments
slurm.conf (2.67 KB, text/plain)
2023-07-27 08:36 MDT, GSK-ONYX-SLURM
Details
slurmctld service (4.11 KB, text/plain)
2023-07-27 08:36 MDT, GSK-ONYX-SLURM
Details
the strace log (14.63 KB, text/plain)
2023-07-31 01:54 MDT, GSK-ONYX-SLURM
Details

Description GSK-ONYX-SLURM 2023-07-27 07:36:33 MDT
Dear SchedMD Team,

I've recently upgraded Slurm from version 22.05.2 to 23.02.3 and noticed that the slurm control daemon on both the head node and the backup controller was down after the upgrade. Looking at the service, I found that it was unable to load the JWT key, while everything worked properly in the previous version:

[root@uk1sxlx00129 ~]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2023-07-27 13:46:14 BST; 15min ago
  Process: 108197 ExecStart=/home/slurm/Software/RHEL7/slurm/23.02.3/sbin/slurmctld -D $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 108197 (code=exited, status=1/FAILURE)

Jul 27 13:46:14 uk1sxlx00129.corpnet2.com systemd[1]: Started Slurm controller daemon.
Jul 27 13:46:14 uk1sxlx00129.corpnet2.com slurmctld[108197]: slurmctld: fatal: auth/jwt: Could not load key file (/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key)
Jul 27 13:46:14 uk1sxlx00129.corpnet2.com systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
Jul 27 13:46:14 uk1sxlx00129.corpnet2.com systemd[1]: Unit slurmctld.service entered failed state.
Jul 27 13:46:14 uk1sxlx00129.corpnet2.com systemd[1]: slurmctld.service failed.
[root@uk1sxlx00129 ~]#

I decided to change permissions of the key from 700 to 755 and the service started working.

Based on the documentation (https://slurm.schedmd.com/jwt.html#setup), the permissions should be even more restrictive.

Do I need to change the permission of the key across all the clusters?

Thanks,
Radek
Comment 1 Nate Rini 2023-07-27 08:31:46 MDT
(In reply to GSK-EIS-SLURM from comment #0)
> I decided to change permissions of the key from 700 to 755 and the service
> started working.
>
> Do I need to change the permission of the key across all the clusters?

The JWT key should never be world-readable. Anyone who can read the key effectively has root access to the cluster. The most likely cause of the issue is that the user/group on the key was incorrect.
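For reference, the setup section of the JWT documentation (https://slurm.schedmd.com/jwt.html#setup) keeps the key readable only by the slurm user. A rough sketch against this site's StateSaveLocation path (0600 shown here; the exact mode matters less than keeping the key unreadable by other users):
> chown slurm:slurm /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> chmod 0600 /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key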

Please provide the output of:
> systemctl show slurmctld.service
and attach slurm.conf
Comment 2 GSK-ONYX-SLURM 2023-07-27 08:36:06 MDT
Created attachment 31481 [details]
slurm.conf
Comment 3 GSK-ONYX-SLURM 2023-07-27 08:36:27 MDT
Created attachment 31482 [details]
slurmctld service
Comment 5 Nate Rini 2023-07-27 09:34:59 MDT
Please try and paste the log:

> stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> chown slurm /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> chmod 0660 /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> sudo -u slurm stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> sudo -u nobody stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
Comment 6 GSK-ONYX-SLURM 2023-07-27 09:48:44 MDT
(In reply to Nate Rini from comment #5)
> Please try and paste the log:
> 
> > stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> > chown slurm /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> > chmod 0660 /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> > stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> > sudo -u slurm stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> > sudo -u nobody stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key

Here you are:

-bash-4.2$ stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
  File: ‘/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key’
  Size: 1679            Blocks: 8          IO Block: 8192   regular file
Device: 2dh/45d Inode: 76744497    Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (63124/   slurm)   Gid: (63124/   slurm)
Access: 2022-05-10 10:08:56.308967000 +0100
Modify: 2021-01-12 10:22:48.753907000 +0000
Change: 2023-07-27 16:45:22.494768000 +0100
 Birth: -
-bash-4.2$
-bash-4.2$
-bash-4.2$ chown slurm /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
-bash-4.2$
-bash-4.2$
-bash-4.2$ chmod 0660 /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
-bash-4.2$
-bash-4.2$
-bash-4.2$ stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
  File: ‘/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key’
  Size: 1679            Blocks: 8          IO Block: 8192   regular file
Device: 2dh/45d Inode: 76744497    Links: 1
Access: (0660/-rw-rw----)  Uid: (63124/   slurm)   Gid: (63124/   slurm)
Access: 2022-05-10 10:08:56.308967000 +0100
Modify: 2021-01-12 10:22:48.753907000 +0000
Change: 2023-07-27 16:45:45.289773000 +0100
 Birth: -
-bash-4.2$
-bash-4.2$
-bash-4.2$ sudo -u slurm stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
sudo: Password expired, contact your system administrator
-bash-4.2$
-bash-4.2$
-bash-4.2$ sudo -u nobody stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for slurm:
-bash-4.2$
-bash-4.2$
-bash-4.2$

To be honest, I don't know the password for the slurm user; I have never needed it. I'm going to request that the password be reset.
Comment 7 Nate Rini 2023-07-27 10:02:24 MDT
(In reply to GSK-EIS-SLURM from comment #6)
> To be honest I don't know the password for the slurm user, I have never
> needed it. I'm going to request the password to be restored.

There is no reason for the slurm user to have a password (and several reasons for it not to). It appears the sudoers configuration on this cluster is strict, which is fine, so please try this instead:
> su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" slurm
> su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" nobody
Comment 8 GSK-ONYX-SLURM 2023-07-27 22:51:30 MDT
(In reply to Nate Rini from comment #7)

> please try this instead:
> > su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" slurm
> > su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" nobody

[root@uk1sxlx00128 ~]# su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" slurm
  File: '/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key'
  Size: 1679            Blocks: 8          IO Block: 8192   regular file
Device: 2dh/45d Inode: 76744497    Links: 1
Access: (0660/-rw-rw----)  Uid: (63124/   slurm)   Gid: (63124/   slurm)
Access: 2022-05-10 10:08:56.308967000 +0100
Modify: 2021-01-12 10:22:48.753907000 +0000
Change: 2023-07-27 16:45:45.289773000 +0100
 Birth: -
[root@uk1sxlx00128 ~]# su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" nobody
This account is currently not available.
[root@uk1sxlx00128 ~]#
Comment 9 Nate Rini 2023-07-28 08:18:08 MDT
(In reply to GSK-EIS-SLURM from comment #8)
> [root@uk1sxlx00128 ~]# su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key"
> Access: (0660/-rw-rw----)  Uid: (63124/   slurm)   Gid: (63124/   slurm)

This looks correct. Please try restarting slurmctld or slurmdbd to verify access is now correct.

> [root@uk1sxlx00128 ~]# su -c "stat
> /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" nobody
> This account is currently not available.

This was to check that other users can't access the file. If possible, please try this with a normal user.
Comment 10 GSK-ONYX-SLURM 2023-07-28 11:00:35 MDT
(In reply to Nate Rini from comment #9)

> This looks correct. Please try restarting slurmctld or slurmdbd to verify
> access is now correct.

I restarted both and it's still the same:

[root@uk1sxlx00128 ~]# systemctl -l status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2023-07-28 17:58:03 BST; 31s ago
  Process: 39640 ExecStart=/home/slurm/Software/RHEL7/slurm/23.02.3/sbin/slurmctld -D $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 39640 (code=exited, status=1/FAILURE)

Jul 28 17:58:03 uk1sxlx00128.corpnet2.com systemd[1]: Started Slurm controller daemon.
Jul 28 17:58:03 uk1sxlx00128.corpnet2.com slurmctld[39640]: slurmctld: fatal: auth/jwt: Could not load key file (/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key)
Jul 28 17:58:03 uk1sxlx00128.corpnet2.com slurmctld[39640]: fatal: auth/jwt: Could not load key file (/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key)
Jul 28 17:58:03 uk1sxlx00128.corpnet2.com systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
Jul 28 17:58:03 uk1sxlx00128.corpnet2.com systemd[1]: Unit slurmctld.service entered failed state.
Jul 28 17:58:03 uk1sxlx00128.corpnet2.com systemd[1]: slurmctld.service failed.
[root@uk1sxlx00128 ~]#

> This was to check that other users can't access the file. If possible, please
> try this with a normal user.

When I execute this command as me, there's a password prompt:

rd178639@uk1sxlx00128 ~ su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" nobody
Password:
Comment 11 Nate Rini 2023-07-28 19:29:08 MDT
(In reply to GSK-EIS-SLURM from comment #10)
> I restarted both and it's still the same:
> 
> Jul 28 17:58:03 uk1sxlx00128.corpnet2.com slurmctld[39640]: fatal: auth/jwt:
> Could not load key file (/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key)

This error is from slurmctld being unable to open and memory-map the jwt_hs256.key file. We can use strace to see if we can find out why:
> strace -o strace.log -e openat,open,mmap -- /home/slurm/Software/RHEL7/slurm/23.02.3/sbin/slurmctld -D $SLURMCTLD_OPTIONS

Note that $SLURMCTLD_OPTIONS will need to be filled in with the value from /etc/sysconfig/slurmctld (if it exists). Please attach the strace.log.
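If it is not obvious what $SLURMCTLD_OPTIONS expands to on this host, a quick check (assuming the sysconfig file is in the usual RHEL location):
> grep SLURMCTLD_OPTIONS /etc/sysconfig/slurmctld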


> When I execute this command as me, there's a password prompt:
> 
> rd178639@uk1sxlx00128 ~ su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" nobody
> Password:

These commands need to be executed as root.
Comment 12 GSK-ONYX-SLURM 2023-07-31 01:54:06 MDT
(In reply to Nate Rini from comment #11)

> This error is from slurmctld being unable to open and memory map in the
> jwt_hs256.key file. We can use strace to see if we can find out why:
> > strace -o strace.log -e openat,open,mmap -- /home/slurm/Software/RHEL7/slurm/23.02.3/sbin/slurmctld -D $SLURMCTLD_OPTIONS

I'm attaching the strace log. 

> > rd178639@uk1sxlx00128 ~ su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" nobody
> > Password:
> 
> These commands need to be executed as root.

I had executed it as root and was then told to do it as a normal user; that's why I tried to execute it from my account.

The output when it's executed as root:

[root@uk1sxlx00128 ~]# su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" nobody
This account is currently not available.
[root@uk1sxlx00128 ~]#
Comment 13 GSK-ONYX-SLURM 2023-07-31 01:54:27 MDT
Created attachment 31526 [details]
the strace log
Comment 14 Nate Rini 2023-07-31 08:51:43 MDT
(In reply to GSK-EIS-SLURM from comment #13)
> Created attachment 31526 [details]
> the strace log
> 
> open("/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)

Permissions are still failing. Is this a shared filesystem? Is Selinux active?

Please call (as root):
> namei /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> sestatus

(In reply to GSK-EIS-SLURM from comment #12)
> I had executed it as root and then I was told to do it as a normal user,
> that's why I tried to execute it from my account.

I wanted to verify the result of both types of users.

When su/sudo are called by a normal user who is not normally allowed to impersonate the target user, they will trigger the PAM configuration, which is what happened here:
> rd178639@uk1sxlx00128 ~ su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" nobody
> Password:

.
 
> The output when it's executed as root:
> 
> [root@uk1sxlx00128 ~]# su -c "stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key" nobody
> This account is currently not available.

Looks like the nobody account has nologin set up too. Please call this as rd178639 instead (assuming this user doesn't have any special permissions):
> stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
Comment 15 GSK-ONYX-SLURM 2023-07-31 08:58:50 MDT
(In reply to Nate Rini from comment #14)

> 
> Permissions are still failing. Is this a shared filesystem? Is Selinux
> active?

Yes, this is NFS and no, Selinux is disabled.

> 
> Please call (as root):
> > namei /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> > sestatus

[root@uk1sxlx00128 ~]# namei /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
f: /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
 d /
 d home
 d uk_hpc_crash
 d StateSaveLocation
 - jwt_hs256.key
[root@uk1sxlx00128 ~]# sestatus
SELinux status:                 disabled
[root@uk1sxlx00128 ~]#

> Looks like nobody has nologin setup too. Please call this as rd178639
> instead (assuming this user doesn't have any special permissions):
> > stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key

rd178639@uk1sxlx00128 ~ stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
  File: '/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key'
  Size: 1679            Blocks: 8          IO Block: 8192   regular file
Device: 2dh/45d Inode: 76744497    Links: 1
Access: (0660/-rw-rw----)  Uid: (63124/   slurm)   Gid: (63124/   slurm)
Access: 2022-05-10 10:08:56.308967000 +0100
Modify: 2021-01-12 10:22:48.753907000 +0000
Change: 2023-07-27 16:45:45.289773000 +0100
 Birth: -
rd178639@uk1sxlx00128 ~
Comment 16 Nate Rini 2023-07-31 09:08:06 MDT
(In reply to GSK-EIS-SLURM from comment #15)
> (In reply to Nate Rini from comment #14)
> > 
> > Permissions are still failing. Is this a shared filesystem? Is Selinux
> > active?
> 
> Yes, this is NFS and no, Selinux is disabled.

Is root squash enabled? Is this NFS 3 or 4? Is the lock daemon running? Is Kerberos being used for auth?
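A couple of quick checks for those, as a sketch (nfsstat run on the controller; exportfs has to be run on the NFS server itself):
> nfsstat -m
> exportfs -v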

Do slurmctld and slurmdbd run on the same host? Are they being run as the same user? The config in comment#3 doesn't have a User= entry, so it looks like it is being started as root. I would suggest configuring both daemons' systemd unit files with the key file's user/group:
> User=slurm
> Group=slurm
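One way to do that without touching the packaged unit files is a drop-in override; a sketch, assuming systemctl edit is available (it is on RHEL 7) and that both daemons should run as slurm:
> systemctl edit slurmctld.service
Then add in the editor:
[Service]
User=slurm
Group=slurm
Repeat for slurmdbd.service, then:
> systemctl daemon-reload
> systemctl restart slurmctld.service slurmdbd.service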

.

> > Looks like nobody has nologin setup too. Please call this as rd178639
> > instead (assuming this user doesn't have any special permissions):
> > > stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> 
> rd178639@uk1sxlx00128 ~ stat /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
>   File: '/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key'
> Access: (0660/-rw-rw----)  Uid: (63124/   slurm)   Gid: (63124/   slurm)

This user can read the parent directory, so let's see if it can read the file. Please call as rd178639:
> file /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key

You could also just hexdump the file but please don't post it here.
Comment 17 GSK-ONYX-SLURM 2023-08-01 00:17:08 MDT
(In reply to Nate Rini from comment #16)

> Is rootsquash enabled? Is this nfs 3 or 4? 

These are all the options the share is mounted with:

/home/slurm type nfs (rw,relatime,vers=3,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.184.24.115,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=10.184.24.115)

> Is the lock daemon running? Is
> kerberos being used for auth?

No and no.

> 
> Do slurmctld and slurmdbd run on the same host? 

Yes.

> Are they being run as the
> same user? The config in comment#3 doesn't have a user= entry so looks like
> it is being started as root. 

Yes - it's root.

> I would suggest configuring both
> daemons' systemd unit files with the key file's user/group:
> > User=slurm
> > Group=slurm

Once added, the slurmctld daemon started working:

[root@uk1sxlx00128 ~]# systemctl restart slurmctld.service
[root@uk1sxlx00128 ~]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2023-08-01 06:59:26 BST; 5s ago
 Main PID: 32511 (slurmctld)
   CGroup: /system.slice/slurmctld.service
           ├─32511 /home/slurm/Software/RHEL7/slurm/23.02.3/sbin/slurmctld -D
           └─32512 slurmctld: slurmscriptd

Aug 01 06:59:31 uk1sxlx00128.corpnet2.com slurmctld[32511]: slurmctld: Down nodes: uk1salx00717
Aug 01 06:59:31 uk1sxlx00128.corpnet2.com slurmctld[32511]: slurmctld: Recovered information about 0 jobs
Aug 01 06:59:31 uk1sxlx00128.corpnet2.com slurmctld[32511]: slurmctld: select/cons_res: part_data_create_array: select/cons_res: preparing for 1 partitions
Aug 01 06:59:31 uk1sxlx00128.corpnet2.com slurmctld[32511]: slurmctld: Recovered state of 0 reservations
Aug 01 06:59:31 uk1sxlx00128.corpnet2.com slurmctld[32511]: slurmctld: State of 0 triggers recovered
Aug 01 06:59:31 uk1sxlx00128.corpnet2.com slurmctld[32511]: slurmctld: select/cons_res: select_p_reconfigure: select/cons_res: reconfigure
Aug 01 06:59:31 uk1sxlx00128.corpnet2.com slurmctld[32511]: slurmctld: select/cons_res: part_data_create_array: select/cons_res: preparing for 1 partitions
Aug 01 06:59:31 uk1sxlx00128.corpnet2.com slurmctld[32511]: slurmctld: Running as primary controller
Aug 01 06:59:31 uk1sxlx00128.corpnet2.com slurmctld[32511]: slurmctld: No parameter for mcs plugin, default values set
Aug 01 06:59:31 uk1sxlx00128.corpnet2.com slurmctld[32511]: slurmctld: mcs: MCSParameters = (null). ondemand set.
[root@uk1sxlx00128 ~]#


> This user can read the parent directory, so let's see if it can read the
> file. Please call as rd178639:
> > file /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key

-bash-4.2$ file /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
/home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key: PEM RSA private key
-bash-4.2$

> You could also just hexdump the file but please don't post it here.

Yes, I can.

It seems the problem is now resolved. I will make sure the slurmctld and slurmdbd daemons are running as the slurm user on all the clusters.

Thanks a lot for your support!

Radek
Comment 18 Nate Rini 2023-08-01 08:32:31 MDT
(In reply to GSK-EIS-SLURM from comment #17)
> (In reply to Nate Rini from comment #16)
> > This user can read the parent directory, so let's see if it can read the
> > file. Please call as rd178639:
> > > file /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> -bash-4.2$ file /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
> /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key: PEM RSA private key
> > You could also just hexdump the file but please don't post it here.
> 
> Yes, I can.

Just make sure that normal users can't read jwt_hs256.key or they will have root over the cluster.
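A quick way to audit that on each cluster, as a sketch (GNU stat; adjust the path to each cluster's StateSaveLocation):
> stat -c '%A %U:%G %n' /home/uk_hpc_crash/StateSaveLocation/jwt_hs256.key
The mode should show no read bit for "other", e.g. -rw-rw---- or stricter.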
 
> It seems the problem is now resolved. I will make sure the slurmctld and
> slurmdbd daemons are running as the slurm user on all the clusters.

Closing out ticket.
Comment 19 GSK-ONYX-SLURM 2023-08-11 04:28:00 MDT
(In reply to Nate Rini from comment #16)

> it is being started as root. I would suggest configuring both daemons'
> systemd unit files with the key file's user/group:
> > User=slurm
> > Group=slurm

Hi Nate, one quick question on this. I've already added the user and group to the slurmctld and slurmdbd systemd config files. However, I noticed that the following warning/info message appears after the service restart:

Aug 11 00:58:11 us1salx09012.corpnet2.com slurmdbd[3529030]: slurmdbd: Not running as root. Can't drop supplementary groups

Just wanted to check with you whether this could potentially cause an issue (if any)? Which groups is it about?

Thanks,
Radek
Comment 20 Nate Rini 2023-08-11 09:15:15 MDT
(In reply to GSK-EIS-SLURM from comment #19)
> Hi Nate -- one quick question to this. I've already added a user and a group
> to the slurmctld and slurmdbd systemd config files. However I noticed that
> the following warning / info  appears after the service restart:
> 
> Aug 11 00:58:11 us1salx09012.corpnet2.com slurmdbd[3529030]: slurmdbd: Not
> running as root. Can't drop supplementary groups
> 
> Just wanted to check with you whether it could potentially cause an issue (if
> any)? Which groups is it about?

It is a warning about not being able to drop supplementary group IDs. This warning predates systemd and was there to ensure that the old SysV init scripts didn't leave extra groups on the process, which could allow a user to attach to the daemon with ptrace and cause security issues. I've opened bug#17412 to modify this log message.
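If you want to confirm what the daemon actually ends up with, a sketch (assumes slurmdbd is running and pidof returns a single PID):
> grep Groups /proc/$(pidof slurmdbd)/status
> id slurm
With User=/Group= set in the unit files, systemd starts the process with the slurm user's own group list, so in this setup the message should be harmless.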