Ticket 12195

Summary: slurm api authentication issue
Product: Slurm Reporter: Praveen SV <vijayap>
Component: slurmrestd Assignee: Nate Rini <nate>
Status: RESOLVED TIMEDOUT QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: nate
Version: 20.11.8   
Hardware: Other   
OS: Linux   
Site: Roche/PHCIX
Attachments: slurm rest apierrors

Description Praveen SV 2021-08-03 06:34:21 MDT
Created attachment 20647 [details]
slurm rest apierrors

Hi Team,

We are working on the Slurm REST API. We are following the API as per this link [https://app.swaggerhub.com/apis/rherrick/slurm-rest_api/0.0.35#/default/get_slurm_v0_0_35_diag_]. We are getting an empty response. We are using the JWT authentication method. Can you please tell us what we are missing here?

The response and screenshots are attached.

The errors from the Slurm REST API:

For Listing Jobs:
1) Url: /slurm/v0.0.35/jobs
Response: []
Response Code: 500
2) Url: /slurm/v0.0.36/jobs
Response: {
    "meta": {
        "plugin": {
            "type": "openapi/v0.0.36",
            "name": "REST v0.0.36"
        },
        "Slurm": {
            "version": {
                "major": 20,
                "micro": 8,
                "minor": 11
            },
            "release": "20.11.8"
        }
    },
    "errors": [],
    "jobs": []
}
Response code: 500

PFA

Regards,
Praveen
Comment 1 Nate Rini 2021-08-03 09:30:14 MDT
(In reply to Praveen SV from comment #0)
> we are working on slurm rest api. we are following the api as per this link
> [https://app.swaggerhub.com/apis/rherrick/slurm-rest_api/0.0.35#/default/
> get_slurm_v0_0_35_diag_].

That is neat, I have been using [https://editor.swagger.io/] to verify swagger functionality.

> 1) Url: /slurm/v0.0.35/jobs
> Response: []
> Response Code: 500

Side note: Please use 'v0.0.36' as it is the latest plugin in slurm-20.11.

> 2) Url: /slurm/v0.0.36/jobs
> Response: {
>     "meta": {
>         "plugin": {
>             "type": "openapi/v0.0.36",
>             "name": "REST v0.0.36"
>         },
>         "Slurm": {
>             "version": {
>                 "major": 20,
>                 "micro": 8,
>                 "minor": 11
>             },
>             "release": "20.11.8"
>         }
>     },
>     "errors": [],
>     "jobs": []
> }
> Response code: 500

This looks like a bug, since a 500 return code means a server error but the errors array isn't populated. Please provide the slurmctld log from around the time of the query.
Comment 2 Praveen SV 2021-08-04 00:59:59 MDT
Hi,

Please find the slurmctld log below. We are getting these errors:

[2021-08-04T06:53:07.290] error: slurm_auth_verify: jwt_decode failure
[2021-08-04T06:53:07.290] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_JOB_INFO has authentication error: Unspecified error
[2021-08-04T06:53:07.290] error: slurm_unpack_received_msg: Protocol authentication error
[2021-08-04T06:53:07.300] error: slurm_receive_msg [10.175.228.7:56586]: Unspecified error
[2021-08-04T06:56:08.845] error: slurm_auth_verify: jwt_decode failure
[2021-08-04T06:56:08.845] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_JOB_INFO has authentication error: Unspecified error
[2021-08-04T06:56:08.845] error: slurm_unpack_received_msg: Protocol authentication error
[2021-08-04T06:56:08.855] error: slurm_receive_msg [10.175.228.7:56806]: Unspecified error
[2021-08-04T06:57:05.005] error: slurm_auth_verify: jwt_decode failure
[2021-08-04T06:57:05.005] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_JOB_INFO has authentication error: Unspecified error
[2021-08-04T06:57:05.005] error: slurm_unpack_received_msg: Protocol authentication error
[2021-08-04T06:57:05.015] error: slurm_receive_msg [10.175.228.7:56878]: Unspecified error
Comment 3 Nate Rini 2021-08-04 10:33:08 MDT
(In reply to Praveen SV from comment #2)
> [2021-08-04T06:53:07.290] error: slurm_auth_verify: jwt_decode failure

The controller (slurmctld) is killing the connection due to the auth failure. To avoid giving attackers any extra information, Slurm closes a connection immediately when it fails auth.

Looks like the lack of an error sent to the client is a duplicate of bug#11442 which has already been resolved upstream.

How is the JWT being requested and provided in the request to slurmrestd?
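
For context, a minimal sketch of how a client might hand the token to slurmrestd, assuming the X-SLURM-USER-NAME and X-SLURM-USER-TOKEN headers described at https://slurm.schedmd.com/jwt.html. The helper name, host, user, and token value are placeholders, not values from this site, and the request is only built, not sent.

```python
import urllib.request


def build_slurmrestd_request(base_url, path, user, token):
    """Return a GET request carrying the Slurm JWT headers (not sent here)."""
    headers = {
        "X-SLURM-USER-NAME": user,    # user name the token was generated for
        "X-SLURM-USER-TOKEN": token,  # e.g. the value printed by `scontrol token`
        "Accept": "application/json",
    }
    return urllib.request.Request(base_url + path, headers=headers, method="GET")


if __name__ == "__main__":
    # Hypothetical host, user, and token, for illustration only.
    req = build_slurmrestd_request(
        "http://slurmrestd.example.com", "/slurm/v0.0.36/jobs", "someuser", "TOKEN"
    )
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

If either header is missing or the token does not decode on the controller, the jwt_decode failure shown in the slurmctld log would be the expected symptom.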
Comment 4 Praveen SV 2021-08-04 12:43:45 MDT
Hi Nate Rini,

Not able to view this bug#11442

Thanks
Comment 5 Nate Rini 2021-08-04 12:48:07 MDT
(In reply to Praveen SV from comment #4)
> Not able to view this bug#11442

Yes, that one is tagged private. All of the patches will be included with slurm-21.08 release:
> https://github.com/SchedMD/slurm/compare/28aee570b933cb786638a9c260fb17ea6c2679ec...f8afdea8f589c3417658de37319f881d7069cefa
Comment 6 Praveen SV 2021-08-05 10:55:59 MDT
Hi Nate Rini,

We are still not getting any response. We are following this JWT authentication guide:

https://slurm.schedmd.com/jwt.html

Regards,
Praveen
Comment 7 Nate Rini 2021-08-06 11:53:10 MDT
(In reply to Praveen SV from comment #6)
> https://slurm.schedmd.com/jwt.html

Please provide additional details on exactly how you are getting the JWT auth tokens and how they are being sent. Please also run slurmrestd at a higher debug level (add -vvvvv to the command arguments) to get more verbose logs.

Please check the examples here too:
> https://slurm.schedmd.com/SLUG20/REST_API.pdf
Comment 8 Praveen SV 2021-08-09 07:24:45 MDT
Hi Nate,

The authentication worked after I made a few changes in the slurm.conf file. Now, about generating tokens: is there a way we can generate a token with an indefinite lifespan?


Regards,
Praveen
Comment 9 Nate Rini 2021-08-09 09:09:29 MDT
(In reply to Praveen SV from comment #8)
> The authentication worked I made few changes in slurm.conf file. Now for
> generating token. If there a way we can generate token with indefinite
> lifespan.

An expiration is required for Slurm JWTs, but the amount of time is given by a signed 32-bit integer, which allows very long-lived tokens. I suggest just passing a long 'lifespan=' argument with the token request if desired.
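
As a rough illustration of what that limit allows, here is a small sketch. It assumes the signed 32-bit bound applies directly to the 'lifespan=' value in seconds (the ticket does not spell this out), and it uses the `scontrol token username=... lifespan=...` form from https://slurm.schedmd.com/jwt.html. The helper name and user name are placeholders.

```python
MAX_INT32 = 2**31 - 1  # largest signed 32-bit value, taken here as seconds
SECONDS_PER_DAY = 24 * 60 * 60


def token_command(user, lifespan_seconds):
    """Build the scontrol command line for a token with the given lifespan."""
    if not 0 < lifespan_seconds <= MAX_INT32:
        raise ValueError("lifespan must fit in a signed 32-bit integer")
    return f"scontrol token username={user} lifespan={lifespan_seconds}"


if __name__ == "__main__":
    one_year = 365 * SECONDS_PER_DAY
    print(token_command("someuser", one_year))
    print(f"upper bound: about {MAX_INT32 / one_year:.0f} years")
```

A long-lived token is effectively a long-lived credential, so a site may prefer a shorter lifespan with periodic renewal.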

Are there any more questions?
Comment 10 Nate Rini 2021-08-13 12:48:25 MDT
reducing severity while waiting on response to comment#9
Comment 11 Praveen SV 2021-08-17 12:02:46 MDT
Hi Nate,

We are facing an issue while submitting a job using the REST API. Please find the error below.

"error": "Jobs description entry not found, empty or not dictionary or list"
"errors": [
        {
            "error_code": 9002,
            "error": "Jobs description entry not found, empty or not dictionary or list"
        }
    ]

What could possibly be wrong?

Best Regards,
Praveen
Comment 12 Nate Rini 2021-08-17 12:23:44 MDT
(In reply to Praveen SV from comment #11)
> Hi Nate,
> 
> We are facing issue while submitting a job using rest api. Please find the
> below error.
>             "error": "Jobs description entry not found, empty or not
> dictionary or list"

Please provide the log from slurmrestd and the job's JSON request.
Comment 13 Praveen SV 2021-08-18 12:19:41 MDT
Hi Nate,

I am unable to locate the slurmrestd logs. Please provide the path, or the steps to enable/configure them.

Please find the request and response of the job submit API below.

Curl request:
curl --location --request POST '/slurm/v0.0.36/job/submit' \
--header 'X-SLURM-USER-TOKEN: #########' \
--header 'X-SLURM-USER-NAME: ##########' \
--header 'Content-Type: application/json' \
--data-raw '{
  "name": "test",
  "partition": "C-16Cpu-30GB",
  "script": "test.sh",
  "environment": {}
}'

Response:
{
    "meta": {
        "plugin": {
            "type": "openapi/v0.0.36",
            "name": "REST v0.0.36"
        },
        "Slurm": {
            "version": {
                "major": 20,
                "micro": 8,
                "minor": 11
            },
            "release": "20.11.8"
        }
    },
    "errors": [
        {
            "error_code": 9002,
            "error": "Jobs description entry not found, empty or not dictionary or list"
        }
    ]
}
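
One thing that may be worth checking, though it is not confirmed in this ticket: the examples in https://slurm.schedmd.com/SLUG20/REST_API.pdf appear to wrap the job properties in a top-level "job" object and pass the script contents (not a file name) in "script". A minimal sketch of such a body, assuming the v0.0.36 plugin expects that shape. The function name, working directory, environment values, and script text are placeholders.

```python
import json


def build_submit_body(name, partition, script_text, environment, cwd):
    """Return a JSON-serializable body for POST /slurm/v0.0.36/job/submit."""
    return {
        "job": {  # job properties nested under "job", not at the top level
            "name": name,
            "partition": partition,
            "environment": environment,
            "current_working_directory": cwd,  # assumed v0.0.36 field name
        },
        "script": script_text,  # full script contents, starting with a shebang
    }


if __name__ == "__main__":
    body = build_submit_body(
        name="test",
        partition="C-16Cpu-30GB",
        script_text="#!/bin/bash\nsrun hostname\n",  # placeholder script
        environment={"PATH": "/bin:/usr/bin"},       # placeholder environment
        cwd="/tmp",
    )
    print(json.dumps(body, indent=2))
```

The printed JSON could then be passed to curl via --data-raw in place of the flat object used above.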


Best Regards,
Praveen
Comment 14 Nate Rini 2021-08-18 12:24:41 MDT
(In reply to Praveen SV from comment #13)
> Unable to locate the slurmrestd logs. Please provide me the path or steps to
> enable/configure it.
How is slurmrestd being run? via systemd?
Comment 15 Praveen SV 2021-08-18 12:41:59 MDT
Hi Nate,

Yes - systemd

systemctl status slurmrestd
● slurmrestd.service - Slurm REST daemon
   Loaded: loaded (/lib/systemd/system/slurmrestd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2021-07-28 12:02:46 UTC; 3 weeks 0 days ago
 Main PID: 12078 (slurmrestd)
    Tasks: 21 (limit: 4915)
   CGroup: /system.slice/slurmrestd.service
           └─12078 /shared/slurm_SLURM-MASTER-USW2-HPC-SB/sbin/slurmrestd 0.0.0.0:80


Regards,
Praveen
Comment 16 Nate Rini 2021-08-18 12:43:50 MDT
Please call:
> cat /lib/systemd/system/slurmrestd.service
Comment 17 Praveen SV 2021-08-18 12:46:29 MDT
Hi,

root:~# cat /lib/systemd/system/slurmrestd.service
[Unit]
Description=Slurm REST daemon
After=network.target munge.service slurmctld.service
ConditionPathExists=/shared/slurm_SLURM-MASTER-USW2-HPC-SB/etc/slurm.conf

[Service]
Type=simple
EnvironmentFile=-/etc/sysconfig/slurmrestd
Environment="SLURM_JWT=daemon"
ExecStart=/shared/slurm_SLURM-MASTER-USW2-HPC-SB/sbin/slurmrestd $SLURMRESTD_OPTIONS 0.0.0.0:80
#ExecStart=/shared/slurm_SLURM-MASTER-USW2-HPC-SB/sbin/slurmrestd -f /shared/slurm_SLURM-MASTER-USW2-HPC-SB/etc/slurm.conf 0.0.0.0:80
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target


Thanks
Comment 18 Nate Rini 2021-08-18 12:51:32 MDT
Add this line to /etc/sysconfig/slurmrestd:
> SLURMRESTD_OPTIONS="-vvvvv"

then call:
> systemctl daemon-reload
> systemctl restart slurmrestd.service

Then run the test curl request.

Attach the output of:
> journalctl _SYSTEMD_UNIT=slurmrestd.service
Comment 19 Nate Rini 2021-08-24 09:49:47 MDT
Any updates?
Comment 20 Nate Rini 2021-08-27 09:08:18 MDT
(In reply to Nate Rini from comment #19)
> Any updates?

Reducing ticket severity as there has not been a reply to comment#18 in more than a week.
Comment 21 Nate Rini 2021-09-13 10:59:57 MDT
I'm going to close this ticket as timed out. When convenient, please reply to comment #18 and we can continue debugging.

Thanks,
--Nate
Comment 22 Praveen SV 2021-10-11 04:33:36 MDT
Hi Team,

Is there a way we can generate a JWT token using the API?

Regards,
Praveen
Comment 23 Nate Rini 2021-10-11 08:26:05 MDT
(In reply to Praveen SV from comment #22)
> Is there a way we can generate jwt token using API ?

Tokens cannot currently be generated using the REST API. Sites are always welcome to submit RFE tickets.

The tokens used by Slurm are RFC 7519 compliant and can be generated via multiple means outside of Slurm. See details here:
> https://slurm.schedmd.com/jwt.html#compatibility
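
As one illustration of generating a token outside of Slurm, here is a minimal sketch using only the Python standard library. It assumes an HS256 key shared with slurmctld and the claim names from https://slurm.schedmd.com/jwt.html ("exp", "iat", and "sun" for the Slurm user name). The function names, user, and demo key are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_slurm_jwt(key: bytes, user: str, lifespan: int, now=None) -> str:
    """Sign an HS256 JWT carrying the exp, iat, and sun claims."""
    now = int(time.time()) if now is None else int(now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"exp": now + lifespan, "iat": now, "sun": user}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


if __name__ == "__main__":
    demo_key = b"not-a-real-key"  # placeholder; a real site would read its own key file
    print("SLURM_JWT=" + make_slurm_jwt(demo_key, "someuser", 3600))
```

A token signed with a key other than the one slurmctld is configured with would be rejected by the controller, so the key bytes must match exactly.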

Are there any more questions?
Comment 24 Praveen SV 2021-10-13 08:52:23 MDT
Hi Nate,

When we generate a JWT token using the command <<scontrol token $username>> and pass it in the API call, it works fine. But when we generate the token using this Python block, as described at [https://slurm.schedmd.com/jwt.html], the API call returns error 5005.

Python block

import sys
import time

from jwt import JWT
from jwt.jwk import jwk_from_dict
from jwt.utils import b64encode

if len(sys.argv) != 3:
    sys.exit("gen_jwt.py [user name] [expiration time (seconds)]")

# Load the HS256 key; the path is relative to the current working directory.
with open("jwt_hs256.key", "rb") as f:
    priv_key = f.read()

signing_key = jwk_from_dict({
    'kty': 'oct',
    'k': b64encode(priv_key)
})

# "sun" carries the Slurm user name; "exp" and "iat" are Unix timestamps.
message = {
    "exp": int(time.time() + int(sys.argv[2])),
    "iat": int(time.time()),
    "sun": sys.argv[1]
}

a = JWT()
compact_jws = a.encode(message, signing_key, alg='HS256')
print("SLURM_JWT={}".format(compact_jws))


-------------------------------------------------------------


When we submit a job using the API, we get this response:

{
    "meta": {
        "plugin": {
            "type": "openapi/v0.0.36",
            "name": "REST v0.0.36"
        },
        "Slurm": {
            "version": {
                "major": 20,
                "micro": 8,
                "minor": 11
            },
            "release": "20.11.8"
        }
    },
    "errors": [
        {
            "error_code": 5005,
            "error": "Zero Bytes were transmitted or received"
        }
    ]
}


Slurmctld.log

[2021-10-13T11:08:53.844] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_SUBMIT_BATCH_JOB has authentication error: Unspecified error
[2021-10-13T11:08:53.844] error: slurm_unpack_received_msg: Protocol authentication error
[2021-10-13T11:08:53.854] error: slurm_receive_msg [10.175.228.7:37188]: Unspecified error
[2021-10-13T14:45:15.800] error: slurm_auth_verify: jwt_decode failure
[2021-10-13T14:45:15.800] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_SUBMIT_BATCH_JOB has authentication error: Unspecified error
[2021-10-13T14:45:15.800] error: slurm_unpack_received_msg: Protocol authentication error
[2021-10-13T14:45:15.810] error: slurm_receive_msg [10.175.228.7:49488]: Unspecified error
[2021-10-13T14:47:18.281] error: slurm_auth_verify: jwt_decode failure
[2021-10-13T14:47:18.281] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_SUBMIT_BATCH_JOB has authentication error: Unspecified error
[2021-10-13T14:47:18.281] error: slurm_unpack_received_msg: Protocol authentication error
[2021-10-13T14:47:18.291] error: slurm_receive_msg [10.175.228.7:52812]: Unspecified error




Best Regards,
Praveen
Comment 25 Nate Rini 2021-10-13 09:43:38 MDT
(In reply to Praveen SV from comment #24)
> Slurmctld.log
> [2021-10-13T14:47:18.281] error: slurm_auth_verify: jwt_decode failure

The generated token appears to be incorrect.

> with open("jwt_hs256.key", "rb") as f:
>     priv_key = f.read()
Please make sure that priv_key is getting loaded correctly. The example code doesn't really have any error handling.
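
A sketch of the kind of check meant here, assuming the script is supposed to read the same jwt_hs256.key that slurmctld uses. It fails loudly if the key file is missing or empty, and it decodes a token's claims (without verifying the signature) so that a generated token can be compared with one from scontrol token. The function names are hypothetical.

```python
import base64
import json
import os


def load_key(path: str) -> bytes:
    """Read the HS256 key, raising instead of silently using bad input."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"key file not found: {path}")
    with open(path, "rb") as f:
        key = f.read()
    if not key:
        raise ValueError(f"key file is empty: {path}")
    return key


def peek_claims(token: str) -> dict:
    """Decode the payload of a JWT for inspection; does NOT verify it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Since the quoted open() call uses a relative path, the script picks up whatever jwt_hs256.key sits in the current working directory; passing an absolute path to load_key() removes that ambiguity. Comparing peek_claims() output for both tokens shows whether the claims line up, though a wrong key would only show up as a signature failure on the controller.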
Comment 26 Praveen SV 2021-10-14 04:41:03 MDT
Hi Nate,

The key file is loading correctly, and the token is generated. But when we use the
generated token, we get the 5005 error.

Regards,
Praveen
Comment 27 Nate Rini 2021-10-14 11:06:02 MDT
(In reply to Praveen SV from comment #24)
> [2021-10-13T14:45:15.800] error: slurm_auth_verify: jwt_decode failure

Is it possible to get this patch applied and try the auth again?
> https://github.com/SchedMD/slurm/commit/ea41752047548d2263e148b6e65a75e2a95f48f3

The 20.11 error message is not helpful in determining why the decode fails.
Comment 28 Nate Rini 2021-10-14 11:09:26 MDT
We can also use gdb if getting patches is difficult for your site.
Comment 29 Praveen SV 2021-10-20 02:48:57 MDT
Hi Nate,

I'm not able to locate src/plugins/auth/jwt/auth_jwt.c in my system.

https://github.com/SchedMD/slurm/commit/ea41752047548d2263e148b6e65a75e2a95f48f3?branch=ea41752047548d2263e148b6e65a75e2a95f48f3&diff=unified

Best Regards,
Praveen
Comment 30 Nate Rini 2021-10-20 08:28:34 MDT
How was Slurm installed?
Comment 31 Praveen SV 2021-10-20 10:06:53 MDT
Hi Nate,

Please find the steps below.

1. Download the latest Slurm version
https://www.schedmd.com/downloads.php

2. Decompress the Slurm source files and compile

tar -xvf ${script_dir}/../slurm-${SlurmVersion}.tar.bz2 -C /tmp/
cd /tmp/slurm-${SlurmVersion}
/tmp/slurm-${SlurmVersion}/configure --prefix=${slurm_path} --with-libcurl --with-jwt
make -j $(nproc)
make install

slurm_path="/shared/slurm_SLURM-MASTER-USW2-HPC-SB"

3. Restart the services

systemctl restart slurmdbd
systemctl restart slurmctld
systemctl restart slurmrestd

Best Regards,
Praveen
Comment 32 Nate Rini 2021-10-20 10:10:27 MDT
(In reply to Praveen SV from comment #31)
> Please find the steps
Please swap to this:
 
> 1.Download the lastest slurm version 
> https://www.schedmd.com/downloads.php 
> 
> 2.#decompress the slurm source files and compile
> 
> tar -xvf ${script_dir}/../slurm-${SlurmVersion}.tar.bz2 -C /tmp/
> cd /tmp/slurm-${SlurmVersion}
> patch -p1 < $PATH_TO_PATCH

Just apply the patch before configure and make are called.

> /tmp/slurm-${SlurmVersion}/configure --prefix=${slurm_path} --with-libcurl
> --with-jwt
> make -j $(nproc)
> make install
> 
> slurm_path="/shared/slurm_SLURM-MASTER-USW2-HPC-SB"
I suggest applying the patch to a test directory first, but it will only take effect when the daemons are restarted.

Please tell me if that works.
Comment 33 Nate Rini 2021-12-02 09:34:46 MST
Any updates?
Comment 34 Nate Rini 2021-12-21 11:15:50 MST
Please reply when convenient. I'm closing this ticket, but it will reopen automatically on reply.