Summary: | slurm api authentication issue | ||
---|---|---|---|
Product: | Slurm | Reporter: | Praveen SV <vijayap> |
Component: | slurmrestd | Assignee: | Nate Rini <nate> |
Status: | RESOLVED TIMEDOUT | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | nate |
Version: | 20.11.8 | ||
Hardware: | Other | ||
OS: | Linux | ||
Site: | Roche/PHCIX | Slinky Site: | --- |
Attachments: | slurm rest apierrors |
Description
Praveen SV
2021-08-03 06:34:21 MDT
(In reply to Praveen SV from comment #0)
> we are working on slurm rest api. we are following the api as per this link
> [https://app.swaggerhub.com/apis/rherrick/slurm-rest_api/0.0.35#/default/
> get_slurm_v0_0_35_diag_].

That is neat, I have been using [https://editor.swagger.io/] to verify swagger functionality.

> 1) Url: /slurm/v0.0.35/jobs
> Response: []
> Response Code: 500

Side note: Please use 'v0.0.36' as it is the latest plugin in slurm-20.11.

> 2) Url: /slurm/v0.0.36/jobs
> Response: {
>   "meta": {
>     "plugin": {
>       "type": "openapi/v0.0.36",
>       "name": "REST v0.0.36"
>     },
>     "Slurm": {
>       "version": {
>         "major": 20,
>         "micro": 8,
>         "minor": 11
>       },
>       "release": "20.11.8"
>     }
>   },
>   "errors": [],
>   "jobs": []
> }
> Response code: 500

Looks like a bug, since returning 500 means a server error but the errors array isn't populated. Please provide the slurmctld log around the time of the query.

Hi,

Please find the slurmctld log below. We are getting these errors:

    [2021-08-04T06:53:07.290] error: slurm_auth_verify: jwt_decode failure
    [2021-08-04T06:53:07.290] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_JOB_INFO has authentication error: Unspecified error
    [2021-08-04T06:53:07.290] error: slurm_unpack_received_msg: Protocol authentication error
    [2021-08-04T06:53:07.300] error: slurm_receive_msg [10.175.228.7:56586]: Unspecified error
    [2021-08-04T06:56:08.845] error: slurm_auth_verify: jwt_decode failure
    [2021-08-04T06:56:08.845] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_JOB_INFO has authentication error: Unspecified error
    [2021-08-04T06:56:08.845] error: slurm_unpack_received_msg: Protocol authentication error
    [2021-08-04T06:56:08.855] error: slurm_receive_msg [10.175.228.7:56806]: Unspecified error
    [2021-08-04T06:57:05.005] error: slurm_auth_verify: jwt_decode failure
    [2021-08-04T06:57:05.005] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_JOB_INFO has authentication error: Unspecified error
    [2021-08-04T06:57:05.005] error: slurm_unpack_received_msg: Protocol authentication error
    [2021-08-04T06:57:05.015] error: slurm_receive_msg [10.175.228.7:56878]: Unspecified error

(In reply to Praveen SV from comment #2)
> [2021-08-04T06:53:07.290] error: slurm_auth_verify: jwt_decode failure

The controller (slurmctld) is killing the connection due to the auth failure. To give attackers no extra information, Slurm closes any connection that fails auth immediately on failure. The lack of an error sent to the client looks like a duplicate of bug#11442, which has already been resolved upstream.

How is the JWT being requested and provided in the request to slurmrestd?

(In reply to Praveen SV from comment #4)
> Not able to view this bug#11442

Yes, that one is tagged private. All of the patches will be included with the slurm-21.08 release:
> https://github.com/SchedMD/slurm/compare/28aee570b933cb786638a9c260fb17ea6c2679ec...f8afdea8f589c3417658de37319f881d7069cefa

Hi Nate Rini,

We are still not getting any response. We are following this JWT authentication guide: https://slurm.schedmd.com/jwt.html

Regards,
Praveen

(In reply to Praveen SV from comment #6)
> https://slurm.schedmd.com/jwt.html

Please provide additional details on how exactly you are getting the JWT auth tokens and how they are being sent. Please also run slurmrestd at a higher debug level (add -vvvvv to the command arguments) to get more verbose logs.

Please check the examples here too:
> https://slurm.schedmd.com/SLUG20/REST_API.pdf

Hi Nate,

The authentication worked after I made a few changes in the slurm.conf file. Now, about generating tokens: is there a way we can generate a token with an indefinite lifespan?

Regards,
Praveen

(In reply to Praveen SV from comment #8)
> The authentication worked I made few changes in slurm.conf file. Now for
> generating token. If there a way we can generate token with indefinite
> lifespan.

An expiration is required for Slurm JWTs, but the amount of time is given by a signed 32-bit integer, which allows very long-lived tokens. I suggest just passing the token request with a long 'lifespan=' argument if desired.

Are there any more questions?

Reducing severity while waiting on a response to comment#9.

Hi Nate,

We are facing an issue while submitting a job using the REST API. Please find the error below.

    "error": "Jobs description entry not found, empty or not dictionary or list"
    "errors": [
      {
        "error_code": 9002,
        "error": "Jobs description entry not found, empty or not dictionary or list"
      }
    ]

What could possibly be wrong?

Best Regards,
Praveen

(In reply to Praveen SV from comment #11)
> Hi Nate,
>
> We are facing issue while submitting a job using rest api. Please find the
> below error.
> "error": "Jobs description entry not found, empty or not
> dictionary or list"

Please provide the log from slurmrestd and the job's JSON request.

Hi Nate,

I am unable to locate the slurmrestd logs. Please provide me the path or the steps to enable/configure them.

Please find the request and response of the job submit API below.

Curl request:

    curl --location --request POST '/slurm/v0.0.36/job/submit' \
      --header 'X-SLURM-USER-TOKEN: #########' \
      --header 'X-SLURM-USER-NAME: ##########' \
      --header 'Content-Type: application/json' \
      --data-raw '{
        "name": "test",
        "partition": "C-16Cpu-30GB",
        "script": "test.sh",
        "environment": {}
      }'

Response:

    {
      "meta": {
        "plugin": {
          "type": "openapi/v0.0.36",
          "name": "REST v0.0.36"
        },
        "Slurm": {
          "version": {
            "major": 20,
            "micro": 8,
            "minor": 11
          },
          "release": "20.11.8"
        }
      },
      "errors": [
        {
          "error_code": 9002,
          "error": "Jobs description entry not found, empty or not dictionary or list"
        }
      ]
    }

Best Regards,
Praveen

(In reply to Praveen SV from comment #13)
> Unable to locate the slurmrestd logs. Please provide me the path or steps to
> enable/configure it.

How is slurmrestd being run? Via systemd?

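A note on the 9002 error in the submit request above. The ticket never confirms the cause, but one plausible reading of "Jobs description entry not found" is that the v0.0.36 plugin expects the job properties nested under a top-level "job" key, with "script" carrying the script text itself rather than a file name such as test.sh. The Python sketch below builds (but does not send) a request shaped that way. The helper name `build_submit_request`, the base URL, the working directory, and the environment values are all illustrative assumptions, not values taken from this site.

```python
import json
import urllib.request


def build_submit_request(base_url, user, token, script_text, name, partition):
    """Build (but do not send) a job-submit request for slurmrestd.

    Hypothetical helper for illustration; the body layout is an assumption
    about the v0.0.36 schema, not something confirmed in this ticket.
    """
    body = {
        # Assumption: job properties are nested under a top-level "job" key.
        "job": {
            "name": name,
            "partition": partition,
            "current_working_directory": "/tmp",
            "environment": {"PATH": "/bin:/usr/bin"},
        },
        # Assumption: "script" holds the script contents, not a path.
        "script": script_text,
    }
    return urllib.request.Request(
        url=base_url + "/slurm/v0.0.36/job/submit",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-SLURM-USER-NAME": user,
            "X-SLURM-USER-TOKEN": token,
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace with the real host, user and token.
req = build_submit_request(
    "http://localhost:80",
    "someuser",
    "sometoken",
    "#!/bin/bash\nsrun hostname\n",
    "test",
    "C-16Cpu-30GB",
)
print(req.get_method(), req.full_url)
print(json.dumps(json.loads(req.data), indent=2))
```

Passing `req` to `urllib.request.urlopen` would send it. Printing the body first makes it easy to compare against what slurmrestd logs at the -vvvvv debug level.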
Hi Nate,

Yes, via systemd.

    systemctl status slurmrestd
    ● slurmrestd.service - Slurm REST daemon
       Loaded: loaded (/lib/systemd/system/slurmrestd.service; enabled; vendor preset: enabled)
       Active: active (running) since Wed 2021-07-28 12:02:46 UTC; 3 weeks 0 days ago
     Main PID: 12078 (slurmrestd)
        Tasks: 21 (limit: 4915)
       CGroup: /system.slice/slurmrestd.service
               └─12078 /shared/slurm_SLURM-MASTER-USW2-HPC-SB/sbin/slurmrestd 0.0.0.0:80

Regards,
Praveen

Please call:
> cat /lib/systemd/system/slurmrestd.service

Hi,

    root:~# cat /lib/systemd/system/slurmrestd.service
    [Unit]
    Description=Slurm REST daemon
    After=network.target munge.service slurmctld.service
    ConditionPathExists=/shared/slurm_SLURM-MASTER-USW2-HPC-SB/etc/slurm.conf

    [Service]
    Type=simple
    EnvironmentFile=-/etc/sysconfig/slurmrestd
    Environment="SLURM_JWT=daemon"
    ExecStart=/shared/slurm_SLURM-MASTER-USW2-HPC-SB/sbin/slurmrestd $SLURMRESTD_OPTIONS 0.0.0.0:80
    #ExecStart=/shared/slurm_SLURM-MASTER-USW2-HPC-SB/sbin/slurmrestd -f /shared/slurm_SLURM-MASTER-USW2-HPC-SB/etc/slurm.conf 0.0.0.0:80
    ExecReload=/bin/kill -HUP $MAINPID

    [Install]
    WantedBy=multi-user.target

Thanks

Add this line to /etc/sysconfig/slurmrestd:
> SLURMRESTD_OPTIONS="-vvvvv"

then call:
> systemctl daemon-reload
> systemctl restart slurmrestd.service

Run the test curl request, then attach the output of:
> journalctl _SYSTEMD_UNIT=slurmrestd.service

Any updates?

(In reply to Nate Rini from comment #19)
> Any updates?

Reducing ticket severity as there has been no reply to comment#18 in more than a week.

I'm going to close this ticket as timed out. When convenient, please reply to comment #18 and we can continue debugging.

Thanks,
--Nate

Hi Team,

Is there a way we can generate a JWT token using the API?

Regards,
Praveen

(In reply to Praveen SV from comment #22)
> Is there a way we can generate jwt token using API ?

Tokens cannot currently be generated using the REST API. Sites are always welcome to submit RFE tickets. The tokens used by Slurm are RFC 7519 compliant and can be generated via multiple means outside of Slurm. See details here:
> https://slurm.schedmd.com/jwt.html#compatibility

Are there any more questions?

Hi Nate,

When we generate a JWT token using the command <<scontrol token $username>> and pass it in the API call, it works fine. But when we generate a token using the Python block given at [https://slurm.schedmd.com/jwt.html], the API call throws a 5005 error.

Python block:

    import sys
    import os
    import pprint
    import json
    import time
    from datetime import datetime, timedelta, timezone

    from jwt import JWT
    from jwt.jwa import HS256
    from jwt.jwk import jwk_from_dict
    from jwt.utils import b64decode,b64encode

    if len(sys.argv) != 3:
        sys.exit("gen_jwt.py [user name] [expiration time (seconds)]");

    with open("jwt_hs256.key", "rb") as f:
        priv_key = f.read()

    signing_key = jwk_from_dict({
        'kty': 'oct',
        'k': b64encode(priv_key)
    })

    message = {
        "exp": int(time.time() + int(sys.argv[2])),
        "iat": int(time.time()),
        "sun": sys.argv[1]
    }

    a = JWT()
    compact_jws = a.encode(message, signing_key, alg='HS256')
    print("SLURM_JWT={}".format(compact_jws))

-------------------------------------------------------------

When we submit a job using the API, we get this response:

    {
      "meta": {
        "plugin": {
          "type": "openapi/v0.0.36",
          "name": "REST v0.0.36"
        },
        "Slurm": {
          "version": {
            "major": 20,
            "micro": 8,
            "minor": 11
          },
          "release": "20.11.8"
        }
      },
      "errors": [
        {
          "error_code": 5005,
          "error": "Zero Bytes were transmitted or received"
        }
      ]
    }

Slurmctld.log:

    [2021-10-13T11:08:53.844] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_SUBMIT_BATCH_JOB has authentication error: Unspecified error
    [2021-10-13T11:08:53.844] error: slurm_unpack_received_msg: Protocol authentication error
    [2021-10-13T11:08:53.854] error: slurm_receive_msg [10.175.228.7:37188]: Unspecified error
    [2021-10-13T14:45:15.800] error: slurm_auth_verify: jwt_decode failure
    [2021-10-13T14:45:15.800] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_SUBMIT_BATCH_JOB has authentication error: Unspecified error
    [2021-10-13T14:45:15.800] error: slurm_unpack_received_msg: Protocol authentication error
    [2021-10-13T14:45:15.810] error: slurm_receive_msg [10.175.228.7:49488]: Unspecified error
    [2021-10-13T14:47:18.281] error: slurm_auth_verify: jwt_decode failure
    [2021-10-13T14:47:18.281] error: slurm_unpack_received_msg: g_slurm_auth_verify: REQUEST_SUBMIT_BATCH_JOB has authentication error: Unspecified error
    [2021-10-13T14:47:18.281] error: slurm_unpack_received_msg: Protocol authentication error
    [2021-10-13T14:47:18.291] error: slurm_receive_msg [10.175.228.7:52812]: Unspecified error

Best Regards,
Praveen

(In reply to Praveen SV from comment #24)
> Slurmctld.log
> [2021-10-13T14:47:18.281] error: slurm_auth_verify: jwt_decode failure

The generated token appears to be incorrect.

> with open("jwt_hs256.key", "rb") as f:
>     priv_key = f.read()

Please make sure that priv_key is getting loaded correctly. The example code doesn't really have any error handling.

Hi Nate,

The key file is loading correctly and the token is generated. But with the generated token we are getting the 5005 error.

Regards,
Praveen

(In reply to Praveen SV from comment #24)
> [2021-10-13T14:45:15.800] error: slurm_auth_verify: jwt_decode failure

Is it possible to get this patch applied and try the auth again?
> https://github.com/SchedMD/slurm/commit/ea41752047548d2263e148b6e65a75e2a95f48f3

The 20.11 error is not helpful in determining why the decode fails. We can also use gdb if getting patches is difficult for your site.

Hi Nate,

I'm not able to locate src/plugins/auth/jwt/auth_jwt.c in my system.
https://github.com/SchedMD/slurm/commit/ea41752047548d2263e148b6e65a75e2a95f48f3?branch=ea41752047548d2263e148b6e65a75e2a95f48f3&diff=unified

Best Regards,
Praveen

How was Slurm installed?

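Because the 20.11 log only reports "jwt_decode failure", it can help to check a generated token locally before sending it to slurmrestd. The sketch below uses only the Python standard library to build an HS256 token with the same claims as the script above ("exp", "iat", "sun") and to verify a token's signature and expiry against the key bytes. The function names `make_token` and `check_token` are hypothetical, and the sketch assumes slurmctld uses the raw contents of jwt_hs256.key as the HMAC secret. It is a debugging aid under those assumptions, not Slurm's own validation logic.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(key: bytes, signing_input: str) -> bytes:
    return hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()


def make_token(key: bytes, user: str, lifespan: int, now=None) -> str:
    """Build an HS256 JWT carrying the exp, iat and sun claims."""
    now = int(time.time()) if now is None else int(now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"exp": now + lifespan, "iat": now, "sun": user}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url(_sign(key, signing_input))


def check_token(key: bytes, token: str, now=None) -> dict:
    """Return the claims if the signature matches and exp is in the future."""
    now = int(time.time()) if now is None else int(now)
    header_b64, claims_b64, sig_b64 = token.split(".")
    expected = _sign(key, header_b64 + "." + claims_b64)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("signature does not match this key")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims["exp"] <= now:
        raise ValueError("token has expired")
    return claims


# Placeholder key; in practice read the bytes of the site's jwt_hs256.key.
key = b"example-key-bytes"
token = make_token(key, "someuser", 3600)
print(check_token(key, token))
```

Running `check_token` on a token produced by `scontrol token` and on one produced by the external script, with the same key bytes, would show whether the two differ in signature or in claims.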
Hi Nate,

Please find the steps:

1. Download the latest slurm version
https://www.schedmd.com/downloads.php

2. #decompress the slurm source files and compile

    tar -xvf ${script_dir}/../slurm-${SlurmVersion}.tar.bz2 -C /tmp/
    cd /tmp/slurm-${SlurmVersion}
    /tmp/slurm-${SlurmVersion}/configure --prefix=${slurm_path} --with-libcurl --with-jwt
    make -j $(nproc)
    make install

    slurm_path="/shared/slurm_SLURM-MASTER-USW2-HPC-SB"

3. Restart the services

    systemctl restart slurmdbd
    systemctl restart slurmctld
    systemctl restart slurmrestd

Best Regards,
Praveen

(In reply to Praveen SV from comment #31)
> Please find the steps

Please swap to this:

> 1.Download the lastest slurm version
> https://www.schedmd.com/downloads.php
>
> 2.#decompress the slurm source files and compile
>
> tar -xvf ${script_dir}/../slurm-${SlurmVersion}.tar.bz2 -C /tmp/
> cd /tmp/slurm-${SlurmVersion}
> patch -p1 < $PATH_TO_PATCH

Just apply the patch before configure and make are called.

> /tmp/slurm-${SlurmVersion}/configure --prefix=${slurm_path} --with-libcurl
> --with-jwt
> make -j $(nproc)
> make install
>
> slurm_path="/shared/slurm_SLURM-MASTER-USW2-HPC-SB"

I suggest applying the patch to a test directory first, but it will only take effect when the daemons are restarted.

Please tell me if that works.

Any updates? Please reply when convenient.

I'm closing this ticket but it will reopen automatically on reply.