Ticket 15988 - scontrol show job <job_id> shows error: invalid job id specified, while sacct is able to query
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 21.08.8
Hardware: Linux Linux
: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-02-09 02:22 MST by William Durairaj
Modified: 2023-02-09 02:22 MST

See Also:
Site: -Other-
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---


Attachments
Logs from controller, slurmdbd and slurmd (12.61 KB, application/x-zip-compressed)
2023-02-09 02:22 MST, William Durairaj

Description William Durairaj 2023-02-09 02:22:12 MST
Created attachment 28769
Logs from controller, slurmdbd and slurmd

headnodevm:/var/log/slurm # scontrol show job 3050
slurm_load_jobs error: Invalid job id specified
dkrdc01-headnodevm:/var/log/slurm # sacct -j 3050
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
3050         Mechanical      rack1                   128 CANCELLED+      0:0
3050.batch        batch                               64  CANCELLED     0:15
3050.extern      extern                              128  COMPLETED      0:0

headnodevm:/var/log/slurm #
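In the transcript, `scontrol` (which asks the running slurmctld) no longer knows job 3050, while `sacct` (which reads the accounting records stored through slurmdbd) still returns it. The Python sketch below shows one way to query a job so that it falls back from the first source to the second. It is a minimal sketch under stated assumptions: the names `lookup_job`, `parse_sacct_parsable` and `StepRecord` are hypothetical, both commands are assumed to be on `PATH`, and `scontrol` is assumed to exit with a non-zero status for an unknown job ID. It requests `--parsable2` output, which avoids the column-width truncation that `sacct` marks with a trailing `+` (as in `CANCELLED+` above).

```python
"""Minimal sketch: look up a Slurm job live, falling back to accounting.

Assumptions (not from the ticket): scontrol and sacct are on PATH, and
scontrol exits non-zero when the controller does not know the job ID.
All names defined here are hypothetical.
"""
from __future__ import annotations

import subprocess
from dataclasses import dataclass


@dataclass
class StepRecord:
    job_id: str     # e.g. "3050" or "3050.batch"
    job_name: str
    state: str      # full text, e.g. "CANCELLED by <uid>" (no "+" truncation)
    exit_code: int  # exit status, the part of ExitCode before the colon
    signal: int     # signal number after the colon (15 is SIGTERM on Linux)


def parse_sacct_parsable(text: str) -> list[StepRecord]:
    """Parse `sacct --parsable2 --noheader --format=JobID,JobName,State,ExitCode`."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        job_id, job_name, state, exit_code = line.split("|")
        code, sig = exit_code.split(":")
        records.append(StepRecord(job_id, job_name, state, int(code), int(sig)))
    return records


def lookup_job(job_id: str) -> str | list[StepRecord]:
    """Return scontrol's text if the controller still has the job, else sacct records."""
    live = subprocess.run(
        ["scontrol", "show", "job", job_id],
        capture_output=True, text=True,
    )
    if live.returncode == 0:
        return live.stdout
    # e.g. "slurm_load_jobs error: Invalid job id specified": ask accounting instead.
    hist = subprocess.run(
        ["sacct", "-j", job_id, "--parsable2", "--noheader",
         "--format=JobID,JobName,State,ExitCode"],
        capture_output=True, text=True, check=True,
    )
    return parse_sacct_parsable(hist.stdout)
```

Under these assumptions, `lookup_job("3050")` would return one record each for the top-level job, `3050.batch` and `3050.extern`, with the batch step's `0:15` split into exit status 0 and signal 15.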

headnodevm:/var/log/slurm # slurmctld -V
slurm 21.08.8

On the node where the job ran (dkrdc01-computeserver001), /var/log/slurm/slurmd.log contains only this debug2 message:

_insert_job_state: we already have a job state for job 3050.  No big deal, just an FYI.
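To gather every slurmd.log entry that mentions the job (for example before attaching logs to a ticket), a simple line filter is enough. This is a hypothetical helper, `lines_for_job`; the message patterns it matches (`job <id>`, `JobId=<id>`, `StepId=<id>`) are assumptions about slurmd's wording, not a documented log format.

```python
"""Hypothetical helper: collect slurmd.log lines that mention a given job ID.

The matched phrasings ("job <id>", "JobId=<id>", "StepId=<id>") are
assumptions about slurmd's log wording, not a documented format.
"""
from __future__ import annotations

import re
from typing import Iterable


def lines_for_job(lines: Iterable[str], job_id: str) -> list[str]:
    # Word boundaries stop job 3050 from also matching 30500 or 13050.
    pattern = re.compile(
        rf"\b(?:job |jobid=|stepid=){re.escape(job_id)}\b",
        re.IGNORECASE,
    )
    # Strip trailing newlines so the result prints cleanly.
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

For example, passing an open file handle for /var/log/slurm/slurmd.log together with `"3050"` would return the `_insert_job_state` line above while skipping lines about unrelated jobs.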
Comment 1 William Durairaj 2023-02-09 02:22:45 MST
This problem occurs intermittently when jobs are submitted.
Slurm version: 21.08.8