Ticket 15988

Summary: scontrol show job <job_id> returns "Invalid job id specified", while sacct can still query the job
Product: Slurm
Reporter: William Durairaj <william.durairaj.s>
Component: Accounting
Assignee: Jacob Jenson <jacob>
Status: OPEN ---
QA Contact:
Severity: 6 - No support contract
Priority: ---
CC: william.durairaj.s
Version: 21.08.8
Hardware: Linux
OS: Linux
Site: -Other-
Attachments: Logs from controller, slurmdbd and slurmd

Description William Durairaj 2023-02-09 02:22:12 MST
Created attachment 28769
Logs from controller, slurmdbd and slurmd

headnodevm:/var/log/slurm # scontrol show job 3050
slurm_load_jobs error: Invalid job id specified
dkrdc01-headnodevm:/var/log/slurm # sacct -j 3050
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
3050         Mechanical      rack1                   128 CANCELLED+      0:0
3050.batch        batch                               64  CANCELLED     0:15
3050.extern      extern                              128  COMPLETED      0:0

headnodevm:/var/log/slurm #

headnodevm:/var/log/slurm # slurmctld -V
slurm 21.08.8

The slurmd log (/var/log/slurm/slurmd.log) on the node where the job ran (dkrdc01-computeserver001) contains only this debug2 message:

_insert_job_state: we already have a job state for job 3050.  No big deal, just an FYI.
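
The two commands in the session above query different daemons: scontrol asks slurmctld, while sacct reads the accounting storage (normally through slurmdbd), so a job can be known to one and not the other. While the cause is investigated, a script could try the controller first and fall back to accounting. The following is a minimal Python sketch, not a fix. It assumes scontrol exits non-zero when it prints "Invalid job id specified"; lookup_job and parse_sacct are hypothetical helper names, not part of Slurm.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# The command runner is injectable so the logic can be exercised
# without a Slurm installation.
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command and capture its output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


def parse_sacct(output: str) -> List[Dict[str, str]]:
    """Parse pipe-delimited rows from `sacct --parsable2 --noheader`."""
    fields = ("JobID", "State", "ExitCode")
    rows = []
    for line in output.splitlines():
        if line.strip():
            rows.append(dict(zip(fields, line.split("|"))))
    return rows


def lookup_job(job_id: int, run: Runner = _run) -> Dict[str, object]:
    """Ask slurmctld first; fall back to accounting if that fails."""
    ctl = run(["scontrol", "show", "job", str(job_id)])
    if ctl.returncode == 0:
        return {"source": "slurmctld", "detail": ctl.stdout}
    # Assumption: a non-zero exit means slurmctld has no record of this ID.
    acct = run(["sacct", "-j", str(job_id), "--parsable2", "--noheader",
                "--format=JobID,State,ExitCode"])
    return {"source": "accounting", "detail": parse_sacct(acct.stdout)}
```

Calling lookup_job(3050) on the head node would report which of the two sources answered, which may help show how often the controller and the accounting database disagree.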

Comment 1 William Durairaj 2023-02-09 02:22:45 MST
This problem occurs randomly at job submission.
Slurm version: 21.08.8