Ticket 9529

Summary: squeue behavior inconsistent
Product: Slurm
Reporter: Anthony DelSorbo <anthony.delsorbo>
Component: User Commands
Assignee: Director of Support <support>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 3 - Medium Impact
Priority: ---
Version: 20.02.2
Hardware: Linux
OS: Linux
Site: NOAA
NOAA Site: NESCC

Description Anthony DelSorbo 2020-08-06 12:25:48 MDT
If you call squeue with --jobs and you pass it exactly one jobid that is invalid, it will return an error and exit with status=1.

[Christopher.W.Harrop@Hera:hfe03 ~]$ /apps/slurm/default/bin/squeue --jobs=1 
slurm_load_jobs error: Invalid job id specified
[Christopher.W.Harrop@Hera:hfe03 ~]$ echo $?
1

However, if you call squeue with --jobs and you pass it more than one jobid, it will not report an error, and will return status=0 even if ALL the jobids are invalid.

[Christopher.W.Harrop@Hera:hfe03 ~]$ /apps/slurm/default/bin/squeue --jobs=1,2
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
[Christopher.W.Harrop@Hera:hfe03 ~]$ echo $?
0

This is inconsistent behavior that is confusing for people, such as our developers, who write code to interface with these commands.

Thanks,

Tony.
Comment 2 Colby Ashley 2020-08-06 14:33:13 MDT
Hey Tony,

Just to get some clarification on some things before I dig into the code.

> If you call squeue with --jobs and you pass it exactly one jobid that is
> invalid, it will return an error and exit with status=1.
> 
> [Christopher.W.Harrop@Hera:hfe03 ~]$ /apps/slurm/default/bin/squeue --jobs=1 
> slurm_load_jobs error: Invalid job id specified
> [Christopher.W.Harrop@Hera:hfe03 ~]$ echo $?
> 1
This seems correct to me: squeue prints the error message and exits with an error code of 1.

> However, if you call squeue with --jobs and you pass it more than one jobid,
> it will not report an error, and will return status=0 even if ALL the
> jobids are invalid.
> 
> [Christopher.W.Harrop@Hera:hfe03 ~]$ /apps/slurm/default/bin/squeue --jobs=1,2
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
> [Christopher.W.Harrop@Hera:hfe03 ~]$ echo $?
> 0
This does seem odd, though. I assume your devs are looking for a return code of 1 here, correct? Are they also looking for an error message similar to the previous example?

~Colby
Comment 3 Anthony DelSorbo 2020-08-06 15:17:09 MDT
(In reply to Colby Ashley from comment #2)

> > 0
> This does seem odd though, I assume your devs are looking for a return code
> of 1 here correct? Are they also looking for an error message similar to the
> previous example?
> 
> ~Colby

Colby - right, we agree that the first example is what we would expect. But the second leads to confusion, especially since both job IDs are invalid (in this case). I can see an edge case where some of the job IDs are valid and some are invalid; then what do you do? Perhaps the answer there is to produce both output for the valid IDs and an error message for the invalid IDs.

Thanks for the quick response - Tony.
Comment 5 Colby Ashley 2020-10-08 11:24:58 MDT
Update: still looking into this. The error code is being returned in a special way, so it will take some time to figure out.
Comment 6 Colby Ashley 2020-10-09 15:16:01 MDT
Hey Tony,

We can change a few things so that squeue returns an exit code of 1 when all of the jobids are invalid. Without a major rewrite of squeue, we cannot return an error code if some of the jobs are valid and some are not. Is this something you would still like done? You would still have to parse the output of squeue when running with multiple jobids, though this change could save a bit of time when all of the jobids are invalid.

~Colby
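
For developers wrapping squeue, the output parsing mentioned in comment 6 might look roughly like the sketch below. It is not part of Slurm. The helper names (parse_job_ids, missing_job_ids, check_jobs) are hypothetical, and it assumes squeue is run with --noheader and --format=%i so that each output line holds one job ID. The "Invalid job id specified" string is taken from the transcript in the description, and array jobs (IDs such as 123_4) are not handled.

```python
import subprocess


def parse_job_ids(squeue_stdout):
    """Return the set of job IDs printed by `squeue --noheader --format=%i`."""
    return {line.strip() for line in squeue_stdout.splitlines() if line.strip()}


def missing_job_ids(requested, squeue_stdout):
    """Return the requested IDs that squeue did not list, in request order."""
    found = parse_job_ids(squeue_stdout)
    return [job_id for job_id in requested if job_id not in found]


def check_jobs(requested, squeue="squeue"):
    """Run squeue and report which requested job IDs were not found.

    Hypothetical helper; `requested` is a list of job ID strings.
    """
    cmd = [squeue, "--noheader", "--format=%i", "--jobs=" + ",".join(requested)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Single invalid ID case from this ticket: squeue exits 1 with an error.
        # Assumption: that error means none of the requested IDs were found.
        if "Invalid job id specified" in result.stderr:
            return list(requested)
        raise RuntimeError(result.stderr.strip())
    # Multiple IDs: squeue exits 0 even if some or all IDs are invalid,
    # so compare what was requested against what was actually listed.
    return missing_job_ids(requested, result.stdout)
```

With this approach the caller gets the same kind of answer (a list of IDs that were not found) whether one jobid or several were passed, instead of relying on the exit status alone.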
Comment 7 Colby Ashley 2020-11-04 13:52:28 MST
Closing. Please reopen if you want the error code changed when all of the jobs are invalid.