Ticket 9529 - squeue behavior inconsistent
Summary: squeue behavior inconsistent
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 20.02.2
Hardware: Linux Linux
Severity: 3 - Medium Impact
Assignee: Director of Support
Reported: 2020-08-06 12:25 MDT by Anthony DelSorbo
Modified: 2020-11-04 13:52 MST
Site: NOAA
NOAA Site: NESCC


Description Anthony DelSorbo 2020-08-06 12:25:48 MDT
If you call squeue with --jobs and you pass it exactly one jobid that is invalid, it will return an error and exit with status=1.

[Christopher.W.Harrop@Hera:hfe03 ~]$ /apps/slurm/default/bin/squeue --jobs=1 
slurm_load_jobs error: Invalid job id specified
[Christopher.W.Harrop@Hera:hfe03 ~]$ echo $?
1

However, if you call squeue with --jobs and you pass it more than one jobid, it will not report an error, and will return status=0 even if ALL the jobids are invalid.

[Christopher.W.Harrop@Hera:hfe03 ~]$ /apps/slurm/default/bin/squeue --jobs=1,2
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
[Christopher.W.Harrop@Hera:hfe03 ~]$ echo $?
0

This is inconsistent behavior that is confusing for people, such as our developers, who write code to interface with these commands.

Thanks,

Tony.
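
Until the multi-ID behavior changes, one possible workaround is to rely on the single-ID case shown above, where squeue exits with status 1 for an invalid job ID. The sketch below queries each ID on its own and records whether squeue accepted it. The helper names (squeue_exit_status, check_job_ids) are hypothetical, the code assumes squeue is on PATH, and it issues one squeue call per ID, which may be too slow for long lists.

```python
import subprocess
from typing import Callable, Dict, Iterable


def squeue_exit_status(job_id: str) -> int:
    """Run `squeue --jobs=<job_id>` for ONE job ID and return its exit status.

    Assumption: squeue is on PATH. Output is discarded; only the status is used.
    """
    result = subprocess.run(
        ["squeue", f"--jobs={job_id}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode


def check_job_ids(
    job_ids: Iterable,
    status_of: Callable[[str], int] = squeue_exit_status,
) -> Dict[str, bool]:
    """Map each job ID to True if its single-ID squeue query exited with 0.

    `status_of` is injectable so the logic can be exercised without a cluster.
    """
    return {str(job_id): status_of(str(job_id)) == 0 for job_id in job_ids}
```

For the session above, check_job_ids([1, 2]) would be expected to return False for both IDs, since each single-ID query fails on its own.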
Comment 2 Colby Ashley 2020-08-06 14:33:13 MDT
Hey Tony,

I'd like to clarify a few things before I dig into the code.

> if you call squeue with --jobs and you pass it exactly one jobid that is
> invalid, it will return an error and exit with status=1.
> 
> [Christopher.W.Harrop@Hera:hfe03 ~]$ /apps/slurm/default/bin/squeue --jobs=1 
> slurm_load_jobs error: Invalid job id specified
> [Christopher.W.Harrop@Hera:hfe03 ~]$ echo $?
> 1
This seems correct to me: squeue prints the error message and exits with an error code of 1.

> However, if you call squeue with --jobs and you pass it more than one jobid,
> it will not report an error, and will return status=0 even if ALL the
> jobids are invalid.
> 
> [Christopher.W.Harrop@Hera:hfe03 ~]$ /apps/slurm/default/bin/squeue --jobs=1,2
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
> [Christopher.W.Harrop@Hera:hfe03 ~]$ echo $?
> 0
This does seem odd, though. I assume your devs are looking for a return code of 1 here, correct? Are they also looking for an error message similar to the one in the previous example?

~Colby
Comment 3 Anthony DelSorbo 2020-08-06 15:17:09 MDT
(In reply to Colby Ashley from comment #2)

> > 0
> This does seem odd though, I assume your devs are looking for a return code
> of 1 here correct? Are they also looking for an error message similar to the
> previous  example?
> 
> ~Colby

Colby, right: we agree that the first example behaves as we would expect. The second leads to confusion, especially since both job IDs are invalid in this case. I can also see an edge case where some of the job IDs are valid and some are invalid. What should happen then? Perhaps the answer is to print the output for the valid IDs and an error message for the invalid ones.

Thanks for the quick response - Tony.
Comment 5 Colby Ashley 2020-10-08 11:24:58 MDT
Update: I am still looking into this. The error code is returned in a special way, so it will take some time to figure out.
Comment 6 Colby Ashley 2020-10-09 15:16:01 MDT
Hey Tony,

We can change a few things so that squeue returns an error code of 1 when all of the job IDs are invalid. Without a major rewrite of squeue, we cannot return an error code if some of the jobs are valid and some are not. Is this something you would still like done? You would still have to parse the output of squeue when running with multiple job IDs, though this could save a bit of time when all of the job IDs are invalid.

~Colby
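
The output parsing mentioned above could look roughly like the following sketch. It assumes squeue is run with --noheader and --format=%i so that each output line holds one job ID, then reports which requested IDs did not appear. The function names are hypothetical, and treating an array element such as 100_3 or a heterogeneous component such as 100+0 as a match for its base ID 100 is an assumption about how callers want those cases handled.

```python
import subprocess
from typing import Iterable, List, Set


def parse_listed_ids(squeue_stdout: str) -> Set[str]:
    """Collect the job IDs printed by `squeue --noheader --format=%i`."""
    listed: Set[str] = set()
    for line in squeue_stdout.splitlines():
        token = line.strip()
        if not token:
            continue
        listed.add(token)
        # Assumption: "100_3" (array element) or "100+0" (het job component)
        # should also count as a hit for the base job ID "100".
        listed.add(token.split("_", 1)[0].split("+", 1)[0])
    return listed


def missing_job_ids(requested: Iterable, squeue_stdout: str) -> List[str]:
    """Return the requested job IDs that squeue did not list, in request order."""
    listed = parse_listed_ids(squeue_stdout)
    return [str(job_id) for job_id in requested if str(job_id) not in listed]


def query_squeue(job_ids: Iterable) -> str:
    """Run one multi-ID squeue query and return its stdout (squeue assumed on PATH)."""
    id_list = ",".join(str(job_id) for job_id in job_ids)
    result = subprocess.run(
        ["squeue", "--noheader", "--format=%i", f"--jobs={id_list}"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

A caller would pass the same list to both functions, for example missing_job_ids(ids, query_squeue(ids)), and treat a non-empty result as the error condition that the exit status does not currently signal.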
Comment 7 Colby Ashley 2020-11-04 13:52:28 MST
Closing. Please reopen if you want the error code changed when all of the job IDs are invalid.