Ticket 3038 - Bug in squeue for arrays with non-1 indexing
Summary: Bug in squeue for arrays with non-1 indexing
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 14.11.10
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Dominik Bartkiewicz
Reported: 2016-09-01 07:45 MDT by Stuart Midgley
Modified: 2016-10-13 04:16 MDT
Site: DownUnder GeoSolutions
Version Fixed: 16.05.1
Description Stuart Midgley 2016-09-01 07:45:24 MDT
Evening

I have built a nice little CGI script which helps users monitor their jobs. We have noticed that when an array has a non-1 index, squeue returns a stupid number of array tasks...

> tab=" "

> squeue -j 21075999
PARTITION   PRIORITY   NAME                     USER ST       TIME TIME_LEFT  NODES NODELIST(REASON JOBID
XXXXXXXXXXX 200        zzzzzzzzzzzzzzzzzzzz yyyyyyyy PD       0:00     21:00      1 (Resources)     21075999_[22655-22999:8]
XXXXXXXXXXX 200        zzzzzzzzzzzzzzzzzzzz yyyyyyyy  R      13:14      7:46      1 node1           21075999_22639
XXXXXXXXXXX 200        zzzzzzzzzzzzzzzzzzzz yyyyyyyy  R       8:00     13:00      1 node2           21075999_22647

> squeue -h -r -a -j 21075999 -o "%t${tab}%F${tab}%K${tab}%P${tab}%Q${tab}%u${tab}%j${tab}%k${tab}%Z${tab}%M" | wc -l
207346
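
For comparison, the line count one would expect can be worked out from the array expression in the JOBID column. This is only a rough sketch: it assumes the pending range `22655-22999:8` is inclusive at both ends and that the two running tasks listed above are the only other tasks still in the queue. `expected_tasks` is a hypothetical helper name, not a Slurm command.

```shell
#!/bin/sh
# Number of task IDs in an inclusive array range START-END:STEP.
# expected_tasks is a hypothetical helper, not part of Slurm.
expected_tasks() {
    start=$1 end=$2 step=$3
    echo $(( (end - start) / step + 1 ))
}

pending=$(expected_tasks 22655 22999 8)   # pending range from the JOBID column
running=2                                 # tasks 22639 and 22647 shown above
echo "expected lines: $(( pending + running ))"
```

Under those assumptions the range holds 44 pending tasks, so `squeue -r` should print 46 lines rather than 207346.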


and if I look at the output of the command, we have array tasks larger than 22999!

PD 21075999 22996 XXXXXXXXXXX 200 yyyyyyyy zzzzzzzzzzzzzzzzzzzz aaaa/bbbb/workflow.job aaaa/bbbb/ 0:00                                                                                                                                                                                                        PD 21075999 22997 XXXXXXXXXXX 200 yyyyyyyy zzzzzzzzzzzzzzzzzzzz aaaa/bbbb/workflow.job aaaa/bbbb/ 0:00                                                                                                                                                                                                        PD 21075999 22998 XXXXXXXXXXX 200 yyyyyyyy zzzzzzzzzzzzzzzzzzzz aaaa/bbbb/workflow.job aaaa/bbbb/ 0:00                                                                                                                                                                                                        PD 21075999 22999 XXXXXXXXXXX 200 yyyyyyyy zzzzzzzzzzzzzzzzzzzz aaaa/bbbb/workflow.job aaaa/bbbb/ 0:00                                                                                                                                                                                                        PD 21075999 23000 XXXXXXXXXXX 200 yyyyyyyy zzzzzzzzzzzzzzzzzzzz aaaa/bbbb/workflow.job aaaa/bbbb/ 0:00                                                                                                                                                                                                        PD 21075999 23001 XXXXXXXXXXX 200 yyyyyyyy zzzzzzzzzzzzzzzzzzzz aaaa/bbbb/workflow.job aaaa/bbbb/ 0:00                                                                                                                                                                                                        PD 21075999 23002 XXXXXXXXXXX 200 yyyyyyyy zzzzzzzzzzzzzzzzzzzz aaaa/bbbb/workflow.job aaaa/bbbb/ 0:00                                      
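
Until the fix is in place, a script consuming this output could drop the bogus entries itself. The following is a minimal sketch, assuming the task index is the third whitespace-separated field (the `%K` column in the format string above) and that the valid bounds are known in advance. `flag_out_of_range` is a hypothetical helper, and the sample lines are shortened copies of the anonymised output.

```shell
#!/bin/sh
# Print lines whose array task index (3rd field, %K) lies outside the
# inclusive range MIN-MAX. flag_out_of_range is a hypothetical helper.
flag_out_of_range() {
    awk -v min="$1" -v max="$2" '$3 + 0 < min || $3 + 0 > max'
}

# Three sample lines in the format shown above (anonymised values).
printf '%s\n' \
    'PD 21075999 22999 XXXXXXXXXXX 200 yyyyyyyy zzzz aaaa/bbbb/workflow.job aaaa/bbbb/ 0:00' \
    'PD 21075999 23000 XXXXXXXXXXX 200 yyyyyyyy zzzz aaaa/bbbb/workflow.job aaaa/bbbb/ 0:00' \
    'PD 21075999 23001 XXXXXXXXXXX 200 yyyyyyyy zzzz aaaa/bbbb/workflow.job aaaa/bbbb/ 0:00' |
    flag_out_of_range 22639 22999
```

With the bounds 22639 and 22999 taken from the `squeue -j` listing, this prints only the 23000 and 23001 lines. Inverting the awk condition would keep the valid lines instead.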



(sorry for anonymising it)
Comment 1 Dominik Bartkiewicz 2016-09-01 08:13:47 MDT
Hi,

This looks like an old issue that has already been solved;
commit cdf2587850 fixes this problem.
The patch is available at https://github.com/SchedMD/slurm/commit/cdf258785034f825234b325.patch
You can manually apply this patch.
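
One possible way to do that, as a sketch only: it assumes GNU `patch` is installed, that the patch file has already been downloaded from the URL above, and that an unpacked Slurm source tree is available. `apply_checked` is a hypothetical helper name, and the source directory in the usage note is a placeholder.

```shell
#!/bin/sh
# Apply a downloaded patch to a source tree, first checking that it
# applies cleanly. apply_checked is a hypothetical helper name.
apply_checked() {
    patch_file=$1 src_dir=$2
    # --dry-run (GNU patch) reports problems without modifying any files.
    patch -d "$src_dir" -p1 --dry-run < "$patch_file" >/dev/null || return 1
    # -p1 strips the leading a/ and b/ path components used by git patches.
    patch -d "$src_dir" -p1 < "$patch_file"
}
```

It would be called as `apply_checked cdf258785034f825234b325.patch /path/to/slurm-source`, after which the affected binaries still need to be rebuilt and reinstalled.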

Dominik
Comment 2 Stuart Midgley 2016-09-01 17:27:17 MDT
Sigh, thanks.  I guess we really should get upgraded to a more recent version :)


Comment 3 Dominik Bartkiewicz 2016-10-13 04:16:23 MDT
Marking as resolved. Please reopen if you have any further questions.

Dominik