Ticket 1561

Summary: squeue --array not working when used with % option to limit number of simultaneous tasks
Product: Slurm Reporter: Will French <will>
Component: Other Assignee: David Bigagli <david>
Status: RESOLVED FIXED QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: brian, da
Version: 14.11.4   
Hardware: Linux   
OS: Linux   
Site: Vanderbilt
Version Fixed: 14.11.6 Target Release: ---

Description Will French 2015-03-26 01:38:07 MDT
The "squeue --array" option does not appear to be working:

[frenchwr@vmps55 scripts]$ squeue --states=pending --user=joe
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
885224_[329-1000%3 productio    blast      joe PD       0:00      1 (JobArrayTaskLimit)
[frenchwr@vmps55 scripts]$ squeue --states=pending --user=joe --array
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[frenchwr@vmps55 scripts]$ squeue --states=pending --user=joe -r
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[frenchwr@vmps55 scripts]$ squeue --array | grep PD | grep joe
[frenchwr@vmps55 scripts]$ squeue | grep PD | grep joe
885224_[329-1000%3 productio    blast      joe PD       0:00      1 (JobArrayTaskLimit)

Both the squeue man page and http://slurm.schedmd.com/job_array.html indicate that the --array option should display one job array element per line for pending jobs.
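For reference, the expected --array behavior can be modeled with a short Python sketch. The helper name expand_array_jobid and its regular expression are hypothetical, not part of Slurm. The sketch assumes only the compact JOBID_[LO-HI%LIMIT] form seen in this ticket and treats the optional %LIMIT as a cap on simultaneously running tasks, so it has no effect on which indices are listed.

```python
import re

# Hypothetical pattern: assumes the compact squeue form JOBID_[LO-HI%LIMIT],
# e.g. "943689_[0-2%2]". Comma-separated index lists are not handled here.
ARRAY_RE = re.compile(r"^(\d+)_\[(\d+)-(\d+)(?:%(\d+))?\]$")


def expand_array_jobid(jobid):
    """Return one "JOBID_INDEX" string per pending task.

    The optional %LIMIT only caps how many tasks run at once;
    it does not change which indices exist.
    """
    m = ARRAY_RE.match(jobid)
    if m is None:
        return [jobid]  # not a bracketed range; leave it unchanged
    job, lo, hi, _limit = m.groups()
    return [f"{job}_{i}" for i in range(int(lo), int(hi) + 1)]


# prints 943689_0, 943689_1 and 943689_2, one per line
for line in expand_array_jobid("943689_[0-2%2]"):
    print(line)
```

With this model, "943684_[100-102]" and "943696_[100-102%2]" both expand to exactly three lines, which is what the reporter expected to see.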

For simpler tests, the option works fine:

[frenchwr@vmps55 example-4]$ squeue --user=frenchwr --array
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          943668_0 productio array.sl frenchwr PD       0:00      1 (None)
          943668_1 productio array.sl frenchwr PD       0:00      1 (None)
          943668_2 productio array.sl frenchwr PD       0:00      1 (None)

It also works if I use larger indices:

[frenchwr@vmps55 example-4]$ squeue --user=frenchwr 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  943684_[100-102] productio array.sl frenchwr PD       0:00      1 (None)
[frenchwr@vmps55 example-4]$ squeue --user=frenchwr --array
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
        943684_100 productio array.sl frenchwr PD       0:00      1 (None)
        943684_101 productio array.sl frenchwr PD       0:00      1 (None)
        943684_102 productio array.sl frenchwr PD       0:00      1 (None)


However, if I limit the array to two simultaneously running tasks with something like "#SBATCH --array=0-2%2", the --array option stops working again (in this case showing 23 job array elements instead of 3):

[frenchwr@vmps55 example-4]$ squeue --user=frenchwr 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    943689_[0-2%2] productio array.sl frenchwr PD       0:00      1 (None)
[frenchwr@vmps55 example-4]$ squeue --user=frenchwr --array
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          943689_0 productio array.sl frenchwr PD       0:00      1 (None)
          943689_1 productio array.sl frenchwr PD       0:00      1 (None)
          943689_2 productio array.sl frenchwr PD       0:00      1 (None)
          943689_3 productio array.sl frenchwr PD       0:00      1 (None)
          943689_4 productio array.sl frenchwr PD       0:00      1 (None)
          943689_5 productio array.sl frenchwr PD       0:00      1 (None)
          943689_6 productio array.sl frenchwr PD       0:00      1 (None)
          943689_7 productio array.sl frenchwr PD       0:00      1 (None)
          943689_8 productio array.sl frenchwr PD       0:00      1 (None)
          943689_9 productio array.sl frenchwr PD       0:00      1 (None)
         943689_10 productio array.sl frenchwr PD       0:00      1 (None)
         943689_11 productio array.sl frenchwr PD       0:00      1 (None)
         943689_12 productio array.sl frenchwr PD       0:00      1 (None)
         943689_13 productio array.sl frenchwr PD       0:00      1 (None)
         943689_14 productio array.sl frenchwr PD       0:00      1 (None)
         943689_15 productio array.sl frenchwr PD       0:00      1 (None)
         943689_16 productio array.sl frenchwr PD       0:00      1 (None)
         943689_17 productio array.sl frenchwr PD       0:00      1 (None)
         943689_18 productio array.sl frenchwr PD       0:00      1 (None)
         943689_19 productio array.sl frenchwr PD       0:00      1 (None)
         943689_20 productio array.sl frenchwr PD       0:00      1 (None)
         943689_21 productio array.sl frenchwr PD       0:00      1 (None)
         943689_22 productio array.sl frenchwr PD       0:00      1 (None)


If I use larger indices, the --array option causes the jobs to disappear altogether, as in the original example:

[frenchwr@vmps55 example-4]$ squeue --user=frenchwr 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
943696_[100-102%2] productio array.sl frenchwr PD       0:00      1 (None)
[frenchwr@vmps55 example-4]$ squeue --user=frenchwr --array
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[frenchwr@vmps55 example-4]$ squeue --user=frenchwr
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
943696_[100-102%2] productio array.sl frenchwr PD       0:00      1 (None)
Comment 1 David Bigagli 2015-03-26 05:34:27 MDT
I can reproduce your problem. Using --array=1-4%2, squeue --array reports
42 array elements instead of 4. It looks like the % is misinterpreted somewhere.
We'll let you know as soon as we fix it.

David
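The counts reported here (42 elements for 1-4%2, and 23 elements, 943689_0 through 943689_22, for 0-2%2) are consistent with one simple hypothesis: the parser drops the % and reads the throttle digit as part of the upper bound, so "1-4%2" becomes "1-42" and "0-2%2" becomes "0-22". The ticket does not show Slurm's parsing code or the fix in commit ee0651b0f7, so the two functions below (buggy_count and correct_count) are hypothetical illustrations of that hypothesis, not the actual implementation.

```python
def buggy_count(expr):
    """Hypothetical misparse: strip '%' so "1-4%2" is read as "1-42"."""
    lo, hi = expr.replace("%", "").split("-")
    return int(hi) - int(lo) + 1


def correct_count(expr):
    """Split off the %LIMIT throttle before parsing the LO-HI range."""
    rng, _, _limit = expr.partition("%")
    lo, hi = rng.split("-")
    return int(hi) - int(lo) + 1


for expr in ("1-4%2", "0-2%2"):
    # Print the hypothetical buggy count next to the correct count.
    print(expr, buggy_count(expr), correct_count(expr))
# 1-4%2 42 4
# 0-2%2 23 3
```

Under the same hypothesis, "100-102%2" would be read as "100-1022"; the ticket does not explain why those tasks disappear from the --array output rather than multiply.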
Comment 2 Moe Jette 2015-03-26 05:40:06 MDT
(In reply to David Bigagli from comment #1)
> I can reproduce your problem. Using --array=1-4%2 squeue --array reports
> 42 array elements instead of 4. It looks like the % is misinterpreted
> somewhere.
> We let you know as soon as we fix it.
> 
> David

Note the "%" syntax was added in Slurm version 14.11.
Comment 3 David Bigagli 2015-03-26 09:06:55 MDT
Fixed in commit ee0651b0f7. Available in 14.11.6. The fix could easily be
backported.

Thanks,
        David
Comment 4 Will French 2015-03-26 09:10:56 MDT
(In reply to David Bigagli from comment #3)
> Fixed in commit ee0651b0f7. Available in 14.11.6. The fix could be easily
> back 
> ported.
> 
> Thanks,
>         David

Thanks, David! I will be sure to test this when 14.11.6 is released.

Will