Ticket 21641 - sacct -j <jobid> --expand-patterns not returning the correct value for stdout of array jobs
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 24.05.3
Hardware: Linux Linux
Severity: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2024-12-13 09:18 MST by James Owers-Bardsley
Modified: 2024-12-13 09:30 MST (History)

See Also:
Site: -Other-


Description James Owers-Bardsley 2024-12-13 09:18:12 MST
If I use sacct to get the stdout file path for an array job, %A is expanded incorrectly. It is replaced with JobIDRaw, but it should be replaced with the [job array's master job allocation number](https://slurm.schedmd.com/sbatch.html#OPT_%A).

For example, getting the unexpanded output for a job with ID 40275:

```shell
$ sacct -j 40275 --allocations --format 'JobID%32,JobIDRaw,StdOut%-128' --expand-patterns
                           JobID JobIDRaw     StdOut                                                                                                                           
-------------------------------- ------------ -------------------------------------------------------------------------------------------------------------------------------- 
                         40275_1 40276        /home/username/path/to/%a/zz_logs.eval.%A.log                      
                         40275_2 40277        /home/username/path/to/%a/zz_logs.eval.%A.log                      
                         40275_3 40278        /home/username/path/to/%a/zz_logs.eval.%A.log                      
                  40275_[4-12%3] 40275        /home/username/path/to/%a/zz_logs.eval.%A.log 
```

But if I expand the patterns, I get:

```shell
$ sacct -j 40275 --allocations --format 'JobID%32,JobIDRaw,StdOut%-128' --expand-patterns
                           JobID JobIDRaw     StdOut                                                                                                                           
-------------------------------- ------------ -------------------------------------------------------------------------------------------------------------------------------- 
                         40275_1 40276        /home/username/path/to/1/zz_logs.eval.40276.log                      
                         40275_2 40277        /home/username/path/to/2/zz_logs.eval.40277.log                      
                         40275_3 40278        /home/username/path/to/3/zz_logs.eval.40278.log                      
                  40275_[4-12%3] 40275        /home/username/path/to/4294967294/zz_logs.eval.40275.log
```

Issues:
- %A has expanded to JobIDRaw. It should be 40275 for all jobs.
- %a has expanded to 4294967294 for 40275_[4-12%3]. I don't know where this value has come from.
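
One possible reading of the second issue, offered as a hedged observation rather than a confirmed diagnosis: 4294967294 is 2^32 - 2 (0xFFFFFFFE), which coincides with the `NO_VAL` sentinel that `slurm.h` uses for unset 32-bit fields. That would be consistent with the pending record having no single array task ID yet. The short check below only verifies the arithmetic; the link to `NO_VAL` as the cause is an assumption.

```python
# Value that sacct substituted for %a in the pending record (from the output above).
observed = 4294967294

# Assumption: this mirrors the 32-bit "unset" sentinel NO_VAL from slurm.h.
NO_VAL = 0xFFFFFFFE

print(hex(observed))          # hexadecimal form of the observed value
print(observed == NO_VAL)     # True if the value matches the sentinel
print(observed == 2**32 - 2)  # True if it is two below the 32-bit range
```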


The output I would have expected would be:

```shell
$ sacct -j 40275 --allocations --format 'JobID%32,JobIDRaw,StdOut%-128' --expand-patterns
                           JobID JobIDRaw     StdOut                                                                                                                           
-------------------------------- ------------ -------------------------------------------------------------------------------------------------------------------------------- 
                         40275_1 40276        /home/username/path/to/1/zz_logs.eval.40275.log                      
                         40275_2 40277        /home/username/path/to/2/zz_logs.eval.40275.log                      
                         40275_3 40278        /home/username/path/to/3/zz_logs.eval.40275.log                      
                  40275_[4-12%3] 40275        /home/username/path/to/%a/zz_logs.eval.40275.log
```

i.e. running/completed jobs have %A expanded to the job ID 40275, and the pending jobs 40275_[4-12%3] don't have %a expanded (alternatively they could have a stdout of None, or be excluded; any of these would be fine).
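
The expected behaviour described above can be sketched in a few lines of Python. This is only an illustration of the rule being asked for, not Slurm's implementation: `expand_stdout_pattern` is a hypothetical helper, and it deliberately ignores other filename patterns (padding widths, `%%` escapes, `%j`, and so on).

```python
from typing import Optional


def expand_stdout_pattern(pattern: str, array_job_id: int,
                          array_task_id: Optional[int]) -> str:
    """Expand %A and %a the way this report expects (hypothetical helper).

    array_task_id is None for a record that still covers several pending
    tasks; in that case %a is left unexpanded.
    """
    # %A always becomes the array's master job ID, never JobIDRaw.
    expanded = pattern.replace("%A", str(array_job_id))
    # %a is only substituted when the record refers to one concrete task.
    if array_task_id is not None:
        expanded = expanded.replace("%a", str(array_task_id))
    return expanded


pattern = "/home/username/path/to/%a/zz_logs.eval.%A.log"
print(expand_stdout_pattern(pattern, 40275, 1))     # task 40275_1
print(expand_stdout_pattern(pattern, 40275, None))  # pending 40275_[4-12%3]
```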
Comment 1 James Owers-Bardsley 2024-12-13 09:20:40 MST
EDIT: the first example (unexpanded) should not include `--expand-patterns` in the call, i.e. it should read:

```shell
$ sacct -j 40275 --allocations --format 'JobID%32,JobIDRaw,StdOut%-128'
                           JobID JobIDRaw     StdOut                                                                                                                           
-------------------------------- ------------ -------------------------------------------------------------------------------------------------------------------------------- 
                         40275_1 40276        /home/username/path/to/%a/zz_logs.eval.%A.log                      
                         40275_2 40277        /home/username/path/to/%a/zz_logs.eval.%A.log                      
                         40275_3 40278        /home/username/path/to/%a/zz_logs.eval.%A.log                      
                  40275_[4-12%3] 40275        /home/username/path/to/%a/zz_logs.eval.%A.log 
```

(Apologies: I had to anonymise the paths manually, but otherwise I ran these commands as shown.)
Comment 2 James Owers-Bardsley 2024-12-13 09:30:24 MST
One potentially useful addition: the issue manifests like this if `--array` is used:

```
$ sacct -j 40275 --array --allocations --format 'JobID%32,JobIDRaw,StdOut%-128' --expand-patterns
                           JobID JobIDRaw     StdOut                                                                                                                           
-------------------------------- ------------ -------------------------------------------------------------------------------------------------------------------------------- 
                         40275_1 40276        /home/username/path/to/1/zz_logs.eval.40276.log                    
                         40275_2 40277        /home/username/path/to/2/zz_logs.eval.40277.log                    
                         40275_3 40278        /home/username/path/to/3/zz_logs.eval.40278.log                    
                         40275_4 40275        /home/username/path/to/4/zz_logs.eval.40275.log                    
                         40275_5 40275        /home/username/path/to/4/zz_logs.eval.40275.log                    
                         40275_6 40275        /home/username/path/to/4/zz_logs.eval.40275.log                    
                         40275_7 40275        /home/username/path/to/4/zz_logs.eval.40275.log                    
                         40275_8 40275        /home/username/path/to/4/zz_logs.eval.40275.log                    
                         40275_9 40275        /home/username/path/to/4/zz_logs.eval.40275.log                    
                        40275_10 40275        /home/username/path/to/4/zz_logs.eval.40275.log                    
                        40275_11 40275        /home/username/path/to/4/zz_logs.eval.40275.log                    
                        40275_12 40275        /home/username/path/to/4/zz_logs.eval.40275.log
```

i.e.

Issues:
- for running/completed jobs: %a expands correctly, but %A has expanded to JobIDRaw. It should be 40275 for all jobs.
- for pending jobs: %A has expanded correctly, but %a has expanded to 4 (the first pending task).

The only correct path is the one for task 40275_4, /home/username/path/to/4/zz_logs.eval.40275.log (although that file doesn't exist yet). As soon as that task starts running, however, the output will become incorrect (/home/username/path/to/4/zz_logs.eval.40279.log).
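
Until the expansion is fixed, one possible workaround is to skip `--expand-patterns` and expand the patterns client-side from the displayed JobID, which already carries the master job ID and task ID. The sketch below assumes pipe-separated input in the shape produced by `--parsable2 --noheader --format JobID,StdOut`; the sample lines are constructed from the values in this ticket rather than captured output, and `expand_from_jobid` is a hypothetical helper that handles only `%A` and `%a`.

```python
import re

# Displayed JobID of an array record: "<master>_<task>" for a single task,
# or "<master>_[<spec>]" for a record that still covers pending tasks.
JOBID_RE = re.compile(
    r"^(?P<master>\d+)_(?:(?P<task>\d+)|\[(?P<spec>[^\]]+)\])$"
)


def expand_from_jobid(job_id: str, stdout_pattern: str) -> str:
    """Expand %A/%a using the IDs embedded in the displayed JobID."""
    match = JOBID_RE.match(job_id)
    if match is None:
        return stdout_pattern  # not an array record; leave it untouched
    expanded = stdout_pattern.replace("%A", match["master"])
    if match["task"] is not None:
        # Only a concrete task has a meaningful %a value.
        expanded = expanded.replace("%a", match["task"])
    return expanded


# Illustrative lines in the assumed shape of:
#   sacct -j 40275 --allocations --parsable2 --noheader --format JobID,StdOut
sample = """\
40275_1|/home/username/path/to/%a/zz_logs.eval.%A.log
40275_[4-12%3]|/home/username/path/to/%a/zz_logs.eval.%A.log"""

for line in sample.splitlines():
    job_id, stdout_pattern = line.split("|", 1)
    print(job_id, expand_from_jobid(job_id, stdout_pattern))
```

Deriving both IDs from the JobID column avoids JobIDRaw entirely, which is the field that appears to leak into the `%A` expansion.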