Ticket 16707 - Account record for jobs cancelled before starting have start time of 2106-02-07T09:28:14
Summary: Account record for jobs cancelled before starting have start time of 2106-02-...
Status: RESOLVED CANNOTREPRODUCE
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 22.05.7
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Director of Support
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-05-11 13:29 MDT by Greg Wickham
Modified: 2023-05-22 10:15 MDT

See Also:
Site: KAUST


Description Greg Wickham 2023-05-11 13:29:46 MDT
As in the subject.

    $ salloc --gres gpu:8 --time 10-00:00:00 
    salloc: Pending job allocation 25587289
    salloc: job 25587289 queued and waiting for resources

and then in another shell:

    $ scancel 25587289

original shell:

    salloc: Job has been cancelled
    salloc: Job allocation 25587289 has been revoked.
    salloc: error: Job submit/allocate failed: Job/step already completing or completed

And then displaying the accounting record:

    $ sacct -P -j 25587289 --format=jobid,start,end,state
    JobID|Start|End|State
    25587289|2106-02-07T09:28:14|2023-05-11T22:27:13|CANCELLED by XXXXXX
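
A record like this can be spotted mechanically because its start time is later than its end time. The following is only a minimal sketch: `suspicious_rows` is a hypothetical helper, not part of Slurm, and it assumes pipe-delimited `sacct -P` output with `JobID`, `Start` and `End` columns as shown above.

```python
from datetime import datetime


def suspicious_rows(sacct_output):
    """Return JobIDs whose Start timestamp is later than their End timestamp.

    Hypothetical helper: expects `sacct -P` style text with a header line
    containing JobID, Start and End columns.
    """
    lines = [ln.strip() for ln in sacct_output.strip().splitlines() if ln.strip()]
    header = lines[0].split("|")
    flagged = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("|")))
        try:
            start = datetime.fromisoformat(row["Start"])
            end = datetime.fromisoformat(row["End"])
        except ValueError:
            # Non-timestamp values such as "None" or "Unknown" are skipped.
            continue
        if start > end:
            flagged.append(row["JobID"])
    return flagged


# Sample copied from the output in this ticket.
sample = """JobID|Start|End|State
25587289|2106-02-07T09:28:14|2023-05-11T22:27:13|CANCELLED by XXXXXX
"""
print(suspicious_rows(sample))
```

Rows whose start time is reported as `None` are ignored, so the check only flags records where both fields parse as timestamps.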


  -Greg
Comment 1 Caden Ellis 2023-05-19 16:22:29 MDT
This is very strange. I could not reproduce it.

Can you attach your slurmctld.log and slurm.conf?

Caden
Comment 2 Greg Wickham 2023-05-21 13:32:48 MDT
Hi Caden,

We've just updated to 23.02.2 and this issue doesn't appear to exist anymore:

$ salloc --time 00:10:00 -n 1000 --gres gpu:1
salloc: Pending job allocation 25707780
salloc: job 25707780 queued and waiting for resources
salloc: Job allocation 25707780 has been revoked.
salloc: Job has been cancelled
salloc: error: Job submit/allocate failed: Job/step already completing or completed


$ sacct -j 25707780 --format=start,end,state
              Start                 End      State 
------------------- ------------------- ---------- 
               None 2023-05-21T22:30:34 CANCELLED+ 



We're unable to assist any further as we're no longer running 22.05.7.

Please close the ticket if appropriate.

   -Greg
Comment 3 Greg Wickham 2023-05-21 21:17:56 MDT
Hi Caden,

Just following up: this is the same command I demonstrated when I opened the ticket, but now run with 23.02.2:

$ sacct -P -j 25587289 --format=jobid,start,end,state
JobID|Start|End|State
25587289|None|2023-05-11T22:27:13|CANCELLED by 100302

Now the start time is shown correctly (None).

So my guess is it was an issue with 22.05.X(?) incorrectly rendering the start time.

   -Greg
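
The odd date in the original report is at least consistent with a rendering problem. As a rough check, the sketch below converts the reported start time back to an epoch value, assuming the site's local time is UTC+3 (an assumption; the ticket does not state the timezone). Under that assumption the value comes out as 0xFFFFFFFE, which matches the `NO_VAL` sentinel defined in `slurm.h`. This does not confirm the root cause, which was never reproduced in this ticket.

```python
from datetime import datetime, timedelta, timezone

NO_VAL = 0xFFFFFFFE                     # Slurm's 32-bit "no value" sentinel (slurm.h)
SITE_TZ = timezone(timedelta(hours=3))  # ASSUMPTION: site local time is UTC+3
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render(epoch_seconds):
    """Format an epoch value as local time in the same layout sacct prints."""
    local = (EPOCH + timedelta(seconds=epoch_seconds)).astimezone(SITE_TZ)
    return local.strftime("%Y-%m-%dT%H:%M:%S")


# Start time reported in the ticket, interpreted in the assumed timezone.
reported = datetime(2106, 2, 7, 9, 28, 14, tzinfo=SITE_TZ)
reported_epoch = int((reported - EPOCH).total_seconds())

print(reported_epoch, hex(reported_epoch))
print(render(NO_VAL))
```

If the assumed offset is wrong, the epoch value shifts by the difference in hours and no longer lines up with the sentinel, so the match depends entirely on that assumption.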
Comment 4 Caden Ellis 2023-05-22 10:15:37 MDT
I still got the "None" start time with 22.05.7. Since neither I nor one other person could reproduce this, we will go ahead and close the ticket.

Caden Ellis