| Summary: | Accounting records for jobs cancelled before starting have a start time of 2106-02-07T09:28:14 | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Greg Wickham <greg.wickham> |
| Component: | Accounting | Assignee: | Director of Support <support> |
| Status: | RESOLVED CANNOTREPRODUCE | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 22.05.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | KAUST | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
This is very strange; I could not reproduce it. Can you attach your slurmctld.log and slurm.conf?

Caden

Hi Caden,
We've just updated to 23.02.2 and this issue doesn't appear to exist anymore:
$ salloc --time 00:10:00 -n 1000 --gres gpu:1
salloc: Pending job allocation 25707780
salloc: job 25707780 queued and waiting for resources
salloc: Job allocation 25707780 has been revoked.
salloc: Job has been cancelled
salloc: error: Job submit/allocate failed: Job/step already completing or completed
$ sacct -j 25707780 --format=start,end,state
Start End State
------------------- ------------------- ----------
None 2023-05-21T22:30:34 CANCELLED+
We're no longer able to help with this, as we're no longer running 22.05.7.
Please close the ticket if appropriate.
-Greg
Hi Caden,

Just following up: this is the same command that demonstrated the problem when I opened the ticket, now run on 23.02.2:

$ sacct -P -j 25587289 --format=jobid,start,end,state
JobID|Start|End|State
25587289|None|2023-05-11T22:27:13|CANCELLED by 100302

The start time is now shown correctly (None), so my guess is that 22.05.x was incorrectly rendering the start time.

-Greg

I still got the "None" start time with 22.05.7. Since neither I nor one other person could reproduce this, we will go ahead and close it.

Caden Ellis
As in the subject.

$ salloc --gres gpu:8 --time 10-00:00:00
salloc: Pending job allocation 25587289
salloc: job 25587289 queued and waiting for resources

And then in another shell:

$ scancel 25587289

In the original shell:

salloc: Job has been cancelled
salloc: Job allocation 25587289 has been revoked.
salloc: error: Job submit/allocate failed: Job/step already completing or completed

And then displaying the accounting record:

$ sacct -P -j 25587289 --format=jobid,start,end,state
JobID|Start|End|State
25587289|2106-02-07T09:28:14|2023-05-11T22:27:13|CANCELLED by XXXXXX

-Greg
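The odd start time is consistent with a 32-bit sentinel being printed as if it were a real timestamp. The sketch below tests that reading under two assumptions that the ticket does not confirm: that the stored value is Slurm's `NO_VAL` (0xfffffffe), and that the cluster's local time zone is UTC+3 (Arabia Standard Time, which would fit a KAUST system). It also includes a hypothetical helper, `flag_sentinel_starts`, that scans pipe-delimited `sacct -P` output for rows whose Start equals that decoded value. This is an illustration, not a diagnosis of the 22.05.7 behaviour.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Slurm's 32-bit "no value" sentinel (NO_VAL in slurm.h).
NO_VAL = 0xFFFFFFFE
# Assumption: the cluster's local time zone is UTC+3 (Arabia Standard Time).
SITE_TZ = timezone(timedelta(hours=3))


def epoch_to_sacct_string(seconds: int, tz: timezone) -> str:
    """Format epoch seconds like the Start/End values shown in this ticket."""
    return datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%dT%H:%M:%S")


def flag_sentinel_starts(sacct_output: str, tz: timezone) -> list:
    """Hypothetical helper: return JobIDs from `sacct -P` output whose
    Start field equals NO_VAL rendered as a local timestamp."""
    sentinel = epoch_to_sacct_string(NO_VAL, tz)
    lines = sacct_output.strip().splitlines()
    header = lines[0].split("|")
    job_col, start_col = header.index("JobID"), header.index("Start")
    flagged = []
    for row in lines[1:]:
        fields = row.split("|")
        if fields[start_col] == sentinel:
            flagged.append(fields[job_col])
    return flagged


if __name__ == "__main__":
    print(epoch_to_sacct_string(NO_VAL, SITE_TZ))  # 2106-02-07T09:28:14
    sample = (
        "JobID|Start|End|State\n"
        "25587289|2106-02-07T09:28:14|2023-05-11T22:27:13|CANCELLED by XXXXXX\n"
    )
    print(flag_sentinel_starts(sample, SITE_TZ))  # ['25587289']
```

In UTC the same value decodes to 2106-02-07T06:28:14; only `SITE_TZ` needs changing for another site. The seconds field is what points at `NO_VAL` specifically: 0xffffffff (Slurm's `INFINITE`) would print one second later, at 09:28:15 local time.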