Ticket 6755 - Sacct different behaviour after update
Summary: Sacct different behaviour after update
Status: RESOLVED DUPLICATE of ticket 6697
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 18.08.6
Hardware: Linux Linux
Severity: 3 - Medium Impact
Assignee: Albert Gil
Reported: 2019-03-26 03:09 MDT by Ahmed Essam ElMazaty
Modified: 2019-04-11 02:02 MDT

Site: KAUST


Attachments
slurmdbd.log (2.14 MB, application/xz)
2019-03-28 00:33 MDT, Ahmed Essam ElMazaty
slurmctld.log (5.66 MB, text/x-log)
2019-03-28 00:35 MDT, Ahmed Essam ElMazaty
slurm.conf (5.03 KB, text/plain)
2019-03-28 00:36 MDT, Ahmed Essam ElMazaty
slurmdbd.conf (10.57 KB, application/vnd.oasis.opendocument.text)
2019-03-28 00:37 MDT, Ahmed Essam ElMazaty

Description Ahmed Essam ElMazaty 2019-03-26 03:09:22 MDT
Good afternoon,
After upgrading Slurm to 18.08.6-2, the 'sacct' command seems to behave differently.
Previously, 'sacct -j <job ID>' immediately displayed information about the job, but now it doesn't.
For example:
# sacct -j 1668219
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
#

Now it only works if I specify a start time with the '-S' option:
#  sacct -j 1668219 -S 2019-03-01
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
1668219        gsvSlow4      batch    default         12  COMPLETED      0:0 
1668219.bat+      batch               default         12  COMPLETED      0:0 
1668219.ext+     extern               default         12  COMPLETED      0:0
Comment 1 Albert Gil 2019-03-26 09:56:48 MDT
Hi Ahmed,

Yes, this looks like a regression of the work done in bug 5717.
Let me look into it further and I'll let you know.

Albert
Comment 2 Albert Gil 2019-03-27 04:39:28 MDT
Hi Ahmed,

I haven't been able to reproduce your issue.
Could you please post the following information:
- Your slurm.conf
- Your slurmdbd.conf (without any passwords)

Also, could you please add these values to your slurmdbd.conf?
DebugLevel=debug2
DebugFlags=DB_QUERY,DB_JOB,DB_STEP

Then, after restarting slurmdbd, could you run the same commands you ran before, but with "-vvv", and post the slurmdbd and slurmctld logs?

$ sacct -j 1668219 -vvv
$ sacct -j 1668219 -vvv -S 2019-03-01

And finally, is this happening for every job ID?
Could you try the same commands for jobs that:
- are running or completed today (same day of the command execution)
- were run and completed yesterday
- were run and completed before your update to 18.08.6

The slurmdbd and slurmctld logs covering the execution of all the above commands will be very useful.

Thanks,
Albert
Comment 3 Ahmed Essam ElMazaty 2019-03-28 00:33:07 MDT
Created attachment 9721
slurmdbd.log
Comment 4 Ahmed Essam ElMazaty 2019-03-28 00:34:28 MDT
(In reply to Albert Gil from comment #2)

Hello Albert,
Please find my comments inline 

> Hi Ahmed,
> 
> I haven't been able to reproduce your issue.
> Please could you post the following information:
> - Your slurm.conf
attached

> - Your slurmdbd.conf (without any passwords)
attached
> 
> Also, could you please add these values to your slurmdbd.conf?
> DebugLevel=debug2
> DebugFlags=DB_QUERY,DB_JOB,DB_STEP

changed and restarted

> 
> Then, after restarting slurmdbd, could you run the same commands you ran
> before, but with "-vvv", and post the slurmdbd and slurmctld logs?
> 
> $ sacct -j 1668219 -vvv
Here's the output; the logs are attached.

# sacct -j 1668219 -vvv
sacct: Jobs Eligible in the time window from Epoch 0 to Thu Mar 28 09:06:57 2019
sacct: debug:  Options selected:
	opt_completion=0
	opt_dup=0
	opt_field_list=(null)
	opt_help=0
	opt_no_steps=0
	opt_whole_hetjob=(null)
sacct: Accounting storage SLURMDBD plugin loaded
sacct: debug:  Munge authentication plugin loaded
sacct: debug:  slurmdbd: Sent PersistInit msg
sacct: debug2: Clusters requested:	dragon
sacct: debug2: Userids requested:	all
sacct: debug2: Jobs requested:
sacct: debug2: 	: 1668219
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
sacct: debug:  slurmdbd: Sent fini msg


> $ sacct -j 1668219 -vvv -S 2019-03-01
Here's the output; the logs are attached.

# sacct -j 1668219 -vvv -S 2019-03-01
sacct: Jobs Eligible in the time window from Fri Mar 01 00:00:00 2019 to Thu Mar 28 09:07:40 2019
sacct: debug:  Options selected:
	opt_completion=0
	opt_dup=0
	opt_field_list=(null)
	opt_help=0
	opt_no_steps=0
	opt_whole_hetjob=(null)
sacct: Accounting storage SLURMDBD plugin loaded
sacct: debug:  Munge authentication plugin loaded
sacct: debug:  slurmdbd: Sent PersistInit msg
sacct: debug2: Clusters requested:	dragon
sacct: debug2: Userids requested:	all
sacct: debug2: Jobs requested:
sacct: debug2: 	: 1668219
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
1668219        gsvSlow4      batch    default         12  COMPLETED      0:0 
1668219.bat+      batch               default         12  COMPLETED      0:0 
1668219.ext+     extern               default         12  COMPLETED      0:0 
sacct: debug:  slurmdbd: Sent fini msg



> 
> And finally, is this happening for every job ID?
> Could you try the same commands for jobs that:
> - are running or completed today (same day of the command execution)
It works for running jobs:

# sacct -j 1650895
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
1650895          MkSmby      batch    default         32    RUNNING      0:0 
1650895.ext+     extern               default         32    RUNNING      0:0 

It also works for jobs that completed today (this one ended 5 hours ago):

# sacct -j 1799897
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
1799897      W05H30d0r+      batch    default         20  COMPLETED      0:0 
1799897.bat+      batch               default         20  COMPLETED      0:0 
1799897.ext+     extern               default         20  COMPLETED      0:0 


> - were run and completed yesterday
It doesn't:

# sacct -j 1750453
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 


> - were run and completed before your update to 18.08.6
No, it doesn't show anything without -S.

> 
> The slurmdbd and slurmctld logs covering the execution of all the above
> commands will be very useful.
attached

> 
> Thanks,
> Albert

Thanks for your help.
Ahmed
Comment 5 Ahmed Essam ElMazaty 2019-03-28 00:35:05 MDT
Created attachment 9722
slurmctld.log
Comment 6 Ahmed Essam ElMazaty 2019-03-28 00:36:38 MDT
Created attachment 9723
slurm.conf
Comment 7 Ahmed Essam ElMazaty 2019-03-28 00:37:18 MDT
Created attachment 9724
slurmdbd.conf
Comment 8 Albert Gil 2019-03-28 12:42:26 MDT
Sorry Ahmed,
I don't know how I didn't realize it before.
This bug has already been reported and fixed in bug 6697.

Albert

*** This ticket has been marked as a duplicate of ticket 6697 ***
Comment 9 Albert Gil 2019-04-11 02:02:59 MDT
*** Ticket 6830 has been marked as a duplicate of this ticket. ***