Ticket 371

Summary: sacct gives wrong times using accounting_storage/filetxt
Product: Slurm
Reporter: Ulf Markwardt <Ulf.markwardt>
Component: Accounting
Assignee: Danny Auble <da>
Status: RESOLVED FIXED
QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 2.6.x   
Hardware: Linux   
OS: Linux   
Site: Universitat Dresden (Germany)
Attachments: slurm.conf
/home/mark/slurmgit/bin/sacct -X --format "JobName,JobID,Submit,Start, End,State" -S 2013-07-17T08:00:00 --name git_5 > git_5.sacct
/tmp/slurmgit_JobCompLoc.log

Description Ulf Markwardt 2013-07-17 04:18:50 MDT
Created attachment 342 [details]
slurm.conf

Dear Slurm support,

With this morning's checkout I have run the next large scenario and detected mysterious discrepancies between the JobCompLoc file and the output of sacct. For example:

taurusi1035 /tmp grep "JobId=4623" /tmp/slurmgit_JobCompLoc.log
JobId=4623 UserId=mark(19423) GroupId=swtest(50147) Name=git_5 JobState=COMPLETED Partition=all TimeLimit=10 StartTime=2013-07-17T17:40:41 EndTime=2013-07-17T17:43:04 NodeList=taurusi1151 NodeCnt=1 ProcCnt=1 WorkDir=/scratch/mark/44 

and

/home/mark/slurmgit/bin/sacct -X  --format "JobName,JobID,Submit,Start, End,State"  -S 2013-07-17T08:00:00 |grep " 4623"
git_5 4632         2013-07-17T17:39:00 2013-07-17T17:38:19 2013-07-17T17:40:41  COMPLETED

(The clock difference - tested with "clush -bw taurusi[3001-3180],taurusi[1001-1270] date" - is max a second.)
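
For comparing the two sources mechanically, a JobCompLoc record can be split into fields. The following is only a minimal sketch, assuming the record is a whitespace-separated list of Key=Value pairs as in the line quoted above and that no value contains a space; `parse_jobcomp_line` is a hypothetical helper, not part of Slurm.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"

def parse_jobcomp_line(line):
    """Split a JobCompLoc record into a dict of Key=Value fields.

    Assumption: fields are separated by whitespace and values contain no spaces.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not Key=Value
            fields[key] = value
    return fields

# The record quoted above for JobId=4623.
record = parse_jobcomp_line(
    "JobId=4623 UserId=mark(19423) GroupId=swtest(50147) Name=git_5 "
    "JobState=COMPLETED Partition=all TimeLimit=10 "
    "StartTime=2013-07-17T17:40:41 EndTime=2013-07-17T17:43:04 "
    "NodeList=taurusi1151 NodeCnt=1 ProcCnt=1 WorkDir=/scratch/mark/44"
)

start = datetime.strptime(record["StartTime"], TIME_FORMAT)
end = datetime.strptime(record["EndTime"], TIME_FORMAT)
elapsed_seconds = (end - start).total_seconds()
print(record["JobId"], elapsed_seconds)
```

The parsed StartTime and EndTime can then be set against the Start and End columns that sacct prints for the same job.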

--- 
My major problem with the sacct output is that the submit time is AFTER the start time. 
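
The ordering problem can be checked with a few lines of Python. This is a sketch under the assumption that the Submit, Start, and End columns use the ISO-like format shown in the sacct output above; `find_order_violations` is a hypothetical name chosen for illustration.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"

def find_order_violations(submit, start, end):
    """Return descriptions of timestamps that violate Submit <= Start <= End."""
    submit_t = datetime.strptime(submit, TIME_FORMAT)
    start_t = datetime.strptime(start, TIME_FORMAT)
    end_t = datetime.strptime(end, TIME_FORMAT)
    violations = []
    if submit_t > start_t:
        violations.append("Submit is after Start")
    if start_t > end_t:
        violations.append("Start is after End")
    return violations

# Submit, Start, and End as printed in the sacct line quoted above.
violations = find_order_violations(
    "2013-07-17T17:39:00", "2013-07-17T17:38:19", "2013-07-17T17:40:41"
)
print(violations)
```

Running the same check over every row of the sacct output would show how many jobs are affected.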

Please fix this bug.

Thank you
Ulf

---

After how many bug reports will our site be listed in the drop down field? :-)
Comment 1 Ulf Markwardt 2013-07-17 04:21:14 MDT
Created attachment 343 [details]
/home/mark/slurmgit/bin/sacct -X  --format "JobName,JobID,Submit,Start, End,State"  -S 2013-07-17T08:00:00 --name git_5 > git_5.sacct
Comment 2 Ulf Markwardt 2013-07-17 04:22:41 MDT
Created attachment 344 [details]
/tmp/slurmgit_JobCompLoc.log
Comment 3 Danny Auble 2013-07-17 04:37:51 MDT
I'll check this out later.  As mentioned before, I am unaware of anyone using this plugin in production, but will fix the bug :).  You will probably find many more issues as the code hasn't been really touched in many years.

Your site has been in the drop down for a while, I just changed it now.
Comment 4 Ulf Markwardt 2013-07-17 04:43:51 MDT
Of course, you are free to say you don't support filetxt any more. But until then I feel it provides a good testing environment.

Thank you
Ulf
Comment 5 Danny Auble 2013-07-17 04:49:36 MDT
Thanks for the option, I agree it is slightly simpler to set up than the database.  But perhaps we should consider taking it away as it typically doesn't represent a real production system.  We are deprecating the postgres plugin in the next version as well for similar reasons.
Comment 6 Ulf Markwardt 2013-07-17 20:04:23 MDT
Hm... my confidence in the correctness of sacct dropped a little when it gave these outputs of unknown origin.
For this reason: Please call the plugin deprecated (officially) or support it. 

To relax the urgency: I will not need the fix for the next 3 weeks.

Thank you
Ulf
Comment 7 Danny Auble 2013-07-18 06:30:57 MDT
I can understand.  I am proposing we just deprecate the plugin.  I would change your statement to "my confidence in the correctness of sacct with the filetxt plugin dropped a little..." :).  I would be surprised if these issues were happening with a regular slurmdbd/mysql setup.
Comment 8 Danny Auble 2013-08-07 08:24:19 MDT
Ulf, could you attach the filetxt file for this (/tmp/slurmgit_AccountingStorageLoc)?  I should have asked for it before, but having that will give me the ability to reproduce.  I really only need the lines for one of the jobs in question, like jobid 500 for instance.
Comment 9 Danny Auble 2013-08-15 05:35:38 MDT
According to your slurm.conf the file is

AccountingStorageLoc=/tmp/slurmgit_AccountingStorageLoc

but the file you sent did display the situation.  I was able to reproduce and fix the problem; the fix is in commit 9eba4384fe0fad228fd570207f63e18d043880fc and will be in 2.6.1.  Let me know if you have any more issues.
Comment 10 Ulf Markwardt 2013-08-16 01:51:31 MDT
Thank you!