Created attachment 342 [details] slurm.conf

Dear Slurm support,

with this morning's checkout I have run the next large scenario. In doing so I detected mysterious discrepancies between the JobCompLoc file and the output of sacct. For example:

taurusi1035 /tmp grep "JobId=4623" /tmp/slurmgit_JobCompLoc.log
JobId=4623 UserId=mark(19423) GroupId=swtest(50147) Name=git_5 JobState=COMPLETED Partition=all TimeLimit=10 StartTime=2013-07-17T17:40:41 EndTime=2013-07-17T17:43:04 NodeList=taurusi1151 NodeCnt=1 ProcCnt=1 WorkDir=/scratch/mark/44

and

/home/mark/slurmgit/bin/sacct -X --format "JobName,JobID,Submit,Start, End,State" -S 2013-07-17T08:00:00 |grep " 4623"
git_5 4632 2013-07-17T17:39:00 2013-07-17T17:38:19 2013-07-17T17:40:41 COMPLETED

(The clock difference, tested with "clush -bw taurusi[3001-3180],taurusi[1001-1270] date", is at most one second.)

---
My major problem with the sacct output is that the submit time is AFTER the start time. Please fix this bug.

Thank you
Ulf

---
After how many bug reports will our site be listed in the drop-down field :-)
Created attachment 343 [details] /home/mark/slurmgit/bin/sacct -X --format "JobName,JobID,Submit,Start, End,State" -S 2013-07-17T08:00:00 --name git_5 > git_5.sacct
Created attachment 344 [details] /tmp/slurmgit_JobCompLoc.log
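The symptom reported above (a Submit timestamp later than the Start timestamp) can be checked mechanically across a whole accounting dump. The following is only a minimal sketch, not part of the original report: it assumes pipe-delimited rows in the order JobName|JobID|Submit|Start|End|State (for example from sacct's --parsable2 mode with the same --format list), and the function name find_submit_after_start is hypothetical.

```python
from datetime import datetime

# Timestamp layout as it appears in the sacct output quoted in this report.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S"


def find_submit_after_start(sacct_text):
    """Return (job_id, submit, start) for rows whose Submit is later than Start.

    Assumption: each row is pipe-delimited in the order
    JobName|JobID|Submit|Start|End|State.
    """
    suspicious = []
    for line in sacct_text.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 6:
            continue  # blank or malformed row
        job_id, submit, start = fields[1], fields[2], fields[3]
        try:
            submit_ts = datetime.strptime(submit, TS_FORMAT)
            start_ts = datetime.strptime(start, TS_FORMAT)
        except ValueError:
            continue  # header row or a non-timestamp value such as "Unknown"
        if submit_ts > start_ts:
            suspicious.append((job_id, submit, start))
    return suspicious


if __name__ == "__main__":
    # The row quoted in the report, rewritten with pipe delimiters.
    sample = (
        "JobName|JobID|Submit|Start|End|State\n"
        "git_5|4632|2013-07-17T17:39:00|2013-07-17T17:38:19|"
        "2013-07-17T17:40:41|COMPLETED\n"
    )
    for job_id, submit, start in find_submit_after_start(sample):
        print(f"job {job_id}: submit {submit} is after start {start}")
```

Rows that fail to parse are skipped rather than reported, so the check only flags jobs where both timestamps are present and out of order.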
I'll check this out later. As mentioned before, I am unaware of anyone using this plugin in production, but will fix the bug :). You will probably find many more issues as the code hasn't been really touched in many years. Your site has been in the drop down for a while, I just changed it now.
Of course, you are free to say you don't support filetxt any more. But until then I feel it provides a good testing environment. Thank you Ulf
Thanks for the option. I agree it is slightly simpler to set up than the database, but perhaps we should consider taking it away, as it typically doesn't represent a real production system. We are deprecating the postgres plugin in the next version as well for similar reasons.
Hm... my confidence in the correctness of sacct dropped a little, when it gave these outputs with unknown origin. For this reason: Please call the plugin deprecated (officially) or support it. To relax the urgency: I will not need the fix for the next 3 weeks. Thank you Ulf
I can understand. I am proposing we just deprecate the plugin. I would change your statement to "my confidence in the correctness of sacct with the filetxt plugin dropped a little..." :). I would be surprised if these issues were happening with a regular slurmdbd/mysql setup.
Ulf, could you attach the filetxt file for this (/tmp/slurmgit_AccountingStorageLoc)? I should have asked for it before, but having it will give me the ability to reproduce. I really only need the lines for one of the jobs in question, like jobid 500 for instance.
According to your slurm.conf the file is AccountingStorageLoc=/tmp/slurmgit_AccountingStorageLoc but the file you sent did display the situation. I was able to reproduce and fix the problem; the fix is in commit 9eba4384fe0fad228fd570207f63e18d043880fc and will be in 2.6.1. Let me know if you have any more issues.
Thank you!