Hello,

I have what I believe is a simple question. I want to get the number of non-duplicate jobs for an account in a given month with sacct. Presently, I use something like:

```
sacct -nPXDT -S $STARTDATE -E $ENDDATE -M $CLUSTER -A $ACCOUNT
```

This seems to be straightforward and reliable and, as I understand it, would match what sreport does on the back end when calculating usage for a given account in a given time frame.

If I were to then try to use sreport to get that same information (i.e. the number of jobs for a given account in a given time frame), would the following be expected to work and deliver analogous results?

```
sreport -nP job SizesByAccount -M $CLUSTER Start=$STARTDATE End=$ENDDATE PrintJobCount accounts=$ACCOUNT
```

Regards,
David
David,

First of all, sacct -D literally means "include the duplicates", specifically the duplicate jobids.
https://slurm.schedmd.com/sacct.html#OPT_duplicates

In the case a job is requeued and started again, you will have two records with the same jobid, both of which used resources, though the first didn't run to completion. This is the most common way to get duplicate job ids.

The sreport script skips all jobs that don't have any elapsed time. It also includes duplicate jobs.

sacct -nPXDT -S $STARTDATE -E $ENDDATE -M $CLUSTER -A $ACCOUNT --format=jobid,elapsed | grep -x 00:00:00

-Scott
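To make the duplicate records described above concrete, here is a minimal sketch that lists the job IDs appearing more than once in `jobid|elapsed` output. The function name `duplicate_jobids` and the sample records are made up for illustration; neither is part of Slurm, and the real input would come from the sacct command shown in the comment.

```shell
# Hypothetical helper (not part of Slurm): reads "jobid|elapsed" lines on
# stdin and prints each job ID that occurs in more than one record.
duplicate_jobids() {
  awk -F'|' '{ seen[$1]++ } END { for (id in seen) if (seen[id] > 1) print id }' | sort
}

# Made-up sample records: job 100 was requeued, so it has two records.
# With real data, pipe in the output of:
#   sacct -nPXDT -S $STARTDATE -E $ENDDATE -M $CLUSTER -A $ACCOUNT --format=jobid,elapsed
printf '100|00:01:00\n101|00:02:00\n100|00:05:00\n' | duplicate_jobids
```

Any ID printed here would be counted once per record when -D is given, which is the behaviour wanted for billing requeued jobs.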
(In reply to Scott Hilton from comment #1)
> In the case a job is requeued and started again, you will have two records
> with the same jobid both of which used resources, though the first didn't
> run to completion. This is the most common way to get duplicate job ids

This is what we (Umich) want in the case of billing (which is something our site has to do), since a job may use resources, get re-queued, and earn a charge.

> The sreport script skips all jobs that don't have any elapsed time. It also
> includes duplicate jobs.
>
> sacct -nPXDT -S $STARTDATE -E $ENDDATE -M $CLUSTER -A $ACCOUNT
> --format=jobid,elapsed | grep -x 00:00:00

At first I was confused about what you are getting at. I'm only one cup of coffee into the day, so let's see if I get this right.

What I *think* you're getting at is that sreport grabs the duplicate jobs by default, whereas sacct has to be _told_ to do that (with the -D flag). And sreport also omits jobs with 0 ElapsedRaw by default, and sacct does _not_.

Given these two statements, it is not reasonable to expect analogous output/results from the two commands I initially sent. Would that be true?

David
David,

I see that my last paragraph was rather confusing.

Yes, the two commands you sent will not match. However, the command I sent at the end of the last message should match sreport, except that I wrote grep -x instead of grep -v, which is what I meant to write. Also, grep -c will give the line count, which is what you are looking for anyway.

> sacct -nPXDT -S $STARTDATE -E $ENDDATE -M $CLUSTER -A $ACCOUNT --format=jobid,elapsed | grep -cv 00:00:00

-Scott
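As a possible variation on the grep -cv pipeline above, the sketch below counts records whose elapsed field is not exactly `00:00:00`. It is offered under an assumption rather than as what sreport does internally: grep -v matches the pattern anywhere on the line, so an elapsed value with a day prefix such as `1-00:00:00` would, as far as I can tell, also be filtered out, whereas an exact comparison on the second field keeps it. The function name `count_nonzero_elapsed` and the sample records are hypothetical.

```shell
# Hypothetical helper (not part of Slurm): reads "jobid|elapsed" lines on
# stdin and prints how many records have an elapsed field other than
# exactly 00:00:00. Prints 0 for empty input.
count_nonzero_elapsed() {
  awk -F'|' '$2 != "00:00:00" { n++ } END { print n + 0 }'
}

# Made-up sample records: one zero-elapsed job, one requeued job (100),
# and one job that ran for exactly one day.
# With real data, pipe in the output of:
#   sacct -nPXDT -S $STARTDATE -E $ENDDATE -M $CLUSTER -A $ACCOUNT --format=jobid,elapsed
printf '100|00:01:00\n101|00:00:00\n100|00:05:00\n103|1-00:00:00\n' | count_nonzero_elapsed
```

Whether this count lines up with sreport's PrintJobCount for a given site is worth checking against a month with known totals.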
David,

Do you have any more questions on this ticket?

-Scott
(In reply to Scott Hilton from comment #4)
> David,
>
> Do you have anymore questions on this ticket?
>
> -Scott

Scott,

I do not. I apologize. I thought I closed it yesterday as RESOLVED INFO GIVEN.

Thanks for checking.

David
Closing Ticket