Hi,

A question regarding sreport. I am trying to make sense of the following data:

sreport --cluster=victini cluster Utilization -t minutes -p -n start=2021-01-01T00:00:00 end=2021-12-31T23:59:59 -T cpu,gres/gpu
victini|cpu|1217580421|0|0|54169826|544723353|1816473600|

The rightmost number is as expected: 365 * 24 * 60 * 96 machines * 36 cores.

However, when looking at the utilisation as given by AccountUtilizationByUser, I see (in tree format):

[root@masterdb01 ~]# sreport --cluster=victini cluster AccountUtilizationByUser -n start=2021-01-01T00:00:00 end=2021-12-31T23:59:59 -T cpu -t minutes Tree | head -n 1
victini root cpu 468734702

If I understand it correctly, that means that the accumulated number of minutes spent by users in the account hierarchy is 468734702.

What I fail to grok is how this is related to the 1.2 billion minutes used over the year. Does the utilisation number include the time where jobs requested cores but not actually used them, and is this same "usage" ignored in AccountUtilizationByUser?

Kind regards,
-- Andy
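For reference, the capacity arithmetic in the comment above can be checked with a few lines of Python. This is only a sketch of the calculation as stated (96 nodes, 36 cores each, a 365-day year); the variable names are illustrative and not taken from Slurm.

```python
# Capacity check for the figures quoted in the comment above.
# Assumption: the cluster had 96 nodes with 36 cores each for all of 2021.
MINUTES_PER_YEAR = 365 * 24 * 60
NODES = 96
CORES_PER_NODE = 36

# Total CPU minutes available over the year.
expected_reported = MINUTES_PER_YEAR * NODES * CORES_PER_NODE

# Values copied from the two sreport outputs above.
reported = 1816473600   # last field of the Utilization line
allocated = 1217580421  # first numeric field of the Utilization line
used = 468734702        # root line of AccountUtilizationByUser

# CPU minutes that appear in the first report but not in the second.
gap = allocated - used

print(expected_reported == reported)  # True
print(gap)                            # 748845719
```

The gap of roughly 749 million CPU minutes is the quantity the question is about.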
Hi. I'm checking into this and will get back to you after some investigation.
Hi Chad, Is there anything you found that I am missing? Thanks, -- Andy
(In reply to hpc-admin from comment #2)
> Is there anything you found that I am missing?

Hi. Nothing to point out quite yet--am working on this again today so will report back more later.
I'm still looking at this but here's a first pass on some answers.

Putting headings on your values for readability, these are your two sreport queries:

R1:
>sreport --cluster=victini cluster Utilization -t minutes -p -n start=2021-01-01T00:00:00 end=2021-12-31T23:59:59 -T cpu,gres/gpu

Translates to this:

>Cluster Utilization 2021-01-01T00:00:00 - 2021-12-31T23:59:59
>Usage reported in TRES Minutes
>--------------------------------------------------------------------------------
>  Cluster      TRES Name  Allocated     Down PLND Down      Idle   Planned   Reported
>--------- -------------- ---------- -------- --------- --------- --------- ----------
>  victini            cpu 1217580421        0         0  54169826 544723353 1816473600

R2:
>sreport --cluster=victini cluster AccountUtilizationByUser -n start=2021-01-01T00:00:00 end=2021-12-31T23:59:59 -T cpu -t minutes Tree | head -n 1

Translates to this:

>--------------------------------------------------------------------------------
>Cluster/Account/User Utilization 2021-01-01T00:00:00 - 2021-12-31T23:59:59 (31536000 secs)
>Usage reported in TRES Minutes
>--------------------------------------------------------------------------------
>  Cluster              Account     Login     Proper Name      TRES Name      Used
>--------- -------------------- --------- --------------- -------------- ---------
>  victini                 root                                      cpu 468734702

These are the rephrased and reordered questions I believe you are asking (using the queries above with their column heading names):

>Q1: Does R1:Allocated include "the time where jobs requested cores but not actually used them"?
>Q2: Is it a correct assumption that R2:Used is "the accumulated number of minutes spent by users in the account hierarchy"?
>Q3: How is R1:Allocated related to R2:Used?
>Q4: Is "the time where jobs requested cores but not actually used them" counted in R2:Used?

Answers so far:

Q1: No. It's in R1:Planned.

The sreport doc says about "Planned":

>Time that a node spent idle with eligible jobs in the
>queue that were unable to start due to time or size constraints.

Q2: Yes.

Q3: They *should* be in sync, but I'm not sure why they are not and need to check into this some more.

The doc says about "Allocated":

>Time that nodes were in use with active jobs or an active
>reservation. This does not include reservations created
>with the MAINT or IGNORE_JOBS flags.

Q4: I don't think so but need to confirm.

If you could rerun those two reports without the -n, -p and "| head -n 1" and supply the output, I can see a more complete version of them and make sure I'm getting the values/columns right and not missing something else.

Also, if you don't agree with my rephrasing of your questions, please feel free to offer corrections. :)
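To make the column mapping easier to check, here is a small Python sketch that labels the fields of the parsable (-p) line from the first report and compares them with the root value from the second. The column list is the one used in this comment (it is typed in by hand, not read from sreport), and the observation that the usage columns add up to Reported is simply what these particular numbers show.

```python
# Label the fields of the "-p -n" Utilization line quoted in this ticket.
# Assumption: the fields follow the column order shown in the comment above.
COLUMNS = ["Cluster", "TRES Name", "Allocated", "Down",
           "PLND Down", "Idle", "Planned", "Reported"]

line = "victini|cpu|1217580421|0|0|54169826|544723353|1816473600|"

# Drop the trailing "|" and split into one value per column.
fields = line.rstrip("|").split("|")
row = dict(zip(COLUMNS, fields))

# Convert the usage columns (everything after the TRES name) to integers.
usage = {name: int(row[name]) for name in COLUMNS[2:]}

# Sum of every usage column except the Reported total.
parts = sum(value for name, value in usage.items() if name != "Reported")

# Root value from the AccountUtilizationByUser report.
account_used = 468734702
gap = usage["Allocated"] - account_used

for name, value in usage.items():
    print(f"{name:>10}: {value:>12} ({value / usage['Reported']:.1%})")
print("columns sum to Reported:", parts == usage["Reported"])
print("Allocated minus account Used:", gap)
```

For this line the columns do add up to Reported, so the unexplained difference is entirely between R1:Allocated and R2:Used.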
(In reply to Chad Vizino from comment #6)
> >Q1: Does R1:Allocated include "the time where jobs requested cores but not actually used them"?
> Answers so far:
>
> Q1: No. It's in R1:Planned.
>
> The sreport doc says about "Planned":
>
> >Time that a node spent idle with eligible jobs in the
> >queue that were unable to start due to time or size constraints.

A clarification: The "Planned" column is actually named "Reserved" in 20.11. In 21.08 "Reserved" became "Planned"--I wasn't considering your version when trying to do the column translation and used the 21.08 name by mistake.
Hi Chad, Could the difference be explained by using MAX_TRES and a TRESBillingWeights that takes the requested memory of a node into account for the billing? As I understand, the billing weights affect fairshare only, right? -- Andy
(In reply to hpc-admin from comment #8)
> Could the difference be explained by using MAX_TRES and a TRESBillingWeights
> that takes the requested memory of a node into account for the billing? As I
> understand, the billing weights affect fairshare only, right?

Right--billing weights affect fairshare. From https://slurm.schedmd.com/tres.html:

>NOTE: TRESBillingWeights is only used when calculating fairshare ...
(In reply to Chad Vizino from comment #6)
> >Q3: How is R1:Allocated related to R2:Used?
> >Q4: Is "the time where jobs requested cores but not actually used them" counted in R2:Used?
> Answers so far:
>
> Q3: They *should* be in sync but not sure why they are not and need to check
> into this some more.
>
> The doc says about "Allocated":
>
> >Time that nodes were in use with active jobs or an active
> >reservation. This does not include reservations created
> >with the MAINT or IGNORE_JOBS flags.
> Q4: I don't think so but need to confirm.
>
> If you could rerun those two reports, remove the -n, -p and "| head -n 1"
> and supply the output then I can see a more complete version of them and
> make sure I'm getting the values/columns right and not missing something
> else.

Still checking on these two but if you can rerun those reports, that would help. Thanks.
(In reply to Chad Vizino from comment #10)
> > If you could rerun those two reports, remove the -n, -p and "| head -n 1"
> > and supply the output then I can see a more complete version of them and
> > make sure I'm getting the values/columns right and not missing something
> > else.
> Still checking on these two but if you can rerun those reports, that would
> help. Thanks.

Checking back in on this: Would it be possible to send these reports? Also, breaking the year into 12 individual month-long reports to see how things line up will help. That way, if we need to dive into a month, there will be less data to go through.

And a couple more things:

Are you using preemption?
Could you send "sreport user topusage" (over the same period as a previous report) so we can see if it lines up with the other reports?
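One way to produce the 12 month-long reports is to generate the command lines with a short script. The sketch below is only a suggestion: the helper name monthly_sreport_commands is made up for illustration, and the sreport options are the ones already used earlier in this ticket, minus -n and -p so that the headings are kept.

```python
import calendar


def monthly_sreport_commands(year, cluster="victini", tres="cpu"):
    """Build one 'cluster Utilization' command line per calendar month.

    Hypothetical helper for this ticket; it only formats strings and does
    not run sreport itself.
    """
    commands = []
    for month in range(1, 13):
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        start = f"{year}-{month:02d}-01T00:00:00"
        end = f"{year}-{month:02d}-{last_day:02d}T23:59:59"
        commands.append(
            f"sreport --cluster={cluster} cluster Utilization "
            f"-t minutes start={start} end={end} -T {tres}"
        )
    return commands


commands = monthly_sreport_commands(2021)
for command in commands:
    print(command)
```

The same loop can be reused for the AccountUtilizationByUser report by swapping the report name in the format string.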
(In reply to Chad Vizino from comment #11)
> Checking back in on this: Would it be possible to send these reports? Also,
> breaking the year into 12 individual month-long reports to see how things
> line up will help. That way, if we need to dive into a month, there will be
> less data to go through.
>
> And a couple more things:
>
> Are you using preemption?
> Could you send "sreport user topusage" (over the same period as a previous
> report) so we can see if it lines up with the other reports?

Hi. Would you like to continue with this?
Hi. I'm going to go ahead and close this for now. If you'd like to pursue it, feel free to reopen.