Created attachment 23411 [details]
all transactions for account
Used: sacctmgr -p show transactions accounts=sglotzer3 withassoc
Created attachment 23412 [details]
all jobs for month for account
All the jobs for the given month in which the violation occurred. Used:
```
sacct -PXDT -A sglotzer3 -S 2022-01-01T00:00:00 -E 2022-02-01T00:00:00 -M greatlakes --format=DBIndex,JobId,Account,QOS,User,Partition,Submit,Start,End,AllocTRES,ElapsedRaw
```
Hi David,

Thanks for the detailed explanation. Could you also attach the output of the following commands:

```
$ scontrol show assoc_mgr
$ sshare -pA sglotzer3 --format=Account,User,GrpTRESMins,GrpTRESRaw
```

Also, is it possible that sglotzer3 or some of its users has been granted access to some reservation during the period in which they seem to have exceeded the limit? If you are not certain, a way to check is to first list the reservations since that period with a command like:

```
$ sacctmgr show reservation Start=2022-01-01 Format=Id,Name,Start,End,UnusedWall,Associations
```

And see if the Associations column contains the association id of that account or a parent of it.

Thanks,
Albert

(In reply to Albert Gil from comment #4)

Hi, Albert!

> Could you also attach the output of the following commands:
>
> $ scontrol show assoc_mgr

The output from this is attached.

> $ sshare -pA sglotzer3 --format=Account,User,GrpTRESMins,GrpTRESRaw

```
[root@gl-build reqBySchedmd]# date; sshare -pA sglotzer3 --format=Account,User,GrpTRESMins,GrpTRESRaw
Fri Feb 11 13:32:24 EST 2022
Account|User|GrpTRESMins|GrpTRESRaw|
sglotzer3||billing=7253735926|cpu=3235077,mem=4288316490,energy=0,node=1191309,billing=5857524733,fs/disk=0,vmem=0,pages=0,gres/gpu=360570,license/aa_r@slurmdb=0,license/aa_r_hpc@slurmdb=0,license/abaqus@slurmdb=0,license/ampl@slurmdb=0,license/ansys@slurmdb=0,license/cfd_solv_ser@slurmdb=0,license/coe-lsdyna@slurmdb=0,license/comsol@slurmdb=0,license/fdtd_dengh@slurmdb=0,license/fdtd_kotov@slurmdb=0,license/gurobi@slurmdb=0,license/helios@slurmdb=0,license/jwhu-lsdyna@slurmdb=0,license/sas@slurmdb=0,license/sentaurus@slurmdb=0,license/stata-mp@slurmdb=0,license/stata@slurmdb=0|
```

> Also, is it possible that sglotzer3 or some of its users has been granted
> access to some reservation during the period in which they seem to have
> exceeded the limit?
>
> If you are not certain, a way to check is to first list the reservations
> since that period with a command like:
>
> $ sacctmgr show reservation Start=2022-01-01
> Format=Id,Name,Start,End,UnusedWall,Associations
>
> And see if the Associations column contains the association id of that
> account or a parent of it.
Here are all the reservations for the cluster, using the command you suggested:

```
[root@gl-build reqBySchedmd]# sacctmgr -P show reservation Start=2022-01-01 Format=Id,Name,Start,End,UnusedWall,Associations cluster=greatlakes
ID|Name|TimeStart|TimeEnd|UnusedWall|Assocs
140|2021_Summer_Maintenance|2021-08-09T17:36:53|2022-08-09T06:00:00|104455387.000000|41,39,36,31,12700,32,33,38,43
161|spgpu_foe|2021-12-03T08:36:21|2022-01-10T19:01:50|617983.000000|27796,27787,27750,28719,26,27770,27813,27807
154|spgpu_node_testing|2021-12-21T10:32:46|2022-01-01T00:00:00|912434.000000|26
165|arc_testing|2022-01-05T08:42:42|2022-01-06T14:06:13|105802.725000|26
166|arc-firmware-maint|2022-01-06T08:00:00|2022-01-06T13:23:01|19381.000000|2
167|arc_testing|2022-01-06T14:08:26|2022-01-06T14:13:39|312.781250|26
156|2022_Winter_Maintenance|2022-01-10T05:00:00|2022-01-10T13:29:24|30562.971016|32,43,38,31,33,12700,41,36,39,42
156|2022_Winter_Maintenance|2022-01-10T13:29:24|2022-01-10T19:01:50|19931.000000|32,43,38,31,33,12700,41,36,39,42,15110
156|2022_Winter_Maintenance|2022-01-10T19:01:50|2022-01-11T07:16:29|44079.000000|32,43,38,31,33,12700,41,36,39,42,15110
161|spgpu_foe|2022-01-10T19:01:50|2022-12-01T11:39:15|0.000000|27796,27787,27750,28719,26,27770,27813,27807
156|2022_Winter_Maintenance|2022-01-11T07:16:29|2022-01-11T12:10:47|17649.000000|32,43,38,31,33,12700,41,36,39,42,15110
168|fw_fix|2022-01-11T11:32:08|2022-01-13T17:16:21|193453.000000|26
156|2022_Winter_Maintenance|2022-01-11T12:10:47|2022-01-11T12:14:07|200.000000|32,43,38,31,33,12700,41,36,39,42,15110
156|2022_Winter_Maintenance|2022-01-11T12:14:07|2022-01-11T12:15:52|105.000000|32,43,38,31,33,12700,41,36,39,42,15110
156|2022_Winter_Maintenance|2022-01-11T12:15:52|2022-01-11T12:17:25|93.000000|32,43,38,31,33,12700,41,36,39,42,15110
156|2022_Winter_Maintenance|2022-01-11T12:17:25|2022-01-11T12:45:47|1676.795812|32,43,38,31,33,12700,41,36,39,42,15110
156|2022_Winter_Maintenance|2022-01-11T12:45:47|2022-01-11T15:33:01|8719.000000|32,43,38,31,33,12700,41,36,39,42,15110
169|dek-SPGPU-perftest|2022-01-11T13:12:12|2022-01-11T15:32:15|8403.000000|26
169|dek-SPGPU-perftest|2022-01-11T15:32:15|2023-01-11T13:12:12|16445764.000000|26
```

Checking for the ID of sglotzer3:

```
[root@gl-build reqBySchedmd]# grep sglotzer3 all_assocs
UserName=pepak(114197305) DefAccount=sglotzer3 DefWckey= AdminLevel=None
UserName=schwendp(114157756) DefAccount=sglotzer3 DefWckey= AdminLevel=None
ClusterName=greatlakes Account=sglotzer3 UserName= Partition= Priority=0 ID=8687
ClusterName=greatlakes Account=sglotzer3 UserName=alacour(99511908) Partition= Priority=0 ID=8700
ClusterName=greatlakes Account=sglotzer3 UserName=alaink(114246457) Partition= Priority=0 ID=19987
ClusterName=greatlakes Account=sglotzer3 UserName=ispivack(114277326) Partition= Priority=0 ID=31820
ClusterName=greatlakes Account=sglotzer3 UserName=jproc(99459669) Partition= Priority=0 ID=8699
ClusterName=greatlakes Account=sglotzer3 UserName=pepak(114197305) Partition= Priority=0 ID=8698
ClusterName=greatlakes Account=sglotzer3 UserName=schwendp(114157756) Partition= Priority=0 ID=20089
ClusterName=greatlakes Account=sglotzer3 UserName=thiv(114146184) Partition= Priority=0 ID=8695
ClusterName=greatlakes Account=sglotzer3 UserName=ttdwyer(114196558) Partition= Priority=0 ID=8697
ClusterName=greatlakes Account=sglotzer3 UserName=yuanzhou(114160342) Partition= Priority=0 ID=8696
sglotzer3
```

The ID of 8687 doesn't show up in any of those reservations:

```
[root@gl-build reqBySchedmd]# date; sacctmgr -P show reservation Start=2022-01-01 Format=Id,Name,Start,End,UnusedWall,Associations cluster=greatlakes | grep 8687
Fri Feb 11 13:34:20 EST 2022
[root@gl-build reqBySchedmd]# echo $?
1
```

Thanks!
David

Created attachment 23436 [details]
all greatlakes associations
Hi David,

I don't know exactly what happened, but from the information that you provided it seems that the actual GrpTRESMins limit for the sglotzer3 account is 7253735926 and not 5679214975 as you think (and as the transactions indicate). At least, that's what scontrol is reporting.

To verify the value in the DB, could you attach the output of:

```
$ sacctmgr show association where account=sglotzer3 user="" format=Account,GrpTRESMins
```

If it's 7253735926, then it seems that somehow that value was changed. If that's the case, I would suggest changing the value. I don't know what happened, but note that 7253735926 - 5679214975 = 1574520951, so maybe you ran your limit upgrade twice or something?

If it's 5679214975, then we have some sync problem between slurmdbd and slurmctld. Please restart slurmctld to see if they sync again, and attach both logs to see what could have happened.

Regards,
Albert

(In reply to Albert Gil from comment #8)

Hi, Albert,

> I don't know exactly what happened, but from the information that you provided
> it seems that the actual GrpTRESMins limit for the sglotzer3 account is
> 7253735926 and not 5679214975 as you think (and as the transactions indicate).

We are investigating the issue for the month of 2022-01 (from 2022-01-01 to 2022-02-02). The *current* GrpTRESMins=billing value _is_ 7253735926. The concern is that, for the month of January 2022, the account had a given limit in place that was violated.

As context (and perhaps you might recall), we increment our account limits on a monthly basis, wherein we take the previous usage for the account and add their requested limit/cap to it to maintain an upper bound/threshold to run up against and not run over.

> At least, that's what scontrol is reporting.
> To verify the value in the DB, could you attach the output of:
>
> $ sacctmgr show association where account=sglotzer3 user=""
> format=Account,GrpTRESMins

```
[root@glctld ~]# date; sacctmgr -p show association where account=sglotzer3 user="" format=Account,GrpTRESMins
Mon Feb 14 11:59:29 EST 2022
Account|GrpTRESMins|
sglotzer3|billing=7253735926|
```

> If it's 7253735926, then it seems that somehow that value was changed.
> If that's the case, I would suggest changing the value.
> I don't know what happened, but note that 7253735926 - 5679214975 = 1574520951,
> so maybe you ran your limit upgrade twice or something?
>
> If it's 5679214975, then we have some sync problem between slurmdbd and
> slurmctld.
> Please restart slurmctld to see if they sync again, and attach both logs to
> see what could have happened.

I think, given the fact that we are referring to a specific month in which they ran over, the potential of a sync issue might be less likely?

> Regards,
> Albert

Hi David,

> As context (and perhaps you might recall), we increment our account limits on
> a monthly basis, wherein we take the previous usage for the account and add
> their requested limit/cap to it to maintain an upper bound/threshold to run
> up against and not run over.

Ok, yes, I recall discussing this. I guess that I already suggested using PriorityDecayHalfLife or PriorityUsageResetPeriod, but it was discarded for some reason that I cannot recall now.

> We are investigating the issue for the month of 2022-01 (from 2022-01-01 to
> 2022-02-02). The *current* GrpTRESMins=billing value _is_ 7253735926. The
> concern is that, for the month of January 2022, the account had a given
> limit in place that was violated.
Ok, then you are not exactly "reproducing the issue" anymore, because you already increased the limit. So, now we cannot fully confirm whether the account was actually violating the limit, or whether there is some difference between the "usage" accounted by slurmdbd/sreport and the usage tracked in real time by slurmctld/scontrol. But in my experience, the latter option is way more probable.

Note that, if we speak about differences between sreport and scontrol usage values, although they are closely tied values, they are not exactly the same. While the sreport information is the actual *time of a resource* allocated (or in a reservation) for an account, the values from scontrol are "usage units" used by the scheduler to enforce limits, fairshare, etc. For example, there are options like the UsageFactor of a QOS that can make those *usage values* differ from the *actual time* being allocated (or reserved). We could also have other sources of differences, like runaway jobs.

> > If it's 7253735926, then it seems that somehow that value was changed.
> > If that's the case, I would suggest changing the value.
> > I don't know what happened, but note that 7253735926 - 5679214975 = 1574520951,
> > so maybe you ran your limit upgrade twice or something?
> >
> > If it's 5679214975, then we have some sync problem between slurmdbd and
> > slurmctld.
> > Please restart slurmctld to see if they sync again, and attach both logs to
> > see what could have happened.
>
> I think, given the fact that we are referring to a specific month in which
> they ran over, the potential of a sync issue might be less likely?

Yes, now I know that the value was changed on Feb 1st.

At this point, to investigate it further, what we could do is try to see if there are any significant differences between the usage accounted (hourly) by slurmdbd/sreport and the usage tracked in real time by slurmctld/scontrol. They will never match 100%, because of their real-time vs hourly nature and because of what I mentioned above, but they are related values and the difference should make sense. Do you want to follow this path?

The alternative is to wait to detect another possible violation and then check the slurmctld/scontrol/sshare values of the limit and usage. I would recommend this approach.

Regards,
Albert

(In reply to Albert Gil from comment #10)

Hi, Albert,

> Ok, then you are not exactly "reproducing the issue" anymore, because you
> already increased the limit.
> So, now we cannot fully confirm whether the account was actually violating
> the limit, or whether there is some difference between the "usage" accounted
> by slurmdbd/sreport and the usage tracked in real time by slurmctld/scontrol.
> But in my experience, the latter option is way more probable.

While we cannot presently reproduce the issue, I think the fact that the issue occurred is still valid. Historically, we have seen `AssocGrpBillingMinutes` for jobs on accounts that would exceed their limit if run. And I have shown here, to my mind, how the account had a given limit in place, and towards the middle of the month got to the point of exceeding it (and should have held jobs with `AssocGrpBillingMinutes` at that point) but continued accepting jobs anyways.

> The alternative is to wait to detect another possible violation and then
> check the slurmctld/scontrol/sshare values of the limit and usage.
> I would recommend this approach.

I think this is probably the best approach, too. We will increase our logrotate policies for the slurmctld and slurmsched logs to start.
Are there other logs you'd recommend? You had mentioned "slurmctld/scontrol/sshare" as points of interest.

- Is it just a matter of using scontrol/sshare to capture data to a file once we know the issue has occurred (potentially within a few hours of our scripts running and notifying us of a limit violation)?
- Or is it better to capture as much data as possible as *soon* as the limit is violated?

The former is a more manual approach and basically how we got data to you for this bug. The latter would require more scripting and things on our end, I think, in order to capture things effectively, and would require more time.

David

Hi David,

> While we cannot presently reproduce the issue, I think the fact that the
> issue occurred is still valid.

Sure. My point is that *maybe* the issue is not exactly that a limit is being ignored (I'm not discarding that either), but maybe the actual problem is that sreport is reporting more usage than the actual usage tracked by slurmctld, for some reason. Right now this is my bet, but I'm not certain. Note that the fact that sreport may be reporting higher values may also be a valid bug; I'm not discarding that either.

> Historically, we have seen
> `AssocGrpBillingMinutes` for jobs on accounts that would exceed their limit
> if run. And I have shown here, to my mind, how the account had a given limit
> in place, and towards the middle of the month got to the point of exceeding
> it (and should have held jobs with `AssocGrpBillingMinutes` at that point)
> but continued accepting jobs anyways.

I would like to review this. Do you recall any specific bug number?

> I think this is probably the best approach, too. We will increase our
> logrotate policies for the slurmctld and slurmsched logs to start. Are there
> other logs you'd recommend?

The slurmctld and slurmdbd logs are the key ones in this case. We may need to increase SlurmctldDebug up to debug2, but you may do that only when you hit the problem, for ~24h if possible.

> You had mentioned "slurmctld/scontrol/sshare" as points of interest.
>
> - Is it just a matter of using scontrol/sshare to capture data to a file
> once we know the issue has occurred (potentially within a few hours of our
> scripts running and notifying us of a limit violation)?
> - Or is it better to capture as much data as possible as *soon* as the limit
> is violated?
>
> The former is a more manual approach and basically how we got data to you
> for this bug. The latter would require more scripting and things on our end,
> I think, in order to capture things effectively, and would require more time.

The former, more manual approach is just good enough. The only thing is what I mentioned before: enable SlurmctldDebug=debug2 and attach the logs with it, along with the output of these commands, replacing sglotzer3 with the account hitting the limit and using the right date:

```
$ scontrol show assoc_mgr
$ sshare -pA sglotzer3 --format=Account,User,GrpTRESMins,GrpTRESRaw
$ sacctmgr show reservation Start=2022-01-01 Format=Id,Name,Start,End,UnusedWall,Associations
```

Also, could you attach now, and also when the issue happens again, the output of this command:

```
$ sacctmgr show -p association tree format=id,account,GrpTRESMins,parentid,parentname
```

Thanks,
Albert

(In reply to Albert Gil from comment #12)

Hi, Albert,

> Sure.
> My point is that *maybe* the issue is not exactly that a limit is being
> ignored (I'm not discarding that either), but maybe the actual problem is
> that sreport is reporting more usage than the actual usage tracked by
> slurmctld, for some reason.
> Right now this is my bet, but I'm not certain.
> Note that the fact that sreport may be reporting higher values may also be a
> valid bug; I'm not discarding that either.

Perfect. I just wanted to make sure I was doing a clear job of explaining our concern and that I didn't inadvertently miss something.

> > Historically, we have seen
> > `AssocGrpBillingMinutes` for jobs on accounts that would exceed their limit
> > if run. And I have shown here, to my mind, how the account had a given limit
> > in place, and towards the middle of the month got to the point of exceeding
> > it (and should have held jobs with `AssocGrpBillingMinutes` at that point)
> > but continued accepting jobs anyways.
>
> I would like to review this.
> Do you recall any specific bug number?

We don't have a prior bug for this that I know of. We have been under the impression that when a job hits `AssocGrpBillingMinutes` it means its limit would be exceeded if the given job were to run. At present, for example, we have jobs held for that very reason:

```
[root@gl-build ~]# date; squeue --state=pending|grep AssocGrpBillingMinutes | wc -l
Tue Feb 15 12:06:58 EST 2022
284
```

> The former, more manual approach is just good enough.
> The only thing is what I mentioned before: enable SlurmctldDebug=debug2 and
> attach the logs with it, along with the output of these commands, replacing
> sglotzer3 with the account hitting the limit and using the right date:
>
> $ scontrol show assoc_mgr
> $ sshare -pA sglotzer3 --format=Account,User,GrpTRESMins,GrpTRESRaw
> $ sacctmgr show reservation Start=2022-01-01
> Format=Id,Name,Start,End,UnusedWall,Associations

Noted. Thanks!

> Also, could you attach now, and also when the issue happens again, the
> output of this command:
>
> $ sacctmgr show -p association tree
> format=id,account,GrpTRESMins,parentid,parentname

Output attached

David

Created attachment 23486 [details]
sacctmgr_assoc_tree
Hi David,

> > > Historically, we have seen
> > > `AssocGrpBillingMinutes` for jobs on accounts that would exceed their limit
> > > if run. And I have shown here, to my mind, how the account had a given limit
> > > in place, and towards the middle of the month got to the point of exceeding
> > > it (and should have held jobs with `AssocGrpBillingMinutes` at that point)
> > > but continued accepting jobs anyways.
> >
> > I would like to review this.
> > Do you recall any specific bug number?
>
> We don't have a prior bug for this that I know of. We have been under the
> impression that when a job hits `AssocGrpBillingMinutes` it means its limit
> would be exceeded if the given job were to run. At present, for example, we
> have jobs held for that very reason:
>
> ```
> [root@gl-build ~]# date; squeue --state=pending|grep AssocGrpBillingMinutes | wc -l
> Tue Feb 15 12:06:58 EST 2022
> 284
> ```

Yes, if the usage of a user or an account reaches its AssocGrpBillingMinutes limit but they are still able to run a new job, that would mean that the limit is not enforced/honored, and this shouldn't happen.

My point was trying to clarify that *maybe* the way that we/you compute the actual usage that the limit is compared against is not exactly correct/equivalent to the way that slurmctld does it, so the usage that slurmctld is tracking may be lower than the one that your scripts are computing. In that case, the actual problem wouldn't be the limit not being enforced, but how the usage value is computed (by your script and by slurmctld).

> > Also, could you attach now, and also when the issue happens again, the
> > output of this command:
> >
> > $ sacctmgr show -p association tree
> > format=id,account,GrpTRESMins,parentid,parentname
>
> Output attached

Ok, I've verified that neither sglotzer3 (8687) nor its parent tree (sglotzer_root (8681) and root (1)) were in any reservation, and neither was any of its child tree.

Thanks,
Albert

Hi David,

I hope you are doing well. Have you been able to reproduce the issue and get the debug information discussed in comment 10?

Regards,
Albert

(In reply to Albert Gil from comment #16)

> Hi David,
>
> I hope you are doing well.
> Have you been able to reproduce the issue and get the debug information
> discussed in comment 10?
>
> Regards,
> Albert

Hi Albert,

I am well! I hope the same for you. We are currently writing a script that will monitor the situation and enable debug when conditions are met. Our billing stuff runs on the first of the month and the middle of the month (as we noticed some other issues noted in another bug). I hope to have something implemented in the next few days to actively look at whether or not accounts are surpassing their limits.

David

Hi David,
> We are currently writing a script that
> will monitor the situation and enable debug when conditions are met. Our
> billing stuff runs on the first of the month and the middle of the month (as
> we noticed some other issues noted in another bug). I hope to have something
> implemented in the next few days to actively look at whether or not accounts
> are surpassing their limits.
I assume that you still don't have any news, right?
Regards,
Albert
(In reply to Albert Gil from comment #18)

Hi Albert!

> I assume that you still don't have any news, right?

Thanks for checking. That's correct. The script is in place and running nightly to check whether accounts have gone past their limit; it will enable debug2 if that is detected.

Regards,
David

Albert,
While I do not *yet* have any data showing another overage/violation, I do have an example I had mentioned previously around "expected functionality". I.e. when an account is approaching its limit, jobs that would presumably cause its limit to be exceeded are held with `AssocGrpBillingMinutes` so that the limit isn't violated.
For example, I noticed the following line in my script's output today:
```
zhanc1 2349.46 2500.0
```
The first column is the account, the second column is the "usage as dollars" and the third is their "spending limit". When I saw this I thought "They are close to their limit. I should look at `squeue` and see if anything is held" and lo and behold, there were holds as I'd hoped:
```
[root@gl-build ~]# date; squeue -A zhanc1
Wed Mar 9 14:35:46 EST 2022
JOBID PARTITION NAME USER ACCOUNT ST TIME NODES NODELIST(REASON)
33040275 standard allcom yuchenwu zhanc1 PD 0:00 2 (AssocGrpBillingMinutes)
33040273 standard allcom yuchenwu zhanc1 PD 0:00 2 (AssocGrpBillingMinutes)
33040271 standard allcom yuchenwu zhanc1 R 1-02:28:19 2 gl[3091-3092]
33040270 standard allcom yuchenwu zhanc1 R 1-10:41:41 2 gl[3050,3102]
```
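For context, the nightly check that produced that `zhanc1` line boils down to comparing those two columns; a rough sketch of the idea follows (the input file, its format, and the debug2 trigger here are illustrative, not our exact production script):

```
#!/bin/bash
# Sketch of a nightly usage-vs-limit check (illustrative only).
# Input: one "<account> <usage_dollars> <limit_dollars>" line per account.
while read -r account usage limit; do
    # bc handles the floating-point comparison; it prints 1 when true
    if (( $(echo "$usage >= $limit" | bc -l) )); then
        echo "$(date): $account at/over limit ($usage >= $limit)"
        # bump controller logging so debug2 traces are captured for SchedMD
        scontrol setdebug debug2
    fi
done < /root/account_usage_report.txt
```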
Would you like any additional data?
David
Hi David,

> While I do not *yet* have any data showing another overage/violation,

I guess that this is "good"... ;-)

> I do
> have an example I had mentioned previously around "expected functionality".
> I.e. when an account is approaching its limit, jobs that would presumably
> cause its limit to be exceeded are held with `AssocGrpBillingMinutes` so
> that the limit isn't violated.
>
> For example, I noticed the following line in my script's output today:
>
> ```
> zhanc1 2349.46 2500.0
> ```
>
> The first column is the account, the second column is the "usage as dollars"
> and the third is their "spending limit". When I saw this I thought "They are
> close to their limit. I should look at `squeue` and see if anything is held"
> and lo and behold, there were holds as I'd hoped:
>
> ```
> [root@gl-build ~]# date; squeue -A zhanc1
> Wed Mar 9 14:35:46 EST 2022
> JOBID PARTITION NAME USER ACCOUNT ST TIME NODES NODELIST(REASON)
> 33040275 standard allcom yuchenwu zhanc1 PD 0:00 2 (AssocGrpBillingMinutes)
> 33040273 standard allcom yuchenwu zhanc1 PD 0:00 2 (AssocGrpBillingMinutes)
> 33040271 standard allcom yuchenwu zhanc1 R 1-02:28:19 2 gl[3091-3092]
> 33040270 standard allcom yuchenwu zhanc1 R 1-10:41:41 2 gl[3050,3102]
> ```

If I understand correctly, this is the expected behavior also from your side, right?

> Would you like any additional data?

If the behavior is what you expected, we don't really need anything else. But anyway, if you enable SlurmctldDebug=debug2 and attach the slurmctld logs we should be able to see more details why those 33040273 and 33040275 are hitting that limit.

Regards,
Albert

Hi David!
> > While I do not *yet* have any data showing another overage/violation,
>
> I guess that this is "good"... ;-)
>
> > I do
> > have an example I had mentioned previously around "expected functionality".
> > I.e. when an account is approaching its limit, jobs that would presumably
> > cause its limit to be exceeded are held with `AssocGrpBillingMinutes` so
> > that the limit isn't violated.
>
> If I understand correctly, this is the expected behavior also from your
> side, right?
>
> > Would you like any additional data?
>
> If the behavior is what you expected, we don't really need anything else.
> But anyway, if you enable SlurmctldDebug=debug2 and attach the slurmctld
> logs we should be able to see more details why those 33040273 and 33040275
> are hitting that limit.
I'm just following up on this, any news?
Regards,
Albert
Hi Albert,
There is one thing to report. The command I was using to get usages in my script no longer seems to be working:
```
date; sreport -nP -T billing cluster AccountUtilizationByUser Start=2022-03-01T00:00:00 End=now -M greatlakes format=account,login,used | awk -F "|" '{ if ($2=="") print $1","$3 }'
Wed Mar 16 14:36:48 EDT 2022
sreport: error: Getting response to message type: DBD_GET_ASSOCS
sreport: error: DBD_GET_ASSOCS failure: No error
slurmdb_report_cluster_account_by_user: Problem with get query.
```
I tried the aforementioned command with `End=now` after `End=2022-04-01T00:00:00` started exhibiting problems.
This command worked in the past iterations when the script ran, but at some point it started producing this. Any thoughts?
David
Hi David,
> The command I was using to get usages in my
> script no longer seems to be working:
>
> ```
> date; sreport -nP -T billing cluster AccountUtilizationByUser
> Start=2022-03-01T00:00:00 End=now -M greatlakes format=account,login,used |
> awk -F "|" '{ if ($2=="") print $1","$3 }'
> Wed Mar 16 14:36:48 EDT 2022
> sreport: error: Getting response to message type: DBD_GET_ASSOCS
> sreport: error: DBD_GET_ASSOCS failure: No error
> slurmdb_report_cluster_account_by_user: Problem with get query.
> ```
>
> I tried the aforementioned command with `End=now` after
> `End=2022-04-01T00:00:00` started exhibiting problems.
>
> This command worked in the past iterations when the script ran, but at some
> point it started producing this. Any thoughts?
It seems that sreport is having problems communicating with slurmdbd (or slurmdbd with the SQL DB).
Is it happening only with sreport? Not with sacct or sacctmgr?
Could you attach the slurmdbd logs?
Regards,
Albert
(In reply to Albert Gil from comment #25)

Hi Albert!

> It seems that sreport is having problems communicating with slurmdbd (or
> slurmdbd with the SQL DB).
> Is it happening only with sreport? Not with sacct or sacctmgr?
>
> Could you attach the slurmdbd logs?

Logs are attached. I observed the issue again today with sreport, but NOT with sacct or sacctmgr with the same date-based parameters I used in sreport (i.e. start 2022-03-01T00:00:00, end 2022-04-01T00:00:00 -- from the start of the month to the start of the next month, into the future).

David

Created attachment 23987 [details]
slurmdbd log
Hi David,

> > It seems that sreport is having problems communicating with slurmdbd (or
> > slurmdbd with the SQL DB).
> > Is it happening only with sreport? Not with sacct or sacctmgr?
> >
> > Could you attach the slurmdbd logs?
>
> Logs are attached. I observed the issue again today with sreport, but NOT
> with sacct or sacctmgr with the same date-based parameters I used in sreport
> (i.e. start 2022-03-01T00:00:00, end 2022-04-01T00:00:00 -- from the start
> of the month to the start of the next month, into the future).

I've looked at your logs and, although I'm not certain why this is happening, I'm willing to investigate it further and I have some comments about them. But I think it will be best to fork the investigation that we started in comment 24 into another ticket, just to keep this one focused on the original GrpTRESMins issue.

Would you mind creating a new ticket about the error with sreport, referencing this one?

Thanks,
Albert

(In reply to Albert Gil from comment #28)

Hi Albert,

> Would you mind creating a new ticket about the error with sreport,
> referencing this one?

bug 13693 has been created

David

Albert,

We do have another example of an account going over its limit. Unfortunately, it's for February 2022 data, so we weren't able to catch it and enable better debugging at the time.

David

Hi David,
> We do have another example of an account going over its limit.
> Unfortunately, it's for February 2022 data, so we weren't able to catch it
> and enable better debugging at the time.
Thanks for letting me know.
I assume that you have the debug enabled and, if a new one happens, we'll get the debug traces, right?
Let's see!
Albert
(In reply to Albert Gil from comment #32)

Albert,

> I assume that you have the debug enabled and, if a new one happens, we'll
> get the debug traces, right?

This instance was one that occurred in the past. The script we have monitoring things will *enable* debug upon detection. With the amount of jobs our cluster is running, leaving any form of debug enabled by default slows things down, in our experience.

David

Hi David,

If this is ok for you, I'm closing this ticket for now as cannotreproduce. But, as discussed, please don't hesitate to reopen it with more debug information if you can.

Regards,
Albert

Hello,
This has happened again and we can replicate it. The account in question has a given GrpTRESMins=billing limit, and it's clearly able to violate it - i.e. the job is NOT held for violating the limit.
Before testing a small job, I enabled debug2 so the output mentioned below should have that level of detail for you.
I've attached the following: slurm.conf, the transactions log for the account in question, the sacct output for the month in which the limit was violated, the output of scontrol assoc_mgr, and the output of sshare for the account, along with the slurmctld.log and slurmschedule logs.
Here is the current GrpTRESMins limit for the account:
```
[root@armis2-build ~]# date; sacctmgr show assoc account=robertwo0 -p|head -n2
Wed Nov 30 13:25:06 EST 2022
Cluster|Account|User|Partition|Share|Priority|GrpJobs|GrpTRES|GrpSubmit|GrpWall|GrpTRESMins|MaxJobs|MaxTRES|MaxTRESPerNode|MaxSubmit|MaxWall|MaxTRESMins|QOS|Def QOS|GrpTRESRunMins|
armis2|robertwo0|||1||||||billing=27651391061|||||||interactive,normal|||
```
And here is the all-time usage for the account:
```
# allTimeUsage
[root@armis2-build ~]# date; sreport -nP -T billing cluster AccountUtilizationByUser Account=robertwo0 Start=2020-01-06T00:00:00 End=now -M $CLUSTER_NAME format=account,login,used | awk -F "|" '{ if ($2=="") print $3 }'
Wed Nov 30 13:27:07 EST 2022
67923028352
```
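For comparison, the slurmctld-side view of the same account's limit and tracked usage (the sshare output attached below) can be pulled with a command along these lines:

```
# slurmctld's cached limit (GrpTRESMins) and the usage it enforces against (GrpTRESRaw)
sshare -pA robertwo0 -M armis2 --format=Account,User,GrpTRESMins,GrpTRESRaw
```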
David
Created attachment 27964 [details]
robertwo0 sshare
Created attachment 27965 [details]
scontrol assoc_mgr output
Created attachment 27966 [details]
robertwo0 month in which jobs violation occurred
Created attachment 27967 [details]
robertwo0 transactions log from slurm
Created attachment 27968 [details]
armis2ctld slurmsched log
Created attachment 27969 [details]
armis2 slurmctld log
Created attachment 27970 [details]
armis2 slurm.conf
Hi David,

> This has happened again and we can replicate it. The account in question has
> a given GrpTRESMins=billing limit, and it's clearly able to violate it - i.e.
> the job is NOT held for violating the limit.
>
> Before testing a small job, I enabled debug2 so the output mentioned below
> should have that level of detail for you.
>
> I've attached the following: slurm.conf, the transactions log for the
> account in question, the sacct output for the month in which the limit was
> violated, the output of scontrol assoc_mgr, and the output of sshare for the
> account, along with the slurmctld.log and slurmschedule logs.

Thanks for the detailed information. It has been very helpful for investigating the issue.

> Here is the current GrpTRESMins limit for the account:
>
> ```
> [root@armis2-build ~]# date; sacctmgr show assoc account=robertwo0 -p|head -n2
> Wed Nov 30 13:25:06 EST 2022
> Cluster|Account|User|Partition|Share|Priority|GrpJobs|GrpTRES|GrpSubmit|GrpWall|GrpTRESMins|MaxJobs|MaxTRES|MaxTRESPerNode|MaxSubmit|MaxWall|MaxTRESMins|QOS|Def QOS|GrpTRESRunMins|
> armis2|robertwo0|||1||||||billing=27651391061|||||||interactive,normal|||
> ```

Yes, the same limit billing=27651391061 is reported also by slurmctld (in sshare and show assoc_mgr).

> And here is the all-time usage for the account:
>
> ```
> [root@armis2-build ~]# date; sreport -nP -T billing cluster AccountUtilizationByUser Account=robertwo0 Start=2020-01-06T00:00:00 End=now -M $CLUSTER_NAME format=account,login,used | awk -F "|" '{ if ($2=="") print $3 }'
> Wed Nov 30 13:27:07 EST 2022
> 67923028352
> ```

IIUC, you are using as "usage" for the limit that 67923028352 value from sreport, right?

In comment 15 I mentioned this:

> My point was trying to clarify that *maybe* the way that we/you compute the actual usage that
> the limit is compared against is not exactly correct/equivalent to the way that slurmctld
> does it, so the usage that slurmctld is tracking may be lower than the one that your scripts
> are computing. In that case, the actual problem wouldn't be the limit not being enforced, but
> how the usage value is computed (by your script and by slurmctld).

We recently discussed those differences in the meaning/computing of "usage" further in bug 15375 comment 3.

After inspecting your logs I see that slurmctld has the limit correctly set, as you expected, but the usage is totally different. In particular, as you can see in sshare (GrpTRESRaw), the internal usage for the robertwo0 account is only 176. This also appears in the "scontrol show assoc_mgr" output: billing=27651391061(176). Actually, all that usage belongs to a single user in the account: drhey. And from the transactions it seems that it's a new user, so I guess that this is why you've noticed this now.

Therefore, the key is to understand why slurmctld thinks that the usage of that account is only 176 (it was actually 0 until drhey was added) while you/sreport think it's 67923028352. The typical reason is reservations. Does that make sense to you?

Regards,
Albert

(In reply to Albert Gil from comment #44)

Hi Albert!

> IIUC, you are using as "usage" for the limit that 67923028352 value from
> sreport, right?

Yes, in part. We use the usage number reported from sreport to help us set the limit in GrpTRESMins=billing. We take that number and add the limit that the user wants to maintain on the account.
The reason I brought it up here is that the sreport usage number is greater than the GrpTRESMins=billing limit, which shouldn't happen based on how we're reading the GrpTRESMins documentation in the man pages.

> In comment 15 I mentioned this:
>
> > My point was trying to clarify that *maybe* the way that we/you compute the actual usage that
> > the limit is compared against is not exactly correct/equivalent to the way that slurmctld
> > does it, so the usage that slurmctld is tracking may be lower than the one that your scripts
> > are computing. In that case, the actual problem wouldn't be the limit not being enforced, but
> > how the usage value is computed (by your script and by slurmctld).

Interesting. Well, for our part, our scripts aren't doing much calculation; at least nothing sophisticated. We get the value from sreport (using the same command I've used here) and then add the upper-bound limit to it that the user would like to maintain.

An easy example of this process would be: a user wants to maintain a limit of 10,000,000 per month. In the first month they use 2000. At the start of the next month, in order to maintain that limit of 10,000,000 AND account for the usage, we would add 2000 to the 10,000,000 limit, making it 10,002,000.

> We recently discussed those differences in the meaning/computing of
> "usage" further in bug 15375 comment 3.
> After inspecting your logs I see that slurmctld has the limit correctly set,
> as you expected, but the usage is totally different.
> In particular, as you can see in sshare (GrpTRESRaw), the internal usage for
> the robertwo0 account is only 176.
> This also appears in the "scontrol show assoc_mgr" output: billing=27651391061(176).
> Actually, all that usage belongs to a single user in the account: drhey.
> And from the transactions it seems that it's a new user, so I guess that
> this is why you've noticed this now.
>
> Therefore, the key is to understand why slurmctld thinks that the usage of
> that account is only 176 (it was actually 0 until drhey was added) while
> you/sreport think it's 67923028352.
> The typical reason is reservations.
> Does that make sense to you?

This might be a bit of a red herring for our discussion. I added myself to the account to see if I could replicate the issue, and I could. That's why I put the output here. The issue of Slurm being able to go over the GrpTRESMins=billing limit was brought to our attention *before* I added my username to the account. I could try with another previously established user, if that might help?

How often does slurmctld get usage information for itself to keep things in sync and maintain those limits (if it does so at all)?

David

Hi David,

> Yes, in part. We use the usage number reported from sreport to help us set
> the limit in GrpTRESMins=billing. We take that number and add the limit that
> the user wants to maintain on the account.
>
> The reason I brought it up here is that the sreport usage number is greater
> than the GrpTRESMins=billing limit, which shouldn't happen based on how we're
> reading the GrpTRESMins documentation in the man pages.

I understand, but actually it could happen in several cases. The reason is that, using "the 4 definitions of usage" described in bug 15375 comment 3, you think that the usage of the account is value D), but slurmctld is using C).

And, for some reason, in your case those values are really, really different. In your case D is 67923028352, but C is almost 0.
That is, slurmctld has almost no usage charged to that account, while you think that the usage is really big.

Therefore, the question is not about a limit not being respected, but why C and D are so different in your case. Do you agree?

> Interesting. Well, for our part, our scripts aren't doing much calculation;
> at least nothing sophisticated. We get the value from sreport (using the
> same command I've used here) and then add the upper-bound limit to it that
> the user would like to maintain.

So, the key is why sreport is reporting such a big usage, right? The most typical cause is reservations. Could it be your case too?

> An easy example of this process would be: a user wants to maintain a limit
> of 10,000,000 per month. In the first month they use 2000. At the start of
> the next month, in order to maintain that limit of 10,000,000 AND account
> for the usage, we would add 2000 to the 10,000,000 limit, making it 10,002,000.

I'm wondering if we could just ignore for a moment the differences between the sreport and slurmctld usage, and maybe you can just use the actual slurmctld value? You can gather the current limit and usage with both "sshare" (GrpTRESMins is the limit and GrpTRESRaw is the usage), or with "scontrol show assoc_mgr" (you'll see lines with the format "GrpTRESMins=...,billing=limit(usage)").

The main difference is that, unlike with sreport, you can only query the *current* usage. You cannot query the usage that the account did in a past period of time. But, in your case, I think that maybe that's not that bad?

> This might be a bit of a red herring for our discussion. I added myself to
> the account to see if I could replicate the issue, and I could. That's why I
> put the output here. The issue of Slurm being able to go over the
> GrpTRESMins=billing limit was brought to our attention *before* I added my
> username to the account. I could try with another previously established
> user, if that might help?
>
> How often does slurmctld get usage information for itself to keep things in
> sync and maintain those limits (if it does so at all)?

I'll try to clarify using again the definitions of usage that we discussed in bug 15375 comment 3:

- slurmctld is computing its usage, the C definition, all the time. This is part of the scheduling logic.
- slurmdbd computes its usage, the D definition, every hour. This is the rollup process.
- The usage in slurmctld (the C definition) is the only one used to enforce limits.
- The usage shown in sreport (the D definition) is the only one shown by sreport, and it's not used to enforce limits.

Does it make sense to you too?

Regards,
Albert

(In reply to Albert Gil from comment #46)

Hi Albert,

> I understand, but actually it could happen in several cases.
> The reason is that, using "the 4 definitions of usage" described in bug
> 15375 comment 3, you think that the usage of the account is value D),
> but slurmctld is using C).
>
> And, for some reason, in your case those values are really, really different.
> In your case D is 67923028352, but C is almost 0.
> That is, slurmctld has almost no usage charged to that account, while you
> think that the usage is really big.
>
> Therefore, the question is not about a limit not being respected, but why C
> and D are so different in your case. Do you agree?

Yes. After re-reading this ticket, and bug 15375, where you outlined scenarios A-D, I think we are getting it. Some colleagues and I just wrapped up a review of this ticket, and it's starting to come together.
> So, the key is why sreport is reporting such a big usage, right?
> The most typical cause is reservations.
> Could it be your case too?

Yes, this is part of our inquiry. However, we know right now that the accounts that we viewed as being impacted by this were not a part of any reservation.

> I'm wondering if we could just ignore for a moment the differences between
> the sreport and slurmctld usage, and maybe you can just use the actual
> slurmctld value?
> You can gather the current limit and usage with both "sshare" (GrpTRESMins
> is the limit and GrpTRESRaw is the usage), or with "scontrol show assoc_mgr"
> (you'll see lines with the format "GrpTRESMins=...,billing=limit(usage)").
>
> The main difference is that, unlike with sreport, you can only query
> the *current* usage.
> You cannot query the usage that the account did in a past period of time.
> But, in your case, I think that maybe that's not that bad?

Yes, we can ignore it. Gladly, actually :) We are open to using sshare to query the usage information (and to then set GrpTRESMins=billing). However, we do have some questions:

You stated "The main difference is that, unlike with sreport, you can only query the *current* usage." What does "current usage" mean here? When we set our limits we do so, most frequently, on a monthly and yearly basis. How long does the sshare data stay around so we can reliably query it?

> I'll try to clarify using again the definitions of usage that we discussed
> in bug 15375 comment 3:
>
> - slurmctld is computing its usage, the C definition, all the time. This is
> part of the scheduling logic.
> - slurmdbd computes its usage, the D definition, every hour. This is the
> rollup process.
> - The usage in slurmctld (the C definition) is the only one used to enforce
> limits.
> - The usage shown in sreport (the D definition) is the only one shown by
> sreport, and it's not used to enforce limits.
>
> Does it make sense to you too?

I've gone back and referenced this A-D scenario layout quite a bit and I think it's finally hitting home; I HOPE so, at any rate!

Since one sets a limit using sacctmgr, how is that information communicated to slurmctld, if at all? I know I had asked how slurmctld and slurmdbd stay in sync, but it seems (based on many reads of this ticket and bug 15375) that they are independent of one another. I suppose the question I am asking is: when I set a limit using sacctmgr, how can I know that slurmctld will also have that same limit, as there is no other way to set GrpTRESMins=billing limits on an account? Now that we know where to get better information (sshare) to inform our limits, we still want to make sure that the entity doing the enforcing of said limits (slurmctld) has the correct value the user asked for. I had wrongly assumed that, because I set it with sacctmgr, it lived solely in the DB and that slurmctld queried it for that information.

In the course of discussing this ticket, we discovered that the one cluster (on which the robertwo0 account resides) DID have PriorityDecayHalfLife set accidentally! Is there any way to have the slurmctld data updated/synced to reflect the non-decayed data from slurmdbd?

David

Hi David,

> Yes. After re-reading this ticket, and bug 15375, where you outlined
> scenarios A-D, I think we are getting it. Some colleagues and I just wrapped
> up a review of this ticket, and it's starting to come together.

I'm glad and sorry to hear that. I'm glad this is starting to make sense for you; I'm sorry if I'm not explaining it well enough.
> Yes, this is part of our inquiry. However, we know right now that the
> accounts that we viewed as being impacted by this were not a part of any
> reservation.

Ok.

> In the course of discussing this ticket, we discovered that the one cluster
> (on which the robertwo0 account resides) DID have PriorityDecayHalfLife set
> accidentally!

This is a good clue!

BUT, actually, the usage in slurmctld (C) is totally independent for each cluster. Actually, it's a usage tied to "an association", which is a combination of cluster-account-user. That is, the usage of a user UsrA in the account AcctX is independent of the value for the same UsrA in another account AcctY. The slurmctld internally has two different "associations", and tracks them totally independently.

So, my guess is that at some moment PriorityDecayHalfLife was also set in the cluster you are looking at, or something similar.

> Is there any way to have the slurmctld data updated/synced to
> reflect the non-decayed data from slurmdbd?

Unfortunately that's not possible. The only way of kind of "setting" a usage is with "sacctmgr update account/user xxx set RawUsage=0". But note that this only allows resetting the usage to 0, not to any other value. More info:

- https://slurm.schedmd.com/sacctmgr.html#OPT_RawUsage (for accounts)
- https://slurm.schedmd.com/sacctmgr.html#OPT_RawUsage_2 (for users)
- https://slurm.schedmd.com/sacctmgr.html#OPT_RawUsage_1 (even for QoSes)

> Yes, we can ignore it. Gladly, actually :) We are open to using sshare to
> query the usage information (and to then set GrpTRESMins=billing).

Great!

> However, we do have some questions:
>
> You stated "The main difference is that, unlike with sreport, you can only
> query the *current* usage." What does "current usage" mean here? When we set
> our limits we do so, most frequently, on a monthly and yearly basis.
> How long does the sshare data stay around so we can reliably query it?

It's not a matter of a deadline on the data being around, but about the type of queries that you can do in terms of "a time window". That is, with sreport (or sacct) you can specify the Start / End time to gather the usage (D) that users or accounts did. That's not the type of query that you do with sshare (or scontrol show assoc_mgr). With sshare you get the usage (C) that slurmctld has currently registered. You cannot query what usage (C) slurmctld had yesterday, or even a minute ago, nor within a custom time window. You can only query the current value.

Does it make sense?

> I've gone back and referenced this A-D scenario layout quite a bit
> and I think it's finally hitting home; I HOPE so, at any rate!

Good! :-))

> Since one sets a limit using sacctmgr, how is that information communicated
> to slurmctld, if at all? I know I had asked how slurmctld and slurmdbd stay
> in sync, but it seems (based on many reads of this ticket and bug 15375)
> that they are independent of one another.

I think that here you are probably mixing USAGE and LIMITS. The Limits are exactly the same; they don't have the A-D differences that we have for Usage. Actually, the Limits are always compared against the C Usage, not against the D Usage like you were doing manually.

In more depth: every time that you create/modify/delete an account, user, or QOS, or any limits on them, you do it with sacctmgr, and that info is saved in the DB and sent to slurmctld. But note that you always set Limits, not Usage (except for the RawUsage=0 mentioned above).
> I suppose the question I am asking is: when I set a limit using sacctmgr,
> how can I know that slurmctld will also have that same limit, as there is no
> other way to set GrpTRESMins=billing limits on an account? Now that we know
> where to get better information (sshare) to inform our limits, we still want
> to make sure that the entity doing the enforcing of said limits (slurmctld)
> has the correct value the user asked for.

Actually, sshare can show both:

- GrpTRESMins: The limit set (in sacctmgr and synced to slurmctld)
- GrpTRESRaw: The (current) usage tracked by slurmctld (C).

Note that sshare only gathers information from slurmctld, the one being used to enforce limits.

> I had wrongly assumed that, because I set it with sacctmgr, it lived solely
> in the DB and that slurmctld queried it for that information.

All values that you set in sacctmgr are sent to slurmctld. Actually, slurmctld has its own cached information with all that it needs to operate without slurmdbd running: the values/config of accounts, users, QOSes... That is, slurmctld is NOT querying the values from sacctmgr every time that it needs to get a value (that would be a performance killer), but gathers the last values every time it's started, and when values are modified with sacctmgr. Both daemons have a system to stay in sync in both directions. For example, slurmctld is the one that knows about jobs, so it sends job info to slurmdbd to be able to query them with sacct. It purges a job once the job finishes and the info is saved to the DB.

Note that "scontrol show assoc_mgr" is actually the way to see the whole cached data that slurmctld has from the DB. In particular for your case, you'll see that for each account/user it will show a line like:

```
GrpTRESMins=....,billing=lll(uuu)
```

Where "lll" is the limit set/cached, and "uuu" is the usage (C).

Hope this makes sense, that it's not too much information, and that I'm not confusing you even more! ;-)

Albert

Good day, Albert!

> I'm glad and sorry to hear that.
> I'm glad this is starting to make sense for you; I'm sorry if I'm not
> explaining it well enough.

I think it was more that I needed to let go of some of my preconceived notions and understandings more than anything else; you've done an excellent job!

> This is a good clue!
>
> BUT, actually, the usage in slurmctld (C) is totally independent for each
> cluster. Actually, it's a usage tied to "an association", which is a
> combination of cluster-account-user. That is, the usage of a user UsrA in
> the account AcctX is independent of the value for the same UsrA in another
> account AcctY. The slurmctld internally has two different "associations",
> and tracks them totally independently.
>
> So, my guess is that at some moment PriorityDecayHalfLife was also set in
> the cluster you are looking at, or something similar.

Thanks for clarifying here. I probably did a disservice by mixing data/examples from our clusters, as they seemed to give evidence to what we saw as the "limit violation" scenario.

> Unfortunately that's not possible. The only way of kind of "setting" a usage
> is with "sacctmgr update account/user xxx set RawUsage=0". But note that
> this only allows resetting the usage to 0, not to any other value.
> More info:
>
> - https://slurm.schedmd.com/sacctmgr.html#OPT_RawUsage (for accounts)
> - https://slurm.schedmd.com/sacctmgr.html#OPT_RawUsage_2 (for users)
> - https://slurm.schedmd.com/sacctmgr.html#OPT_RawUsage_1 (even for QoSes)

Thanks.
We had a hunch that might be the case, but it never hurts to ask :)

> It's not a matter of a deadline on the data being around, but about the type
> of queries that you can do in terms of "a time window". That is, with
> sreport (or sacct) you can specify the Start / End time to gather the usage
> (D) that users or accounts did. That's not the type of query that you do
> with sshare (or scontrol show assoc_mgr). With sshare you get the usage (C)
> that slurmctld has currently registered. You cannot query what usage (C)
> slurmctld had yesterday, or even a minute ago, nor within a custom time
> window. You can only query the current value.
>
> Does it make sense?

Yes, I believe so. I knew we didn't have the time windows with sshare; I was more or less thinking of the mechanics and how it meets our design case. In our monthly limit-setting example, then, the value that slurmctld shows for usage at the time the script runs is the value that we get at that moment in time.

> All values that you set in sacctmgr are sent to slurmctld. Actually,
> slurmctld has its own cached information with all that it needs to operate
> without slurmdbd running: the values/config of accounts, users, QOSes...
> That is, slurmctld is NOT querying the values from sacctmgr every time that
> it needs to get a value (that would be a performance killer), but gathers
> the last values every time it's started, and when values are modified with
> sacctmgr. Both daemons have a system to stay in sync in both directions.
> For example, slurmctld is the one that knows about jobs, so it sends job
> info to slurmdbd to be able to query them with sacct. It purges a job once
> the job finishes and the info is saved to the DB.
>
> Note that "scontrol show assoc_mgr" is actually the way to see the whole
> cached data that slurmctld has from the DB. In particular for your case,
> you'll see that for each account/user it will show a line like:
>
> GrpTRESMins=....,billing=lll(uuu)
>
> Where "lll" is the limit set/cached, and "uuu" is the usage (C).

Perfect. So slurmctld is caching; that's good to have confirmed. When you say it "gathers the last values every time it's started", does this mean when the slurmctld daemon is started/restarted?

> Hope this makes sense, that it's not too much information, and that I'm not
> confusing you even more! ;-)

I think it's finally traveled through my thick skull to my brain! XD

Thanks, again!
David

Hi David!

> I think it was more that I needed to let go of some of my preconceived
> notions and understandings more than anything else; you've done an excellent
> job!

Great! :-)

> Thanks for clarifying here. I probably did a disservice by mixing
> data/examples from our clusters, as they seemed to give evidence to what we
> saw as the "limit violation" scenario.

You are not the first one trying to compare sreport values with limits or with sacct queries... I totally understand your thinking!

> Yes, I believe so. I knew we didn't have the time windows with sshare; I was
> more or less thinking of the mechanics and how it meets our design case.
> In our monthly limit-setting example, then, the value that slurmctld shows
> for usage at the time the script runs is the value that we get at that
> moment in time.

Exactly. One idea could be to query that value on a monthly (or weekly, or daily) basis and save it somewhere accessible from your scripts, and then compare it with the last value(s) you saved, maybe with a timestamp, to get the usage over some time window.
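For illustration only, such a snapshot job could be as simple as this (the output path and schedule are hypothetical):

```
# Hypothetical cron job: snapshot slurmctld's current per-association usage (C)
# with a timestamp; usage over a window is then the difference of two snapshots.
sshare -anP --format=Account,User,GrpTRESRaw \
    | awk -v ts="$(date +%s)" -F'|' '{print ts "|" $0}' \
    >> /var/spool/slurm/usage_snapshots.psv
```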
So, you could add that time-window feature outside Slurm.

> Perfect. So slurmctld is caching; that's good to have confirmed.

Yes!

> When you say it "gathers the last values every time it's started", does this
> mean when the slurmctld daemon is started/restarted?

Yes, and also when slurmdbd is restarted; actually, every time their connection is broken and re-established. They are all the time contacting and checking each other, caching their info, and if they cannot share/send an update to the other, they also cache those update messages until they can catch up again and sync up. There is quite an amount of logic in the code to ensure that.

> > Hope this makes sense, that it's not too much information, and that I'm
> > not confusing you even more! ;-)
>
> I think it's finally traveled through my thick skull to my brain! XD

I guess that sometimes Slurm may crush some skulls.. ;-))

Regards,
Albert

Hi David,

Do you think that we can close this ticket as infogiven? With the new information/understanding I see that you may need further support to adapt your policies, but maybe it's better if we open a fresh ticket once you need it. What do you think?

Regards,
Albert

Good day, Albert!
> Do you think that we can close this ticket as infogiven?
> With the new information/understanding I see that you may need further
> support to adapt your policies, but maybe it's better if we open a fresh
> ticket once you need it.
> What do you think?
Yes. That sounds great! I'll mark it as such now since I'm already editing. Thanks for your perseverance in this!
David
Created attachment 23410 [details]
slurm.conf

Hello,

We have a case where we've set a `GrpTRESMins=billing` limit on an account that was then surpassed/exceeded.

Here is the usage for the account from the opening of the cluster up to 2022-01:

```
[root@gl-build ~]# date; sreport -nP -T billing cluster AccountUtilizationByUser Accounts=sglotzer3 Start=2020-01-06T12:00:00 End=2022-01-01T00:00:00 format=account,login,used
Thu Feb 10 11:18:25 EST 2022
sglotzer3||4179214975
sglotzer3|alacour|1917833791
sglotzer3|alaink|228123762
sglotzer3|dfijan|1326464
sglotzer3|schwendp|1856654607
sglotzer3|thiv|21032042
sglotzer3|ttdwyer|358834
sglotzer3|wzygmunt|74487393
sglotzer3|yuanzhou|79017473
sglotzer3|zhoupj|380608
```

On 2022-01-01 we set the `GrpTRESMins=billing` limit by taking all prior usage above (4179214975) and adding 1500000000 to it, giving a new limit of 5679214975 (meaning they now had a 1500000000 buffer to run up to). The `sacctmgr` transactions show that event happening (all transactions for this account are attached):

```
2022-01-01T04:01:16|Modify Associations|root|(id_assoc=8687)|grp_tres_mins = if (id_assoc=8687, '5=5679214975', grp_tres_mins)
```

Using `sreport` to look at the month of usage for 2022-01, we see the clear violation (1574520951 > 1500000000):

```
[root@gl-build ~]# date; sreport -nP -T billing cluster AccountUtilizationByUser Accounts=sglotzer3 Start=2022-01-01T00:00:00 End=2022-02-01T00:00:00 format=account,login,used
Thu Feb 10 11:23:44 EST 2022
sglotzer3||1574520951
sglotzer3|alacour|270089
sglotzer3|alaink|88546051
sglotzer3|schwendp|1485704812
```

Continuing to use `sreport`, we see that they hit their limit a little before noon on 2022-01-15:

```
[root@gl-build ~]# date; sreport -nP -T billing cluster AccountUtilizationByUser Accounts=sglotzer3 Start=2022-01-01T00:00:00 End=2022-01-15T12:00:00 format=account,login,used
Thu Feb 10 11:26:25 EST 2022
sglotzer3||1502070314
sglotzer3|alacour|270089
sglotzer3|alaink|72118866
sglotzer3|schwendp|1429681360
```

Yet job data (attached), and sreport above, show that they were able to continue to submit/run work well after the time when jobs should have been held with `AssocGrpBillingMinutes`.

We are working to understand how this can happen and wanted to bring it to your attention. But we also wanted your input on what we should be looking at when this happens. I.e., where should we look for context (e.g., which log files)? Should we enable a certain level of logging in our configs? In other words, how can we better capture data to give to you to help us assess this problem?

Regards,
David
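For reference, the monthly bump described above amounts to something like the following (a simplified sketch; the account name and buffer value are hardcoded for illustration):

```
#!/bin/bash
# Sketch of the monthly GrpTRESMins=billing bump (illustrative values).
ACCOUNT=sglotzer3
BUFFER=1500000000   # the account's requested monthly cap

# All-time billing usage accounted by slurmdbd/sreport; the account-level
# row has an empty login field.
USED=$(sreport -nP -T billing cluster AccountUtilizationByUser \
         Accounts=$ACCOUNT Start=2020-01-06T12:00:00 End=now \
         format=account,login,used | awk -F'|' '$2=="" {print $3}')

# New limit = prior usage + buffer.
sacctmgr -i modify account name=$ACCOUNT set GrpTRESMins=billing=$((USED + BUFFER))
```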