We use a declining balance "charge" method for Slurm using the "billing" value of GrpTRESMins. Since we use decaying usage with fairshare, we track this in a QOS. Each account has a corresponding QOS and we set a limit on the billing value for GrpTRESMins on each QOS.

From time to time we need to issue a "refund", and we expected to be able to do this by adjusting the usageraw value on the QOS, since usageraw is, as I understand it, the tracked billing x mins converted to seconds. However, when we adjust the usageraw for a QOS it has no effect on the tracked billing value in GrpTRESMins. Is this expected?

If this is expected, we'd like to request that these values be associated, so that when UsageRaw is set the tracked billing value of GrpTRESMins is also set. Or that we be given a way to directly set the tracked billing value of GrpTRESMins.

Thanks!
(In reply to Jake Rundall from comment #0)
> We use a declining balance "charge" method for Slurm using the "billing"
> value of GrpTRESMins. Since we use decaying usage with fairshare we track
> this in a QOS. Each account has a corresponding QOS and we set a limit on
> the billing value for GrpTRESMins on each QOS.
>
> From time to time we need to issue a "refund" and we expected to be able to
> do this by adjusting the usageraw value on the QOS, since usageraw is, as I
> understand it, the tracked billing x mins converted to seconds. However when
> we adjust the usageraw for a QOS it has no effect on the tracked billing
> value in GrpTRESMins. Is this expected?
>
> If this is expected, we'd like to request that these values be associated,
> so that when UsageRaw is set the tracked billing value of GrpTRESMins is
> also set. Or that we be given a way to directly set the tracked billing
> value of GrpTRESMins.
>
> Thanks!

Hello Jake!

Yes, this is expected. "usageraw" and GrpTRESMins in accounting are calculated based on completed jobs. A possible workaround would be to temporarily add the refunded value back to the GrpTRESMins limit:

sacctmgr modify qos <qos_name> set GrpTRESMins=billing=<current value + refund_value>

I know this isn't ideal. As for the change request, I'll need to look into it and let you know.
Thanks, Brian. Yep, the ability to set the billing value for GrpTRESMins on a QOS would be very helpful to us. Out of curiosity, what (if anything) would usageraw be used for on a QOS? Are there any features in Slurm that use that? Or is it simply a number for tracking?
(In reply to Jake Rundall from comment #3)
> Thanks, Brian. Yep, the ability to set the billing value for GrpTRESMins on
> a QOS would be very helpful to us.
>
> Out of curiosity, what (if anything) would usageraw be used for on a QOS?
> Are there any features in Slurm that use that? Or is it simply a number for
> tracking?

It's just one of many accounting values for tracking.
(In reply to Jake Rundall from comment #3)
> Thanks, Brian. Yep, the ability to set the billing value for GrpTRESMins on
> a QOS would be very helpful to us.
>
> Out of curiosity, what (if anything) would usageraw be used for on a QOS?
> Are there any features in Slurm that use that? Or is it simply a number for
> tracking?

Are you still interested in pursuing the feature request?
Yes, we're interested in having the ability to set (to any value) the billing value for GrpTRESMins on a QOS. Or for this value to be linked to UsageRaw, if that makes sense, such that when we set UsageRaw this other value also gets set. We can use billing from GrpTRESMins to apply a limit, unlike with UsageRaw. Thanks!
(In reply to Jake Rundall from comment #6)
> Yes, we're interested in having the ability to set (to any value) the
> billing value for GrpTRESMins on a QOS. Or for this value to be linked to
> UsageRaw, if that makes sense, such that when we set UsageRaw this other
> value also gets set. We can use billing from GrpTRESMins to apply a limit,
> unlike with UsageRaw. Thanks!

As mentioned above, you can already set the billing value for a QOS with this command:

sacctmgr modify qos <qos_name> set GrpTRESMins=billing=<value>

Perhaps I'm misunderstanding?
Yes, let's be sure we've got a shared understanding.

I believe that we want to be able to set the *tracked* value for billing under GrpTRESMins for a QOS.

We rely on the ability to set limits on the billing value from GrpTRESMins on QOSes.

My understanding is that the *tracked usage* for GrpTRESMins billing can be seen by running something like this:

[root@dt-sched ~]# scontrol show assoc flags=qos qos=bbka-delta-gpu | grep GrpTRESMins
GrpTRESMins=cpu=N(4359108),mem=N(16883292961),energy=N(0),node=N(131724),billing=300000000000(460177818),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(328106),gres/gpu:a100=N(214291),gres/gpu:a40=N(95975),gres/gpu:mi100=N(15840),gres/gpu:mi210=N(698),gres/gpu:nvidia_a100_1g.10g=N(65),gres/gpu:nvidia_a100_1g.5g=N(10),gres/gpu:nvidia_a100_2g.10g=N(2),gres/gpu:nvidia_a100_3g.20g=N(121),gres/gpu:nvidia_a100_4g.20g=N(9),gres/gpumem=N(0),gres/gpuutil=N(0)

In this case, we've set a limit of 300000000000 and have *tracked usage* of 460177818. If the *tracked usage* hits 300000000000, or a job is submitted whose requested resources and walltime would lead to hitting that limit of 300000000000, the job will not start. (We have the UsageFactorSafe and NoDecay flags set on the QOSes, and all of the required flags under AccountingStorageEnforce.)

Please correct me if I'm misunderstanding any of the above.

Assuming I am understanding everything correctly, we'd like to be able to set that tracked usage value for GrpTRESMins billing to something else, with the intent of effecting a refund. Or to link the tracked billing GrpTRESMins value to RawUsage, such that when we set RawUsage it also sets the tracked usage value for GrpTRESMins billing.

It seems those values reflect the same *tracked usage*; one is just in seconds, the other in minutes. We actually assumed they *were* linked, which is why in our original request (17549) we asked for the ability to set RawUsage on a QOS, expecting it would accomplish what we were after. But we've learned this is not the case.

Does this all make sense?
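For anyone scripting against this output, the billing limit and tracked usage can be pulled out of the GrpTRESMins line with a small filter. The following is only a minimal sketch, assuming the `billing=<limit>(<tracked>)` layout seen in the sample output in this thread (with `N` meaning no limit is set). The function name `parse_billing` and the shortened sample string are hypothetical, not part of Slurm.

```shell
#!/bin/sh
# Sketch: print "<limit> <tracked>" for the billing TRES from a
# GrpTRESMins line. Assumes the layout billing=<limit>(<tracked>) as in
# the scontrol output quoted in this thread; <limit> is "N" when unset.
# parse_billing is a hypothetical helper name, not a Slurm command.
parse_billing() {
    # Require "=" or "," before "billing=" so other keys cannot match.
    printf '%s\n' "$1" |
        sed -n 's/.*[=,]billing=\([0-9N][0-9]*\)(\([0-9][0-9]*\)).*/\1 \2/p'
}

# Shortened copy of the line from the thread, used only as sample input.
sample='GrpTRESMins=cpu=N(4359108),billing=300000000000(460177818),fs/disk=N(0)'

result=$(parse_billing "$sample")
limit=${result% *}     # text before the space
tracked=${result#* }   # text after the space

# The subtraction is only meaningful when a numeric limit is set (not "N").
echo "limit=$limit tracked=$tracked remaining=$(( $limit - $tracked ))"
```

On a live system the input would presumably come from the `scontrol show assoc flags=qos qos=<qos_name>` command shown above rather than from a hard-coded string.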
(In reply to Jake Rundall from comment #8)
> Yes, let's be sure we've got a shared understanding.
>
> I believe that we want to be able to set the *tracked* value for billing
> under GrpTRESMins for a QOS.
>
> We rely on the ability to set limits on the billing value from GrpTRESMins
> on QOSes.
>
> My understanding is that the *tracked usage* for GrpTRESMins billing can be
> seen by running something like this:
> [root@dt-sched ~]# scontrol show assoc flags=qos qos=bbka-delta-gpu | grep GrpTRESMins
> GrpTRESMins=cpu=N(4359108),mem=N(16883292961),energy=N(0),node=N(131724),billing=300000000000(460177818),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(328106),gres/gpu:a100=N(214291),gres/gpu:a40=N(95975),gres/gpu:mi100=N(15840),gres/gpu:mi210=N(698),gres/gpu:nvidia_a100_1g.10g=N(65),gres/gpu:nvidia_a100_1g.5g=N(10),gres/gpu:nvidia_a100_2g.10g=N(2),gres/gpu:nvidia_a100_3g.20g=N(121),gres/gpu:nvidia_a100_4g.20g=N(9),gres/gpumem=N(0),gres/gpuutil=N(0)
>
> In this case, we've set a limit of 300000000000 and have *tracked usage* of
> 460177818. And if the *tracked usage* hits 300000000000, or a job is
> submitted where the resources requested and walltime requested would lead to
> hitting that limit of 300000000000, the job will not start. (We have the
> UsageFactorSafe and NoDecay flags set on the QOSes, and all of the required
> flags under AccountingStorageEnforce.)
>
> Please correct me if I'm misunderstanding any of the above.
>
> Assuming I am understanding everything correctly, we'd like to be able to
> set that tracked usage value for GrpTRESMins billing to something else, with
> the intent of effecting a refund. Or to link the tracked billing GrpTRESMins
> value to RawUsage, such that when we set RawUsage it also sets the tracked
> usage value for GrpTRESMins billing.
>
> It seems those values reflect the same *tracked usage*; one is just in
> seconds, the other in minutes. We actually assumed they *were* linked, which
> is why in our original request (17549) we asked for the ability to set
> RawUsage on a QOS, expecting it would accomplish what we were after. But
> we've learned this is not the case.
>
> Does this all make sense?

Yes, thanks for the detail.
Any thoughts yet about the feasibility of implementing this? Thanks!
(In reply to Jake Rundall from comment #17)
> Any thoughts yet about the feasibility of implementing this? Thanks!

I apologize for the delay. We're still evaluating this.
A recent event has caused increased need to issue refunds (and it would be much better for us if we could do this by adjusting the mentioned value). So in addition to poking about the ask for this capability (any update?), we're also curious whether there is some kind of workaround in the meantime. If this is stored in the accounting DB, is there a way to manually adjust it? If it's stored in slurmctld's state, could we stop slurmctld and then modify something in the saved state dir?

Thanks!
(In reply to Jake Rundall from comment #19)
> A recent event has caused increased need to issue refunds (and it would be
> much better for us if we could do this by adjusting the mentioned value). So
> in addition to poking about the ask for this capability (any update?) we're
> also curious if there is some kind of workaround in the meantime. If this is
> stored in the accounting DB, is there a way to manually adjust this? If it's
> stored in slurmctld's state, could we stop slurmctld and then modify
> something in the saved state dir?
>
> Thanks!

The TRESMins are not stored in accounting, but are live values calculated regularly based on jobs as they run or complete. There isn't currently any method to modify these, but you can raise the limit for a QOS with this command:

sacctmgr modify qos <qos_name> set GrpTRESMins=billing=<current value + refund_value>

While this won't change the stats, it will effectively "give back" the refunded billing to the QOS in question.
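The arithmetic behind this workaround can be sketched as a small shell helper that adds the refund to the current limit and prints the resulting command instead of running it. This is only an illustration under stated assumptions: `refund_command` is a hypothetical helper name, the refund amount of 5000000 is made up, and the only Slurm syntax used is the `sacctmgr modify qos ... set GrpTRESMins=billing=...` form quoted in this thread.

```shell
#!/bin/sh
# Sketch: build the limit-raising command described in the workaround.
# refund_command is a hypothetical helper, not part of Slurm. It only
# prints the command so it can be reviewed before anyone runs it.
refund_command() {
    qos=$1             # QOS name
    current_limit=$2   # current GrpTRESMins billing limit (numeric)
    refund=$3          # billing minutes to give back
    new_limit=$(( $current_limit + $refund ))
    printf 'sacctmgr modify qos %s set GrpTRESMins=billing=%s\n' \
        "$qos" "$new_limit"
}

# Example: the QOS name and limit come from this thread; the refund
# amount is an invented value for illustration.
refund_command bbka-delta-gpu 300000000000 5000000
```

Printing rather than executing keeps the sketch safe to run anywhere. Note that this changes only the limit; the tracked usage shown in parentheses by scontrol stays the same.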
(In reply to Brian Gregory from comment #21)
> (In reply to Jake Rundall from comment #19)
> > A recent event has caused increased need to issue refunds (and it would be
> > much better for us if we could do this by adjusting the mentioned value). So
> > in addition to poking about the ask for this capability (any update?) we're
> > also curious if there is some kind of workaround in the meantime. If this is
> > stored in the accounting DB, is there a way to manually adjust this? If it's
> > stored in slurmctld's state, could we stop slurmctld and then modify
> > something in the saved state dir?
> >
> > Thanks!
>
> The TRESMins are not stored in accounting, but are live values calculated
> regularly based on jobs as they run or complete. There isn't currently any
> method to modify these, but you can raise the limit for a qos with this
> command:
>
> sacctmgr modify qos <qos_name> set GrpTRESMins=billing=<current value + refund_value>
>
> While this won't change the stats, it will effectively "give back" the
> refunded billing to the qos in question.

With respect to the original feature request: this change would have a significant impact on the system and would require considerable effort to implement and test. It affects fair share, decay, accounting, and all associations (not just QOS). It isn't as simple as just changing a value. For now, I would recommend using the above as a workaround.
Thanks, Brian. I do understand how increasing the GrpTRESMins billing value on the QOS would have that effect, but in our environment these numbers are set by external allocation systems. If we change the value only in Slurm, it is at risk of being reset by our allocations integrations. The procedure for doing this properly would have to begin with (as an example) the user submitting a supplemental NSF funding application (which we would certainly support, but it's a lot of overhead for a refund). We would much prefer to decrease the tracked usage. We certainly do appreciate the effort that would be involved in implementing this in Slurm. But it also seems like a feature that would be broadly beneficial, as I believe this method of tracking and limiting usage in Slurm is reasonably common. Thanks again!
Hi Jake, After discussing this with folks internally, it has been determined that this chargeback feature would be too costly at this time, and we would be reluctant to pursue it as an NRE.
Ok, thanks for the update. This can be closed.
Closing