Ticket 2242

Summary: add new MaxTRESPerAccount limit
Product: Slurm
Reporter: Tim Wickberg <tim>
Component: Accounting
Assignee: Danny Auble <da>
Status: RESOLVED FIXED
Severity: 3 - Medium Impact
Priority: ---
CC: hermes, mrg, pedmon, sthiell
Version: 16.05.x
Hardware: Linux
OS: Linux
See Also: http://bugs.schedmd.com/show_bug.cgi?id=1556
Site: FHCRC - Fred Hutchinson Cancer Research Center
Version Fixed: 16.05.0-pre1
Target Release: 16.05
DevPrio: 1 - Paid
Attachments:
  2630: 15.08 patch for max tres functionality per account
  2642: fix issues in comment 8 (16.05)
  2644: fix issues in comment 8 (15.08)
  2645: 15.08 patch for max tres functionality per account (updated full patch)

Description Tim Wickberg 2015-12-10 04:37:36 MST
Add new MaxTRESPerAccount limit. Treat similarly to existing MaxTRESPerUser, but based on an account rather than a user.
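To make the distinction concrete, here is a minimal Python sketch of how a per-account cap differs from a per-user one. It assumes only CPUs are tracked and only running jobs count against the limit. The names (`usage_by`, `can_start`, `max_cpu_per_account`) are hypothetical and do not reflect Slurm's internal C implementation.

```python
from collections import defaultdict


def usage_by(jobs, key):
    """Sum CPUs of running jobs, grouped by the job attribute named `key`."""
    totals = defaultdict(int)
    for job in jobs:
        if job["state"] == "R":
            totals[job[key]] += job["cpu"]
    return totals


def can_start(job, running, max_cpu_per_user=None, max_cpu_per_account=None):
    """Return True if starting `job` keeps both its user and its account under the caps."""
    if max_cpu_per_user is not None:
        used = usage_by(running, "user")[job["user"]]
        if used + job["cpu"] > max_cpu_per_user:
            return False
    if max_cpu_per_account is not None:
        # Per-account: every user charging this account shares one budget.
        used = usage_by(running, "account")[job["account"]]
        if used + job["cpu"] > max_cpu_per_account:
            return False
    return True


running = [
    {"user": "alice", "account": "scicomp", "cpu": 1, "state": "R"},
    {"user": "bob", "account": "scicomp", "cpu": 1, "state": "R"},
]
new_job = {"user": "alice", "account": "scicomp", "cpu": 1}
print(can_start(new_job, running, max_cpu_per_user=2))     # True
print(can_start(new_job, running, max_cpu_per_account=2))  # False
```

With two users in the same account each running one CPU, a per-user cap of 2 still lets either of them start another job. A per-account cap of 2 blocks both of them.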
Comment 1 Danny Auble 2016-01-04 10:22:12 MST
I'm hoping to have this done before Valentine's Day. The commit will be in 16.05, but I will give you a patch for 15.08. Please let me know otherwise.
Comment 2 Michael Gutteridge 2016-01-04 10:31:15 MST
This sounds good. I'll be able to use the patch in the meantime and then plan on an upgrade to 16.05 in the May/June timeframe.
Comment 3 Danny Auble 2016-01-21 09:56:54 MST
Created attachment 2630
15.08 patch for max tres functionality per account

Michael, attached you will find a patch that will convert your 15.08 install (based off 15.08.7) to use the new MaxTRES functionality for accounts, added to QOS. The three options added are:

MaxTRESPerAccount
MaxJobsPerAccount
MaxSubmitJobsPerAccount
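The following is a rough model of how the three options could interact. It assumes, as with the existing per-user variants, that MaxSubmitJobsPerAccount counts pending plus running jobs, while MaxJobsPerAccount and MaxTRESPerAccount only look at running jobs. `QosAccountLimits`, `may_submit` and `may_run` are hypothetical names used for illustration only. This is not the patch's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QosAccountLimits:
    """Hypothetical container for the three per-account QOS options."""
    max_tres_per_account: Dict[str, int] = field(default_factory=dict)  # MaxTRESPerAccount
    max_jobs_per_account: Optional[int] = None                          # MaxJobsPerAccount
    max_submit_jobs_per_account: Optional[int] = None                   # MaxSubmitJobsPerAccount


def may_submit(limits: QosAccountLimits, account_jobs: List[dict]) -> bool:
    """Submit-time check against the account's existing pending and running jobs."""
    cap = limits.max_submit_jobs_per_account
    return cap is None or len(account_jobs) < cap


def may_run(limits: QosAccountLimits, account_jobs: List[dict], new_tres: Dict[str, int]) -> bool:
    """Start-time check: only the account's running jobs and their TRES count.

    `account_jobs` holds the account's other jobs, not the one being started.
    """
    running = [j for j in account_jobs if j["state"] == "R"]
    cap = limits.max_jobs_per_account
    if cap is not None and len(running) >= cap:
        return False
    for tres, limit in limits.max_tres_per_account.items():
        in_use = sum(j["tres"].get(tres, 0) for j in running)
        if in_use + new_tres.get(tres, 0) > limit:
            return False
    return True
```

Keeping the submit check separate from the start check mirrors the usual split between rejecting a job at submission and holding it pending until resources under the cap free up.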

Please let me know if you have any questions or issues. I'll check this into the master branch after you have verified that it works.

Note that if you want to go back to vanilla 15.08 after this, you may have minor hiccups. For example, the slurmctld's read of the association cache on startup will fail. That isn't a big deal, as it will just get the correct 15.08 information from the reverted slurmdbd.
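The fallback described above can be sketched as follows. JSON stands in for slurmctld's real on-disk state format. The version tags, `load_assoc_cache` and `fetch_from_dbd` are hypothetical. The only point is that an unreadable cache is discarded and the association data is refetched from slurmdbd.

```python
import json


def load_assoc_cache(raw, expected_version, fetch_from_dbd):
    """Return (associations, source); fall back to slurmdbd if the cache is unusable."""
    try:
        data = json.loads(raw)
        # A cache written by a differently patched build is treated as unreadable.
        if not isinstance(data, dict) or data.get("version") != expected_version:
            raise ValueError("cache format not recognized")
        return data["assocs"], "cache"
    except (ValueError, KeyError):
        # json.JSONDecodeError is a ValueError subclass, so garbage lands here too.
        return fetch_from_dbd(), "slurmdbd"
```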
Comment 4 Michael Gutteridge 2016-01-21 11:10:33 MST
Great!  I'll drop it on the test cluster in the morning. Should have a report for you early next week. 

PS- someone on the list was asking about this feature. Cool if I let him know to expect it in 16.05?

Comment 5 Paul Edmon 2016-01-21 11:56:43 MST
Thanks.  I will put it in our test build here as well.

-Paul Edmon-

Comment 6 Danny Auble 2016-01-21 12:49:35 MST
Paul's the guy from the list :). But feel free to announce it on the list if you like ;).
Comment 7 Paul Edmon 2016-01-22 01:46:32 MST
Yup, please do if you want to, as some other people might be curious about this feature.

-Paul Edmon-

Comment 8 Michael Gutteridge 2016-01-25 10:46:37 MST
Hi

I've successfully built and deployed this patch against 15.08.7 (thought I might do the point upgrade while I'm at it).

The only two issues I see currently are:

a) squeue doesn't show the reason correctly

slapshot[~/tutorial]: squeue
               JOBID      USER  ACCOUNT PARTITION QOS      NAME               ST       TIME  NODES CPUS MIN_ NODELIST(REASON)
            27147336       mrg  scicomp    campus normal   sleeper.sh         PD       0:00      1 1    1    (167)
            27147337       mrg  scicomp    campus normal   sleeper.sh         PD       0:00      1 1    1    (Priority)
            27147338       mrg  scicomp    campus normal   sleeper.sh         PD       0:00      1 1    1    (Priority)
            27147335       mrg  scicomp    campus normal   sleeper.sh          R       0:03      1 1    1    gizmof368
            27147334       mrg  scicomp    campus normal   sleeper.sh          R       0:06      1 1    1    gizmof368
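One plausible cause of the bare "(167)" is a new pending-reason code with no entry in the code-to-name table used for display, so the raw number is printed instead. The sketch below illustrates that failure mode. The table contents and numeric codes are illustrative only, not Slurm's actual enum values, and `format_reason` is a hypothetical name.

```python
# Illustrative code-to-name table; these numbers are NOT Slurm's real enum values.
REASON_STRINGS = {
    0: "None",
    1: "Priority",
    2: "Resources",
}


def format_reason(code):
    """Format a pending reason the way the NODELIST(REASON) column shows it."""
    # An unmapped code falls back to its bare number, e.g. "(167)".
    return "(" + REASON_STRINGS.get(code, str(code)) + ")"


print(format_reason(1))    # (Priority)
print(format_reason(167))  # (167)
```

Under this model the fix is simply to register a name for each new per-account reason.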

b) sacctmgr doesn't show this TRES by default, but will show when specified in "format":

slapshot[~/tutorial]: sacctmgr show qos where name=normal
      Name   Priority  GraceTime    Preempt PreemptMode                                    Flags UsageThres UsageFactor       GrpTRES   GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit     GrpWall       MaxTRES MaxTRESPerNode   MaxTRESMins     MaxWall     MaxTRESPU MaxJobsPU MaxSubmitPU       MinTRES 
---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- 
    normal          0   00:00:00                cluster                                                        1.000000       cpu=250                                                                                                                        cpu=300                  5000         

slapshot[~/tutorial]: sacctmgr show qos where name=normal format=maxtresperaccount
    MaxTRESPA 
------------- 
        cpu=2 
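The sacctmgr behaviour is consistent with the new column being absent from the default field list while still being accepted by format=. Below is a simplified Python model. The field list and alias table are abbreviated and hypothetical, and `select_fields` and `render` are illustrative names.

```python
# Hypothetical, abbreviated default column set; sacctmgr's real list is longer.
DEFAULT_QOS_FIELDS = ["Name", "Priority", "MaxTRESPU", "MaxJobsPU", "MaxSubmitPU"]

# Hypothetical alias table mapping format= names (case-insensitive) to headers.
FIELD_ALIASES = {
    "name": "Name",
    "priority": "Priority",
    "maxtrespu": "MaxTRESPU",
    "maxtresperuser": "MaxTRESPU",
    "maxjobspu": "MaxJobsPU",
    "maxsubmitpu": "MaxSubmitPU",
    "maxtrespa": "MaxTRESPA",
    "maxtresperaccount": "MaxTRESPA",
}


def select_fields(fmt=None):
    """Use the format= list if given, otherwise the built-in defaults."""
    if not fmt:
        return list(DEFAULT_QOS_FIELDS)
    return [FIELD_ALIASES[f.strip().lower()] for f in fmt.split(",")]


def render(qos, fmt=None):
    """Render a right-aligned header line and one value line for a QOS record."""
    fields = select_fields(fmt)
    header = " ".join(f"{f:>13}" for f in fields)
    row = " ".join(f"{str(qos.get(f, '')):>13}" for f in fields)
    return header + "\n" + row
```

Under this model, the fix would amount to adding MaxTRESPA (and presumably the other per-account columns) to the default list.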

Looks real good so far, though.  Thanks

M
Comment 9 Danny Auble 2016-01-25 11:35:43 MST
Cool, both should be easy to fix, I'll get you a patch tomorrow.  Let me know if you find anything else. 

Comment 10 Danny Auble 2016-01-26 09:07:55 MST
Created attachment 2642
fix issues in comment 8

Hey Michael, attached you will find a patch that can be applied on top of the original patch to fix the issues you noticed. Please let me know if you find anything else.

After this, I'll also update the 15.08 patch with a full patch set for future releases, so you don't have to patch things multiple times if/when you update to future 15.08 releases.
Comment 11 Danny Auble 2016-01-26 09:13:26 MST
Created attachment 2644
fix issues in comment 8

Sorry, the previous patch was for 16.05; this one is for 15.08.
Comment 12 Danny Auble 2016-01-26 09:15:08 MST
Created attachment 2645
15.08 patch for max tres functionality per account

This is an updated full patch for 15.08. Attachment 2644 is not needed with this patch; it is only needed on top of attachment 2630.
Comment 13 Danny Auble 2016-01-28 05:45:42 MST
Michael, if all is well, I would like to push this into the master branch.  Let me know if you have found any other issues before then.

Thanks!
Comment 14 Michael Gutteridge 2016-01-31 23:53:30 MST
Yep, built and installed in test. It looks to be working as intended; the two problems I'd noticed (reason display and sacctmgr output) now appear fixed.
Comment 15 Danny Auble 2016-02-01 08:12:05 MST
This has been committed to 16.05 in commit be68e87c7493. If you have any problems, please open a new bug for the related topic.
Comment 16 Tim Wickberg 2016-04-28 09:41:01 MDT
*** Ticket 2669 has been marked as a duplicate of this ticket. ***