Ticket 7375 - Fix uses of acct_policy_set_qos_order()
Summary: Fix uses of acct_policy_set_qos_order()
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 19.05.x
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Marshall Garey
QA Contact: Brian Christiansen
URL:
Duplicates: 6659 9788 10745 11475
Depends on:
Blocks:
 
Reported: 2019-07-08 16:10 MDT by Marshall Garey
Modified: 2021-07-27 12:48 MDT

See Also:
Site: SchedMD
Version Fixed: 21.08.0pre1
Target Release: 20.11


Comment 3 Marshall Garey 2020-06-02 09:28:21 MDT
There are some problems in accounting when a job is submitted to multiple partitions and one or more of those partitions have a QOS. In the code we call acct_policy_set_qos_order() with two QOS, assuming that there are at most two: the job's QOS and a single partition QOS. In reality there can be as many QOS as partitions, plus one for the job.
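
To make the "partitions plus one" count concrete, here is a minimal sketch in C. It does not use Slurm's real data structures: qos_t, part_t, job_t and collect_job_qos() are hypothetical stand-ins invented for illustration. It only shows that a job submitted to N partitions can be subject to up to N + 1 distinct QOS.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT Slurm's real structures. */
typedef struct {
    const char *name;
} qos_t;

typedef struct {
    const char *name;
    const qos_t *qos;            /* NULL when the partition has no QOS */
} part_t;

typedef struct {
    const qos_t *qos;            /* the job's own QOS, may be NULL */
    const part_t *const *parts;  /* every partition the job was submitted to */
    size_t part_cnt;
} job_t;

/*
 * Collect every distinct QOS that applies to the job: the job's QOS first,
 * then the QOS of each requested partition. Returns the number of entries
 * written to out[], which is at most part_cnt + 1 (and never more than
 * out_max).
 */
size_t collect_job_qos(const job_t *job, const qos_t **out, size_t out_max)
{
    size_t n = 0;

    if (job->qos && n < out_max)
        out[n++] = job->qos;

    for (size_t i = 0; i < job->part_cnt; i++) {
        const qos_t *q = job->parts[i]->qos;
        int seen = 0;

        if (!q)
            continue;
        /* Skip a QOS that is already in the list (shared by partitions). */
        for (size_t j = 0; j < n; j++) {
            if (out[j] == q) {
                seen = 1;
                break;
            }
        }
        if (!seen && n < out_max)
            out[n++] = q;
    }
    return n;
}
```

For a job with QOS normal submitted to debug (qos=test) and debug2 (qos=test2), this yields three QOS, which a two-slot interface cannot represent.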

Bug 6659 is one example of this; it reported problems with accrue_cnt. Below is another example of the problem, this time in acct_policy_validate().

Rather than fix each problem individually, we're going to address all the problems at once. I'm making this bug public and marking bug 6659 as a duplicate of this bug. I'm also marking this ticket as an enhancement. We're targeting these fixes/changes for 20.11.



Example of this problem in acct_policy_validate():


PartitionName=debug State=UP Nodes=snowflake[0-5] qos=test
PartitionName=debug2 State=UP Nodes=snowflake[6-10] qos=test2

sacctmgr mod qos test2 set maxsubmit=1

one terminal...
salloc -pdebug2
salloc: Granted job allocation 93267

other terminal...
salloc -pdebug,debug2 -wsnowflake7
salloc: Granted job allocation 93266

(This is clearly wrong: both jobs are running in debug2, whose QOS test2 has maxsubmit=1...
squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
             93267    debug2     bash       da  R       1:37      1 snowflake6 
             93266    debug2     bash       da  R       4:59      1 snowflake7 
)

Here is what we see in slurmctld...

scontrol show assoc flag=qos
QOS=normal(1)
...
   User Limits
      7558
        MaxJobsPU=N(2) MaxJobsAccruePU=N(0) MaxSubmitJobsPU=N(2)
...
QOS=test(958)
...
    User Limits
      7558
        MaxJobsPU=N(0) MaxJobsAccruePU=N(0) MaxSubmitJobsPU=N(0)
...
QOS=test2(959)
...
    User Limits
      7558
        MaxJobsPU=N(2) MaxJobsAccruePU=N(0) MaxSubmitJobsPU=1(1)

So, where is that second job accounted for in test2? No idea. We don't account for it in 'test' either.
Comment 4 Marshall Garey 2020-06-02 09:32:41 MDT
*** Ticket 6659 has been marked as a duplicate of this ticket. ***
Comment 9 Trey Dockendorf 2020-09-02 09:30:49 MDT
This is affecting Ohio Supercomputer Center on version 20.02.4. If this could get patched in the 20.02 series that would really help us out.
Comment 10 Marshall Garey 2020-10-06 17:06:31 MDT
*** Ticket 9788 has been marked as a duplicate of this ticket. ***
Comment 11 Marshall Garey 2020-10-06 17:21:31 MDT
I mentioned this on a duplicate bug but I'll mention it here. Right now I'm targeting a fix for 20.11 (hopefully I'll get it done before 20.11 is released), but I could probably provide a patch for testing on 20.02 when it's ready.
Comment 14 Marshall Garey 2021-01-29 15:25:13 MST
*** Ticket 10745 has been marked as a duplicate of this ticket. ***
Comment 23 Marshall Garey 2021-06-08 15:21:40 MDT
*** Ticket 11475 has been marked as a duplicate of this ticket. ***
Comment 30 Marshall Garey 2021-07-27 12:48:06 MDT
To all the sites looking at this bug -

We pushed two commits to fix this issue.

47e46a45e6 Do not use accrue limits for partition QOS

There were issues with accrue limits for partition QOS when submitting jobs to multiple partitions. There wasn't a good way to fix this for multiple partitions, so accrue limits no longer apply to partition QOS at all. Accrue limits do still work on job QOS. This is a feature change.

9125409e12 Fix acct_policy_validate() to consider all partition QOS

This ensures that we loop through all partitions when validating a job at job submission time.
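
As a rough illustration of the idea (not the actual commit), the sketch below checks a submit limit against the QOS of every requested partition instead of a single one. All names here (qos_t, part_t, NO_LIMIT, qos_admits(), validate_all_partitions()) are hypothetical and the structures are heavily simplified. How the real acct_policy_validate() combines the per-partition results is not stated in this ticket, so the sketch just reports each result and leaves the accept/reject policy to the caller.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_LIMIT ((unsigned)-1)  /* hypothetical sentinel: no limit configured */

/* Hypothetical, simplified stand-ins; these are NOT Slurm's real structures. */
typedef struct {
    const char *name;
    unsigned max_submit;  /* per-user submit limit, or NO_LIMIT */
    unsigned submitted;   /* jobs this user currently has submitted */
} qos_t;

typedef struct {
    const char *name;
    const qos_t *qos;     /* NULL when the partition has no QOS */
} part_t;

/* True when the QOS (if any) still has room for one more submitted job. */
static bool qos_admits(const qos_t *q)
{
    return !q || q->max_submit == NO_LIMIT || q->submitted < q->max_submit;
}

/*
 * Check the submit limit of every requested partition's QOS.
 * ok[i] is set to whether partition i's QOS would admit the job; the return
 * value is the number of partitions that admit it.
 */
size_t validate_all_partitions(const part_t *const *parts, size_t part_cnt,
                               bool *ok)
{
    size_t admitted = 0;

    for (size_t i = 0; i < part_cnt; i++) {
        ok[i] = qos_admits(parts[i]->qos);
        if (ok[i])
            admitted++;
    }
    return admitted;
}
```

With the configuration from comment 3 (test has no limit, test2 has maxsubmit=1 with one job already submitted), a check that looks only at debug's QOS passes, while the loop above also reports that debug2's QOS is full.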


These have been pushed to master and will be part of the 21.08 release.