| Summary: | Limiting GPUs per Account | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Will French <will> |
| Component: | Limits | Assignee: | Alejandro Sanchez <alex> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | davide.vanzo |
| Version: | 15.08.11 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Vanderbilt | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | slurm.conf; sacctmgr show assoc tree; sacctmgr show qos | | |
Description

Will French 2016-05-23 05:16:40 MDT

Created attachment 3127 [details]
slurm.conf
Alejandro Sanchez (comment #3):

Hi Will. We're taking a look at it and will come back to you.

Will French:

(In reply to Alejandro Sanchez from comment #3)
> Hi Will. We're taking a look at it and will come back to you.

Thanks, Alejandro. We have users waiting to run on our GPU nodes and we'd like to have the configuration nailed down before allowing jobs to run.

Alejandro Sanchez (comment #7):

Hi Will. I can reproduce this on 15.08.11, but it works with 16.05.0rc2: in the newer version, when GrpTRES is exceeded, jobs remain PD with the AssocGrpGRES reason. I believe the exact commit where this was fixed is this one:

https://github.com/SchedMD/slurm/commit/0cd692967b

In fact, this bug is a duplicate of bug #2482, where GRES TRES were enforced only at the granularity of gpu:4 rather than the finer-grained gpu:maxwell:4. I believe we should mark this bug as a duplicate and close it, unless you have any more questions. Please let me know what you think.

Will French:

(In reply to Alejandro Sanchez from comment #7)
> Hi Will. I can reproduce this on 15.08.11, but it works with 16.05.0rc2 [...]
> I believe we should mark this bug as a duplicate and close it, unless you
> have any more questions.

Thanks, Alejandro. We probably won't transition to 16.05.XX until the end of the summer. In the meantime, what would you suggest? I'm thinking through the options and here's what I've come up with:

1. Apply the patch
2. Remove GRES from the fermi partition and apply generic gres/gpu to the maxwell partition only
3. Apply generic gres/gpu across both partitions
4. Other option?
Is there some sort of QOS or association limit we could apply that would give us the desired behavior? To reiterate, what we want to do is limit the number of Maxwell GPUs a group can access at a given time, but allow a group to use as many Fermi GPUs as they want.

Alejandro Sanchez:

> 1. Apply the patch

The patch depends on other commits made against 16.05 and cannot be applied directly to 15.08.11; git reports a "patch failed" error when trying to apply it.

> 2. Remove GRES from the fermi partition and apply generic gres/gpu to the
> maxwell partition only

I think we can preserve generic gres/gpu across both partitions.

> 3. Apply generic gres/gpu across both partitions

This is the option that I like the most. Let's see if this workaround works for you:

Change the GRES parameter for the NodeNames in both partitions so that Gres=gpu:4. For the maxwell partition, also add QOS=maxwell and create a new qos named 'maxwell' with:

$ sacctmgr create qos maxwell GrpTRES=gres/gpu=4

By default, partition QOS limits override the job's qos limits, so there is no need to change OverPartQOS. Remove the Type parameter in gres.conf too. I tested this, and pending jobs in maxwell now show the (QOSGrpGRES) reason while fermi has no limits.

> 4. Other option? Is there some sort of QOS or association limits we could
> apply that could give us the desired behavior?
>
> To reiterate, what we're wanting to do is limit the number of Maxwell GPUs a
> group can access at a given time, but allow a group to use as many Fermi
> GPUs as they want.

I think the approach in point 3 is a good alternative. Please let me know whether this works for you or not.
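Pulled into one place, the workaround above amounts to roughly the following. This is a sketch, not taken from the attached slurm.conf: the node names, node counts, and device paths are hypothetical, and the remaining node/partition parameters are elided.

```
# slurm.conf (sketch): untyped GRES on the nodes of both partitions
NodeName=maxwell[01-05] Gres=gpu:4 ...
NodeName=fermi[01-08]   Gres=gpu:4 ...
# Attach the limiting QOS to the maxwell partition only
PartitionName=maxwell Nodes=maxwell[01-05] QOS=maxwell ...
PartitionName=fermi   Nodes=fermi[01-08] ...

# gres.conf (sketch): no Type= parameter, just the generic gpu GRES
Name=gpu File=/dev/nvidia[0-3]

# Create the QOS that caps concurrent maxwell GPU usage at 4 per group
$ sacctmgr create qos maxwell GrpTRES=gres/gpu=4
```

With this in place, jobs in maxwell that exceed the cap stay pending with the (QOSGrpGRES) reason, while fermi remains unlimited.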
Will French:

> This is the option that I like the most. Let's see if this workaround works
> for you:
>
> Change the GRES parameter for the NodeNames in both partitions so that
> Gres=gpu:4. For the maxwell partition, also add QOS=maxwell and create a new
> qos named 'maxwell' with:
>
> $ sacctmgr create qos maxwell GrpTRES=gres/gpu=4
>
> By default, partition QOS limits override the job's qos limits, so there is
> no need to change OverPartQOS.
>
> Remove the Type parameter in gres.conf too. I tested this, and pending jobs
> in maxwell now show the (QOSGrpGRES) reason while fermi has no limits.
>
We've made these changes and they work great - thanks!
One last snag - we want to be able to control the number of Maxwell GPUs that are accessible on a group-by-group basis. Actually, in practice we only want two levels for now: two groups should be allowed to use 20 Maxwell GPUs all at once, while all other groups should be allowed to use only 4 Maxwell GPUs at once.
I was hoping I could just set a GrpTRES=gres/gpu=20 on the two groups in combination with the OverPartQOS flag set on the maxwell QOS, but that does not appear to work.
Can you explain how we might accomplish this? We also haven't played with QOS's much; we tend to set account-level limits instead.
Thanks!
Will
Alejandro Sanchez (comment #11):

Will, could you try this?

# Remove the OverPartQos flag from maxwell
$ sacctmgr modify qos maxwell set flags=-1

# Create a second QOS
$ sacctmgr add qos maxwell_20 GrpTRES=gres/gpu=20 Flags=OverPartQos

# Add the maxwell_20 QOS to the accounts that should be allowed 20 maxwell GPUs
$ sacctmgr modify account <allowed_acct> set qos=<other_qos_they_had>,maxwell_20

Accounts with the maxwell_20 qos should now be allowed 20 running maxwell GPUs if they request the job with --qos=maxwell_20. You can either train those groups to request their jobs that way, or force it through a job_submit plugin: detect job requests with partition=maxwell and account=<one of the allowed accounts>, and force --qos=maxwell_20 for jobs satisfying both. I set up this scenario and it works for me. Please let me know if this also works for you.

Will French:

(In reply to Alejandro Sanchez from comment #11)
> Accounts with the maxwell_20 qos should now be allowed 20 running maxwell
> GPUs if they request the job with --qos=maxwell_20.

Hey Alejandro, this is working well. In fact, it appears that --qos=maxwell_20 is not even needed at submit time.
When I run:

$ sacctmgr modify account accre_gpu set qos=maxwell_20

the account has the qos tied to it, and all the associations under this account also have the qos applied automatically.

One thing I'm still failing to understand about qos's: how do you limit a qos to a group of users? For example, if I run:

$ sacctmgr modify account accre_gpu set qos=normal

SLURM only allows me to run with up to 4 Maxwell GPUs at once because of the GrpTRES limit placed on the maxwell partition. However, I am still able to run on up to 20 Maxwell GPUs if I submit jobs with --qos=maxwell_20. Is there a way to limit access to the qos to only those groups and users who have the qos assigned to their association?

Thanks again,
Will

Alejandro Sanchez:

> Hey Alejandro, this is working well. In fact, it appears that
> --qos=maxwell_20 is not even needed at submit time. When I run:
>
> $ sacctmgr modify account accre_gpu set qos=maxwell_20
>
> the account has the qos tied to it, and all the associations under this
> account also have the qos applied automatically.

This is not needed unless the assocs also have other qos applied; in that case the job might be launched with a different qos and thus not be subject to the 20-GPU limit. But if the assoc has only one qos, you're right that there's no need for an explicit --qos=maxwell_20 parameter at request time.

> One thing I'm still failing to understand about qos's: how do you limit a
> qos to a group of users? For example, if I run:
>
> $ sacctmgr modify account accre_gpu set qos=normal
>
> SLURM only allows me to run with up to 4 Maxwell GPUs at once because of the
> GrpTRES limit placed on the maxwell partition. However, I am still able to
> run on up to 20 Maxwell GPUs if I submit jobs with --qos=maxwell_20. Is
> there a way to limit access to the qos to only those groups and users who
> have the qos assigned to their association?
Slurm should reject submissions with --qos=maxwell_20 if they are submitted by an account whose assoc does not have the maxwell_20 qos:

$ sbatch --qos=maxwell_20 test.batch
sbatch: error: Batch job submission failed: Invalid qos specification

Can you attach the output of:

$ sacctmgr show assoc tree
$ sacctmgr show qos

Thanks.

Will French:

> Slurm should reject submissions with --qos=maxwell_20 if they are submitted
> by an account whose assoc does not have the maxwell_20 qos:
>
> $ sbatch --qos=maxwell_20 test.batch
> sbatch: error: Batch job submission failed: Invalid qos specification

For some reason I'm allowed. I thought maybe it was because I have administrative privileges in SLURM, but even with those removed I'm still allowed to submit jobs with the maxwell_20 qos when none of my accounts or associations have this qos:

$ salloc --qos=maxwell_20 --partition=maxwell --account=accre_gpu
salloc: Granted job allocation 8823505

$ sacctmgr show user frenchwr
      User   Def Acct     Admin
---------- ---------- ---------
  frenchwr      accre      None

> Can you attach the output of:
>
> $ sacctmgr show assoc tree
> $ sacctmgr show qos

Attaching. Thanks.

Created attachment 3157 [details]
sacctmgr show assoc tree
Created attachment 3158 [details]
sacctmgr show qos
Alejandro Sanchez (comment #17):

I think the solution is appending 'qos' to AccountingStorageEnforce. I see you have 'safe', which automatically sets 'limits' and 'associations', but not 'qos'. Note that a change to AccountingStorageEnforce requires a restart of the slurmctld daemon, not just 'scontrol reconfigure'.

Also, I see you've not added the QOS 'maxwell_20' and 'maxwell_40' to any account. After updating AccountingStorageEnforce you should add these qos to the desired accounts.

Will French:

(In reply to Alejandro Sanchez from comment #17)
> I think the solution is appending 'qos' to AccountingStorageEnforce. I see
> you have 'safe', which automatically sets 'limits' and 'associations', but
> not 'qos'. Note that a change to AccountingStorageEnforce requires a restart
> of the slurmctld daemon, not just 'scontrol reconfigure'.

Yep, this was the issue. We are now all set and have the ability to enforce qos requests and GPU limits from SLURM. Woot!

> Also, I see you've not added the QOS 'maxwell_20' and 'maxwell_40' to any
> account. After updating AccountingStorageEnforce you should add these qos to
> the desired accounts.

Right, I had temporarily removed all accounts from those QOS while testing whether groups not pinned to a qos could still submit to that qos.

We're happy on our end; feel free to close this ticket. Many thanks for your assistance! We look forward to the 16.05 release, which will make management of multiple GRES cleaner.

Alejandro Sanchez:

Great, thanks for your cooperation. Closing the bug.
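For anyone landing on this ticket later, the working setup assembled over the thread boils down to roughly the following. This is a sketch using the account and QOS names from the ticket; it is not a copy of the site's actual configuration.

```
# slurm.conf: enable QOS enforcement ('safe' already implies 'limits'
# and 'associations'). Restart slurmctld after changing this line;
# 'scontrol reconfigure' is not enough.
AccountingStorageEnforce=safe,qos

# Default tier: the QOS attached to the maxwell partition caps
# every group at 4 concurrent GPUs
$ sacctmgr create qos maxwell GrpTRES=gres/gpu=4

# Higher tier: 20 concurrent GPUs, overriding the partition QOS
$ sacctmgr add qos maxwell_20 GrpTRES=gres/gpu=20 Flags=OverPartQos

# Grant the higher tier only to the privileged accounts
$ sacctmgr modify account accre_gpu set qos=maxwell_20
```

With 'qos' in AccountingStorageEnforce, jobs submitted with --qos=maxwell_20 from accounts whose associations lack that qos are rejected with "Invalid qos specification".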