| Summary: | What are the QOS sacctmgr commands to set the below for all users. | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Bill Pappas <bpappas> |
| Component: | Configuration | Assignee: | Ben Roberts <ben> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | akail |
| Version: | 20.02.6 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Analysis Group | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | 20.02.06 |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
Hi Bill,
You should be able to accomplish what you're looking for by setting a per-user GPU limit and a group limit on a QOS, then associating that QOS with a partition. Setting the MaxTRESPerJob limit is somewhat redundant, since the per-user limit effectively enforces a per-job limit as well, but it doesn't hurt to set it. I set up an example to demonstrate how to accomplish this.
I started by setting limits on the number of GPUs for MaxTRESPerUser, MaxTRESPerJob and GrpTRES (I used 4, 4 and 8 respectively to make it easier to demonstrate on my test system).
$ sacctmgr show qos limited format=name,maxtrespu,maxtresperjob,grptres
Name MaxTRESPU MaxTRES GrpTRES
---------- ------------- ------------- -------------
limited gres/gpu=4 gres/gpu=4 gres/gpu=8
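For reference, limits like these can be set on a QOS with sacctmgr along these lines (a sketch based on the 'limited' QOS from this example; run as a Slurm administrator):

```shell
# Create the QOS (skip if it already exists)
sacctmgr add qos limited

# Per-user, per-job, and aggregate GPU limits on the QOS
sacctmgr modify qos limited set MaxTRESPerUser=gres/gpu=4
sacctmgr modify qos limited set MaxTRESPerJob=gres/gpu=4
sacctmgr modify qos limited set GrpTRES=gres/gpu=8
```

These commands talk to slurmdbd, so they require accounting to be configured; sacctmgr prompts for confirmation unless run with -i.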
I associated this QOS with the 'high' partition in my slurm.conf. You can see that scontrol shows the 'limited' QOS associated with the partition. This means that users who request this partition will have any limits from the 'limited' QOS applied to their jobs.
$ scontrol show partition high | grep QoS
AllocNodes=ALL Default=NO QoS=limited
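For reference, that association is made on the partition line in slurm.conf, along these lines (the node list here is a placeholder for this sketch):

```
# slurm.conf (fragment) -- node names are hypothetical
PartitionName=high Nodes=node[01-02] QOS=limited State=UP
```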
As my user, I submitted two jobs to this partition, each requesting 4 GPUs. The first job was able to start, but the second stayed pending because it would have violated the MaxTRESPerUser limit.
$ sbatch -phigh --gpus=4 --wrap='srun sleep 60'
Submitted batch job 25782
$ sbatch -phigh --gpus=4 --wrap='srun sleep 60'
Submitted batch job 25783
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
25783 high wrap ben PD 0:00 1 (QOSMaxGRESPerUser)
25782 high wrap ben R 0:01 1 node01
Then I became 'user1' and submitted two more jobs in the same way. Again, the first job was able to run, but the second was blocked; this time squeue shows the pending jobs hitting the GrpTRES limit.
$ sbatch -phigh --gpus=4 --wrap='srun sleep 60'
Submitted batch job 25784
$ sbatch -phigh --gpus=4 --wrap='srun sleep 60'
Submitted batch job 25785
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
25785 high wrap user1 PD 0:00 1 (QOSGrpGRES)
25783 high wrap ben PD 0:00 1 (QOSGrpGRES)
25784 high wrap user1 R 0:05 1 node02
25782 high wrap ben R 0:15 1 node01
I became 'user2' and submitted a similar job. It isn't able to run until the number of GPUs in use by jobs drops below the GrpTRES limit of 8, even though 'user2' hasn't reached the MaxTRESPerUser limit.
$ sbatch -phigh --gpus=4 --wrap='srun sleep 60'
Submitted batch job 25786
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
25786 high wrap user2 PD 0:00 1 (QOSGrpGRES)
25785 high wrap user1 PD 0:00 1 (QOSGrpGRES)
25783 high wrap ben PD 0:00 1 (QOSGrpGRES)
25784 high wrap user1 R 0:16 1 node02
25782 high wrap ben R 0:26 1 node01
Does this look like it will work for what you are trying to do?
Thanks,
Ben
Hi Bill,
I wanted to make sure the information I sent on enforcing the limits you described looks like it will work for you. Let me know if you have any additional questions, or if it's ok to close the ticket.
Thanks,
Ben

Hi Bill,
I believe the information I sent answered your questions about enforcing limits, and I haven't heard any follow-up questions. I'll go ahead and close this ticket, but feel free to update it if you do have additional questions.
Thanks,
Ben

Please close.
Bill Pappas

(On Mar 19, 2021, Ben Roberts changed bug 11007: Resolution --- → INFOGIVEN; Status OPEN → RESOLVED.)

Hi Ben,
I'm working with Bill on this project and we are unable to verify that the settings have taken effect. When we submit jobs using the settings Bill provided, users requesting over 24 GPUs in total are not being limited. For instance, we can submit fifty 1-GPU jobs and they all run. I will provide output on that during off hours, when there is more availability. This afternoon we submitted thirty 1-GPU jobs; several ran, but the pending reason for the rest was Priority or Resources, not a "QOS" reason. Is there another way we can confirm these limits are working?
Thanks,
Andrew

Hi Andrew,
The limits you set with sacctmgr are stored in the database and the command communicates with slurmdbd. You should be able to see what slurmctld knows about the limits that are defined with sacctmgr by running 'scontrol show assoc flags=qos'.
As an example I have the following limits defined with sacctmgr:
$ sacctmgr show qos limited format=name,maxtres,grptres%20
Name MaxTRES GrpTRES
---------- ------------- --------------------
limited gres/gpu=4 gres/gpu=10,node=6
Here is what scontrol sees for these limits:
$ scontrol show assoc flags=qos qos=limited
Current Association Manager state
QOS Records
QOS=limited(60)
UsageRaw=0.000000
GrpJobs=N(0) GrpJobsAccrue=N(0) GrpSubmitJobs=N(0) GrpWall=N(0.00)
GrpTRES=cpu=N(0),mem=N(0),energy=N(0),node=6(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0),gres/asdf=N(0),gres/gpu=10(0),gres/test=N(0),license/local=N(0),license/testlic=N(0)
GrpTRESMins=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0),gres/asdf=N(0),gres/gpu=N(0),gres/test=N(0),license/local=N(0),license/testlic=N(0)
GrpTRESRunMins=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0),gres/asdf=N(0),gres/gpu=N(0),gres/test=N(0),license/local=N(0),license/testlic=N(0)
MaxWallPJ=
MaxTRESPJ=gres/gpu=4
MaxTRESPN=
MaxTRESMinsPJ=
MinPrioThresh=
MinTRESPJ=
PreemptMode=OFF
Priority=50
Account Limits
No Accounts
User Limits
No Users
You can see that the GrpTRES shows the limits of 6 nodes and 10 GPUs.
GrpTRES=...,node=6(0)...,gres/gpu=10(0)
The MaxTRES limit also shows up in the output.
MaxTRESPJ=gres/gpu=4
If you aren't seeing these limits enforced, can you send the output of 'scontrol show assoc flags=qos', along with the 'squeue' output and the 'scontrol show job <jobid>' output for one of the jobs that should have the limit enforced but is still able to run?
Thanks,
Ben
[root@head ~]# sacctmgr show qos gpu_limits format=name,maxtres,grptres%20
Name MaxTRES GrpTRES
---------- ------------- --------------------
gpu_limits gres/gpu=24
[root@head01 ~]# scontrol show assoc flags=qos qos=gpu_limits
Current Association Manager state
QOS Records
QOS=gpu_limits(14)
UsageRaw=0.000000
GrpJobs=N(0) GrpJobsAccrue=N(0) GrpSubmitJobs=N(0) GrpWall=N(0.00)
GrpTRES=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(0),gres/gpu:tesla_v100=N(0)
GrpTRESRunMins=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0),gres/gpu=N(0),gres/gpu:tesla_v100=N(0)
MaxWallPJ=
MaxTRESPJ=gres/gpu=24
MaxTRESPN=
MaxTRESMinsPJ=
MinPrioThresh=
MinTRESPJ=
PreemptMode=OFF
Priority=0
Account Limits
No Accounts
User Limits
No Users
It looks like GrpTRES doesn't have any GPUs configured, which is odd.
Looking at the partition as well:
[root@head01 ~]# scontrol show partition hgx-1
PartitionName=hgx-1
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=gpu_limits
Does AccountingStorageEnforce need to be set?
Do we need to set AccountingStorageEnforce to qos to enforce these settings? If we do, I believe we also need to configure the new QOS as the default, right?

Hi Andrew,
My apologies for the delayed response; I was out of the office last week. You do need to have AccountingStorageEnforce configured with 'qos' specified for this to be enforced. My apologies for not verifying that you had this configured previously. You don't need to make this QOS the default, because you have it associated with the hgx-1 partition. This means that any job that goes to that partition will have the limits defined in the gpu_limits QOS applied without the user specifying that QOS manually. Let me know if you still have problems getting this enforced with AccountingStorageEnforce set.
Thanks,
Ben

Thanks Ben. I believe we have narrowed our issue down to a lack of associations in the Slurm database. We tested on another system, and the QOS works only for the root user, which is in the database. The system owner is not interested in maintaining their slurmdbd list of users right now, so we need to find another way around this if possible.

Hi Andrew,
I'm afraid there isn't a way to enforce limits in the way you're asking without having user associations created. I understand that it can be a lot of extra work to maintain another list of users. Since we support the ability to put users in multiple accounts, which can get complicated in a hurry, we don't have an automated option for creating users from AD. One option you might be able to implement would be a script that monitors for new users and adds them to a default account in Slurm. If you don't care about splitting users into different accounts to track different types of usage, that might work for you; or the script could handle most cases, with users placed in different accounts manually in special cases.
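For reference, the slurm.conf setting discussed above looks along these lines (per the slurm.conf documentation, 'limits' and 'qos' each imply 'associations'):

```
# slurm.conf (fragment)
AccountingStorageEnforce=limits,qos
```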
I'm afraid that automating something like that is outside the scope of our support, but I wanted to bring it up as a possibility.
Thanks,
Ben

Thanks Ben, appreciate the help on this. We'll be going the script route to automate the process.
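A minimal sketch of the script approach described above, assuming local passwd entries with UID >= 1000 should land in a single 'default' Slurm account (the account name, UID cutoff, and scheduling via cron are assumptions, not part of this ticket):

```shell
#!/bin/bash
# Sketch: add OS users missing from the Slurm database to a default account.
# Run periodically (e.g. from cron) on a node that can reach slurmdbd.
# The 'default' account and the UID >= 1000 cutoff are site-specific assumptions.
for user in $(getent passwd | awk -F: '$3 >= 1000 {print $1}'); do
    # 'sacctmgr -n show user' prints nothing when the user has no record
    if [ -z "$(sacctmgr -n show user "$user")" ]; then
        sacctmgr -i add user "$user" account=default   # -i: skip confirmation
    fi
done
```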
What are the QOS sacctmgr commands to set the below for all users? I see these to set the 24-GPU-per-user simultaneous-use limit and the max job size:

sacctmgr modify QOS default set MaxTRESPerUser=gres/gpu=24
sacctmgr modify QOS default set MaxTRESPerJob=gres/gpu=24

I am not sure how to restrict 48 GPUs submitted in the partition. See below:

(iii) Restrict a user to 24 GPUs in simultaneous use (resource use per user)
(iv) Restrict a user to 48 GPUs requested by jobs in the queue (resource requests submitted per user)

Configuring limits (max job count, max job size, etc.):
○ Per-job limits (e.g. MaxNodes)
○ Aggregate limits by user, account or QOS (e.g. GrpJobs)

(v) Configure max job size = 24 GPUs in Slurm

sacctmgr: list tres
    Type            Name     ID
-------- --------------- ------
     cpu                      1
     mem                      2
  energy                      3
    node                      4
 billing                      5
      fs            disk      6
    vmem                      7
   pages                      8
    gres             gpu   1001
    gres gpu:tesla_v100   1002

sacctmgr: list qos
(wide output; most columns are empty)
Name Priority GraceTime Preempt PreemptExemptTime PreemptMode Flags UsageThres UsageFactor GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
normal 0 00:00:00 cluster 1.000000 3 3
default 0 00:00:00 cluster 1.000000 3
special 100 00:00:00 cluster 1.000000

sacctmgr: list user
      User   Def Acct     Admin
---------- ---------- ---------
  pragnesh    default      None
      root       root Administ+
   trickey    default      None
  z003tjsa   trinidad      None

sacctmgr: list account
   Account                Descr                  Org
---------- -------------------- --------------------
   default              default              default
    needle               needle               needle
      root default root account                 root
  trinidad             trinidad              default