We have created a partition, fgewf, for our GPU nodes that is limited to a specific QOS (windfall). This QOS has the lowest priority. Here's the configuration of the fgewf partition:

> PartitionName=fgewf Nodes=h26n[01-16],h27n[01-16],h28n[01-16],h29n[01-16],h30n[01-18],h31n[01-18] \
>     OverSubscribe=EXCLUSIVE \  # Force jobs to use the entire node
>     MaxTime=1-6:00:00 \
>     DefMemPerCpu=12500 \
>     AllowQos=windfall,admin

The same QOS is permitted across all of our configured partitions, and its MaxWall is set to 8:00:00. For the fgewf partition, however, we permit runtimes as long as 30 hours. The problem is that when a user submits a job to this partition, it is rejected with:

> sbatch: error: QOSMaxWallDurationPerJobLimit

For this partition only, we would like the QOS MaxWall to be ignored and the partition's MaxTime enforced instead. What are my options for achieving this goal?

Thanks,
Tony
Tony,

Looking at how best to implement your request.

--Nate
(In reply to Anthony DelSorbo from comment #0)
> What are my options for achieving this goal?

I believe the cleanest solution is to use a Partition QOS, per https://slurm.schedmd.com/qos.html:

> The Partition QOS will override the job's QOS. If the opposite is desired you need to have the job's QOS have the 'OverPartQOS' flag which will reverse the order of precedence.

Here is a simple example:

1. Create the partition QOS:
> $ sacctmgr create qos fgewf-windfall set MaxWall=1-6:00:00

2. Assign the partition QOS in slurm.conf:
> PartitionName=fgewf Nodes=h26n[01-16],h27n[01-16],h28n[01-16],h29n[01-16],h30n[01-18],h31n[01-18] \
>     OverSubscribe=EXCLUSIVE \  # Force jobs to use the entire node
>     MaxTime=1-6:00:00 \
>     DefMemPerCpu=12500 \
>     AllowQos=windfall,admin Qos=fgewf-windfall

3. Restart slurmctld.

Please give it a try.

Thanks,
--Nate
(In reply to Nate Rini from comment #4)
> I believe the cleanest solution is to use a Partition Qos

Nate,

Sorry about the missing information: we had that solution in place and moved away from it. The customer wanted to reduce the total number of QOSs available to the user. Their point was: "we already have a qos of windfall, why do we need fgewindfall qos".

Tony
Tony,

Please provide the output of the following and a copy of your slurm.conf:

> sacctmgr show qos -p
> scontrol show part

Thanks,
--Nate
Thanks Nate. See below.

[root@bqs1 ~]# sacctmgr show qos -p
Name|Priority|GraceTime|Preempt|PreemptExemptTime|PreemptMode|Flags|UsageThres|UsageFactor|GrpTRES|GrpTRESMins|GrpTRESRunMins|GrpJobs|GrpSubmit|GrpWall|MaxTRES|MaxTRESPerNode|MaxTRESMins|MaxWall|MaxTRESPU|MaxJobsPU|MaxSubmitPU|MaxTRESPA|MaxJobsPA|MaxSubmitPA|MinTRES|
batch|20|00:00:00|||cluster|DenyOnLimit||1.000000||||||||||||||||||
windfall|1|00:00:00|||cluster|DenyOnLimit||0.000000||||||||||||||||||
debug|30|00:00:00|||cluster|DenyOnLimit||1.000000|||||||cpu=4104|||00:30:00||||||||
urgent|40|00:00:00|||cluster|DenyOnLimit||1.000000|||||||cpu=4104|||08:00:00||||||||
novel|50|00:00:00|||cluster|DenyOnLimit||1.000000||||||||||08:00:00|||||||cpu=4105|
admin|90|00:00:00|||cluster|DenyOnLimit||1.000000|||||||cpu=4104|||1-00:00:00||||||||
maximum-qos-normalization|100|00:00:00|||cluster|DenyOnLimit||1.000000|||||||||||||0|||0||

[root@bqs1 ~]# scontrol show part
PartitionName=fge
   AllowGroups=ALL AllowAccounts=nesccmgmt,rda-aidata,rda-esrl-ai,rda-ghpcs,rda-gpucm,rda-isp1,rda-nmfs,rda-rdo1,sena AllowQos=batch,debug,admin
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=00:05:00 DisableRootJobs=YES ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=1-06:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=h26n[01-16],h27n[01-16],h28n[01-16],h29n[01-16],h30n[01-18],h31n[01-18]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=2000 TotalNodes=100 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=12500 MaxMemPerNode=UNLIMITED
   TRESBillingWeights=cpu=1.0

PartitionName=fgewf
   AllowGroups=ALL AllowAccounts=ALL AllowQos=windfall,admin
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=00:05:00 DisableRootJobs=YES ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=1-06:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=h26n[01-16],h27n[01-16],h28n[01-16],h29n[01-16],h30n[01-18],h31n[01-18]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=2000 TotalNodes=100 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=12500 MaxMemPerNode=UNLIMITED
   TRESBillingWeights=cpu=1.0

PartitionName=hera
   AllowGroups=ALL AllowAccounts=ALL AllowQos=windfall,batch,debug,novel,urgent,admin
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=00:05:00 DisableRootJobs=YES ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=08:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=h1c[01-52],h[2-5]c[01-56],h6c[01-57],h8c[01-56],h9c[01-54],h10c[01-57],h[11-12]c[01-56],h[13,14]c[01-52],h[15-24]c[01-56],h[25]c[01-52]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=53120 TotalNodes=1328 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=2300 MaxMemPerNode=UNLIMITED
   TRESBillingWeights=cpu=1.0

PartitionName=service
   AllowGroups=ALL AllowAccounts=ALL AllowQos=windfall,batch,debug,urgent,admin
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=00:05:00 DisableRootJobs=YES ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=1 MaxTime=1-00:00:00 MinNodes=0 LLN=YES MaxCPUsPerNode=32
   Nodes=hfe[01-12]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=FORCE:1
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=480 TotalNodes=12 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=2300 MaxMemPerNode=UNLIMITED
   TRESBillingWeights=cpu=1.0

PartitionName=bigmem
   AllowGroups=ALL AllowAccounts=ALL AllowQos=windfall,batch,debug,urgent,admin
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=00:05:00 DisableRootJobs=YES ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=08:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=h1m[01-04],h13m[01-04]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=320 TotalNodes=8 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=9600 MaxMemPerNode=UNLIMITED
   TRESBillingWeights=cpu=1

PartitionName=admin
   AllowGroups=ALL AllowAccounts=nesccmgmt AllowQos=windfall,batch,debug,urgent,admin,novel
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=00:05:00 DisableRootJobs=YES ExclusiveUser=NO GraceTime=0 Hidden=YES
   MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=h1c[01-52],h1m[01-04],h2c[01-56],h3c[01-56],h4c[01-56],h5c[01-56],h6c[01-57],h8c[01-56],h9c[01-54],h10c[01-57],h11c[01-56],h12c[01-56],h13c[01-52],h13m[01-04],h14c[01-52],h15c[01-56],h16c[01-56],h17c[01-56],h18c[01-56],h19c[01-56],h20c[01-56],h21c[01-56],h22c[01-56],h23c[01-56],h24c[01-56],h25c[01-52],h26n[01-16],h27n[01-16],h28n[01-16],h29n[01-16],h30n[01-18],h31n[01-18],hfe[01-12]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=55920 TotalNodes=1448 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=2300 MaxMemPerNode=UNLIMITED
   TRESBillingWeights=cpu=1.0
Tony,

The debug, urgent, novel, and admin QOSs all set MaxWall, and multiple partitions allow multiple QOSs for users. This rules out the simpler solution of applying wall-clock limits per partition alone. I believe the next best solution is to use a job_submit plugin to enforce the wall-clock limits explicitly according to your site's rules. This avoids the need to create a separate QOS for every QOS/partition pair.

How does that sound?

Thanks,
--Nate
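For illustration only, a job_submit/lua sketch of the approach described above might look like the following. This is a hypothetical fragment, not a tested implementation: it assumes JobSubmitPlugins=lua is enabled in slurm.conf, that the windfall QOS MaxWall would be removed so the QOS limit no longer rejects the job, and the partition name and 30-hour cap are taken from this thread.

```lua
-- job_submit.lua -- illustrative sketch only (assumptions noted above).
-- job_desc.time_limit is expressed in minutes.

-- Site rule table: per-partition wall-clock caps, in minutes.
local part_max_time = {
    fgewf = 30 * 60,  -- MaxTime=1-06:00:00, i.e. 30 hours
}

function slurm_job_submit(job_desc, part_list, submit_uid)
    local limit = part_max_time[job_desc.partition]
    if limit ~= nil then
        if job_desc.time_limit == slurm.NO_VAL then
            -- No time limit requested: default to the partition cap.
            job_desc.time_limit = limit
        elseif job_desc.time_limit > limit then
            slurm.log_user("requested time limit exceeds the cap for partition %s",
                           job_desc.partition)
            return slurm.ESLURM_INVALID_TIME_LIMIT
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, submit_uid)
    return slurm.SUCCESS
end
```

Because the cap lives in one Lua table rather than in QOS definitions, additional partition rules can be added as table entries without creating new QOSs.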
Tony,

There hasn't been a response in over a week. Please reply to reopen this ticket.

Thanks,
--Nate