Ticket 17781

Summary: question about maximum time set for slurm jobs
Product: Slurm    Reporter: RAMYA ERANNA <reranna>
Component: Configuration    Assignee: Director of Support <support>
Status: RESOLVED INFOGIVEN    QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 22.05.2   
Hardware: Linux   
OS: Linux   
Site: SLAC
Attachments: slurm.conf

Description RAMYA ERANNA 2023-09-26 11:31:45 MDT
Hi Team,

Is it possible to set the maximum time for Slurm jobs per account?

The lcls:default and scavenger accounts should have a maximum time of 5 days, while the other, normal accounts should have 10 days. The lcls accounts are listed below:

[reranna@sdfmgr002 ~]$ sacctmgr show account | grep lcls
      lcls                 lcls                 lcls 
lcls:amol+       lcls:amolu0017       lcls:amolu0017 
lcls:amox+       lcls:amox33017       lcls:amox33017 
lcls:cxic+       lcls:cxic00121       lcls:cxic00121 
 lcls:data            lcls:data            lcls:data 
lcls:detd+        lcls:detdaq21        lcls:detdaq21 
lcls:diad+        lcls:diadaq13        lcls:diadaq13 
lcls:mecc+       lcls:mecc00121       lcls:mecc00121 
lcls:mecl+     lcls:mecl1002021     lcls:mecl1002021
Comment 2 Caden Ellis 2023-09-27 14:07:01 MDT
This option may be what you are looking for:

https://slurm.schedmd.com/sacctmgr.html#OPT_MaxWallDurationPerJob

You can set this association limit on only the accounts that need it, with whatever value each account requires.

sacctmgr update account name=sub1 set maxwalldurationperjob=10

This sets a MaxWall limit of 10 minutes on my account "sub1" (a bare number is interpreted as minutes).

A 12-minute job submitted to my account "sub2", which has no limit, runs normally. A 12-minute job submitted to "sub1" is accepted, but it stays pending with this reason:

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               210    normal     wrap    caden PD       0:00      0 (AssocMaxWallDurationPerJobLimit)
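Note that the bare 10 above is read as minutes. Slurm also accepts minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes and days-hours:minutes:seconds, so the 5-day and 10-day limits from the original question are easier to get right when written as 5-00:00:00 and 10-00:00:00. The bash function below is a hypothetical helper (not part of Slurm), sketched only to make the unit conversion concrete. It rounds seconds up to the next minute, as Slurm does, and does not handle special values such as UNLIMITED.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Slurm: convert a Slurm time string to minutes.
# Accepted forms: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S
slurm_time_to_minutes() {
  local t=$1 days=0 rest h=0 m=0 s=0
  if [[ $t == *-* ]]; then
    # days-hours[:minutes[:seconds]]
    days=${t%%-*}
    rest=${t#*-}
    IFS=: read -r h m s <<< "$rest"
    h=${h:-0}; m=${m:-0}; s=${s:-0}
  else
    # minutes, minutes:seconds, or hours:minutes:seconds
    local -a parts
    IFS=: read -r -a parts <<< "$t"
    case ${#parts[@]} in
      1) m=${parts[0]} ;;
      2) m=${parts[0]}; s=${parts[1]} ;;
      3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
      *) echo "invalid time: $t" >&2; return 1 ;;
    esac
  fi
  # 10# forces base 10 so values like "08" are not read as octal;
  # seconds are rounded up to the next whole minute.
  echo $(( 10#$days * 1440 + 10#$h * 60 + 10#$m + (10#$s + 59) / 60 ))
}
```

For example, `slurm_time_to_minutes 5-00:00:00` prints 7200, while `slurm_time_to_minutes 10` prints 10.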

If you want these kinds of jobs to be rejected upfront, you can add the DenyOnLimit QOS flag. 

sacctmgr update qos name=normal set flags+=denyonlimit

My "sub1" account uses the normal QOS, so the 12-minute job is now rejected at submission:

sbatch -Asub1 -t12 --wrap="sleep 10"
sbatch: error: AssocMaxWallDurationPerJobLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

The job never enters the queue.
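Because DenyOnLimit is set on the QOS rather than on an account, it changes submission behaviour for every association that uses that QOS, not only for "sub1". A minimal sketch for checking the scope before and after enabling it, using the QOS name from the example above:

```shell
# List every association with its allowed QOS list;
# any association whose list includes "normal" is affected by the flag.
sacctmgr -P show assoc format=Cluster,Account,User,QOS

# Confirm which flags are now set on the "normal" QOS.
sacctmgr show qos where name=normal format=Name,Flags
```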

Does this answer your question?

Caden
Comment 3 RAMYA ERANNA 2023-09-27 15:06:58 MDT
Hi,

Can the maxwalldurationperjob value be higher than the system default time?

Thanks
Ramya
Comment 4 Caden Ellis 2023-09-27 20:07:49 MDT
The order in which resource limits are enforced is described here:

https://slurm.schedmd.com/resource_limits.html#hierarchy
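To see which levels of that hierarchy actually carry a wall-time limit on a given cluster, the association and QOS limits can be listed side by side. A sketch using standard sacctmgr format fields:

```shell
# Association limits, indented to show the account/user hierarchy
sacctmgr show assoc tree format=Account,User,MaxWall,QOS

# Per-job wall-time limits defined on each QOS
sacctmgr show qos format=Name,MaxWall
```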

If by system default time you mean the partition default time:
https://slurm.schedmd.com/slurm.conf.html#OPT_DefaultTime

then yes, it can be higher: DefaultTime is only the time limit applied when a job does not request one.
However, the method I described cannot give jobs more time than the partition's "MaxTime" parameter.
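Both partition values can be checked directly. In the sinfo format string, %P is the partition name, %l its maximum time and %L its default time; the scontrol variant shows the same fields from the full partition record:

```shell
# Partition name, MaxTime and DefaultTime, one line per partition
sinfo -o "%P %l %L"

# Alternative: filter the relevant fields out of the full partition records
scontrol show partition | grep -E 'PartitionName|MaxTime|DefaultTime'
```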

Caden
Comment 5 RAMYA ERANNA 2023-09-28 10:10:32 MDT
Hi, 

We have set the partition "MaxTime" parameter in slurm.conf to 10 days. If I now set the user association limit to 5 days for lcls:default and scavengers, will the 5-day limit take effect, or will the 10-day limit from slurm.conf still apply?

Thank you
Ramya
Comment 6 RAMYA ERANNA 2023-09-28 11:56:57 MDT
Created attachment 32476
slurm.conf
Comment 7 Caden Ellis 2023-10-02 13:17:51 MDT
lcls:default and scavengers will be limited to 5 days if you set their association limit to 5 days, even though the partition MaxTime is 10 days.

Other accounts without the 5-day limit can still run jobs of up to 10 days on your partitions.
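Applied to the accounts from the original question, a sketch might look like the following. It assumes the accounts are literally named lcls:default and scavenger (the thread also calls them "scavengers"), and other_account is a hypothetical placeholder for any account without the limit.

```shell
# Assumption: account names are exactly "lcls:default" and "scavenger".
# 5 days in Slurm's days-hours:minutes:seconds format; -i commits without prompting.
sacctmgr -i modify account name=lcls:default set MaxWall=5-00:00:00
sacctmgr -i modify account name=scavenger set MaxWall=5-00:00:00

# A 6-day job in a limited account should pend with
# AssocMaxWallDurationPerJobLimit, or be rejected if its QOS has DenyOnLimit.
sbatch -A lcls:default -t 6-00:00:00 --wrap="sleep 10"

# The same request from an unlimited account should be accepted,
# since 6 days is under the 10-day partition MaxTime.
# "other_account" is a hypothetical placeholder.
sbatch -A other_account -t 6-00:00:00 --wrap="sleep 10"
```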

Caden
Comment 8 Caden Ellis 2023-10-11 11:58:00 MDT
Closing this ticket for now. Let me know if you have other questions.

Caden