Ticket 2079

Summary: Possible to deny jobs with failed MaxTRES association limits?
Product: Slurm
Reporter: Doug Jacobsen <dmjacobsen>
Component: Configuration
Assignee: Brian Christiansen <brian>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 3 - Medium Impact
Priority: ---
CC: alex, brian, da, tim
Version: 15.08.2
Hardware: Cray XC
OS: Linux
Site: NERSC

Description Doug Jacobsen 2015-10-28 15:12:38 MDT
I'd like to allow only particular users to use the Cray burst buffer. I know we can explicitly allow users in burst_buffer.conf, but I'd prefer to use association MaxTRES limits, since we can modify those dynamically from a script without needing to modify the Slurm configuration, reconfigure, and re-enable partitions.

Is it possible to deny jobs if a MaxTRES limit on the association is lacking?

If not, I'll look into adding the needed support to job_submit/lua so we can enforce this directly in the job submit filter.

Thanks so much,
Doug
Comment 1 Doug Jacobsen 2015-10-29 09:55:44 MDT
Never mind, it appears that setting the DenyOnLimit flag on the job's QOS also causes jobs that violate association limits to be denied.


Thanks,
Doug
Comment 2 Brian Christiansen 2015-10-29 11:01:51 MDT
As you figured out, the DenyOnLimit QOS flag will reject a job submission if the job violates a QOS or association Max* limit. Without the flag, the job is accepted and left pending, as the first submission below shows.

brian@compy:~/slurm/15.08/compy$ sacctmgr modify user brian account=test_acct set maxtres=cpu=0
 Modified user associations...
  C = compy      A = test_acct            U = brian    
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

brian@compy:~/slurm/15.08/compy$ sbatch --account=test_acct --qos=test_qos --wrap="hostname"
Submitted batch job 99112

brian@compy:~/slurm/15.08/compy$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             99112     debug     wrap    brian PD       0:00      1 (AssocMaxCpuPerJobLimit)

brian@compy:~/slurm/15.08/compy$ sacctmgr modify qos test_qos set flags=denyonlimit
 Modified qos...
  test_qos
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

brian@compy:~/slurm/15.08/compy$ sbatch --account=test_acct --qos=test_qos --wrap="hostname"
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
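
For scripting this from outside the Slurm configuration, a minimal sketch along the lines of the session above might look like the following. It only prints the sacctmgr commands rather than running them. The TRES name bb/cray and the helper names bb_limit_cmd and deny_on_limit_cmd are assumptions for illustration and are not taken from this ticket; check `sacctmgr show tres` for the name your site actually tracks.

```shell
#!/bin/sh
# Sketch only: prints sacctmgr commands, does not execute them.
# ASSUMPTION: the burst buffer TRES is named bb/cray on this system.
BB_TRES="bb/cray"

# Hypothetical helper: print the command that sets a per-job MaxTRES
# limit for the burst buffer on one user/account association.
# -i makes sacctmgr commit immediately instead of prompting.
bb_limit_cmd() {
    user=$1
    account=$2
    limit=$3
    printf 'sacctmgr -i modify user %s account=%s set maxtres=%s=%s\n' \
        "$user" "$account" "$BB_TRES" "$limit"
}

# Hypothetical helper: print the command that sets DenyOnLimit on a QOS,
# so jobs violating a Max* limit are rejected at submit time.
deny_on_limit_cmd() {
    printf 'sacctmgr -i modify qos %s set flags=denyonlimit\n' "$1"
}

# Same user, account, and QOS as in the session above.
bb_limit_cmd brian test_acct 0
deny_on_limit_cmd test_qos
```

Printing the commands first makes it easy to review them before piping the output to sh on a host with sacctmgr admin access. Note that flags=denyonlimit is used exactly as in the session above; whether it preserves other flags already set on the QOS is worth checking before applying it.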