Ticket 2079 - Possible to deny jobs with failed MaxTRES association limits?
Status: RESOLVED INFOGIVEN
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 15.08.2
Hardware: Cray XC Linux
Severity: 3 - Medium Impact
Assignee: Brian Christiansen
 
Reported: 2015-10-28 15:12 MDT by Doug Jacobsen
Modified: 2015-10-29 11:01 MDT
Site: NERSC


Description Doug Jacobsen 2015-10-28 15:12:38 MDT
I'd like to allow only particular users to use the Cray burst buffer. I know we can explicitly allow users in burst_buffer.conf, but I'd prefer to use association MaxTRES limits, since we can modify those dynamically from a script without having to edit the Slurm configuration, reconfigure, and re-enable partitions.

Is it possible to deny jobs at submission if they fail a MaxTRES limit on the association?

If not, I'll look into adding the needed support to job_submit/lua so we can enforce this directly in the job submit filter.

Thanks so much,
Doug
Comment 1 Doug Jacobsen 2015-10-29 09:55:44 MDT
Never mind, it appears that setting DenyOnLimit on the job's QOS also causes jobs that violate association limits to be denied.


Thanks,
Doug
Comment 2 Brian Christiansen 2015-10-29 11:01:51 MDT
As you figured out, the DenyOnLimit QOS flag will reject a job at submission if the job violates a QOS or association Max* limit.

brian@compy:~/slurm/15.08/compy$ sacctmgr modify user brian account=test_acct set maxtres=cpu=0
 Modified user associations...
  C = compy      A = test_acct            U = brian    
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

brian@compy:~/slurm/15.08/compy$ sbatch --account=test_acct --qos=test_qos --wrap="hostname"
Submitted batch job 99112

brian@compy:~/slurm/15.08/compy$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             99112     debug     wrap    brian PD       0:00      1 (AssocMaxCpuPerJobLimit)

brian@compy:~/slurm/15.08/compy$ sacctmgr modify qos test_qos set flags=denyonlimit
 Modified qos...
  test_qos
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

brian@compy:~/slurm/15.08/compy$ sbatch --account=test_acct --qos=test_qos --wrap="hostname"
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
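
For the original use case (allowing or denying burst buffer use per user from a script), the same mechanism could be driven by a small wrapper. The following is a minimal sketch, not a tested site configuration. It assumes the burst buffer is tracked as the TRES "bb/cray" (for example via AccountingStorageTRES=bb/cray) and that the QOS used by the jobs already has DenyOnLimit set. The function name bb_limit_cmd and the user name alice are hypothetical. The function only prints the sacctmgr command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the sacctmgr command that allows or denies
# burst buffer use for one user through the association MaxTRES limit.
# Assumption: the burst buffer TRES is named "bb/cray" on this system.
bb_limit_cmd() {
    user=$1
    action=$2
    case "$action" in
        deny)  limit=0 ;;    # limit of 0: any burst buffer request exceeds it
        allow) limit=-1 ;;   # -1 clears the limit on the association
        *) echo "usage: bb_limit_cmd USER allow|deny" >&2; return 1 ;;
    esac
    # -i (immediate) commits without the interactive (N/y) prompt.
    echo "sacctmgr -i modify user $user set maxtres=bb/cray=$limit"
}

# Print the command for review; pipe the output to sh to apply it.
bb_limit_cmd alice deny
```

With DenyOnLimit on the QOS, a user whose limit is 0 should then see burst buffer jobs rejected at submission, as in the cpu=0 example above, rather than left pending.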