8197 – DenyOnLimit not working

Summary: DenyOnLimit not working

Status:	OPEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Limits
Version:	19.05.4
Hardware:	Linux Linux

Severity:	6 - No support contract
Assignee:	Jess
QA Contact:

URL:

Depends on:
Blocks:

Reported:	2019-12-08 05:32 MST by Ilya Draigor
Modified:	2019-12-09 10:09 MST
CC List:	0 users

See Also:
Site:	-Other-

Description Ilya Draigor 2019-12-08 05:32:20 MST

Hi,

Is this a bug, or am I doing something wrong?

I need a QOS that limits the number of GPUs in use. For this I set MaxTRESPerAccount=gres/gpu=1 (I also tried GrpTRES) and Flags=DenyOnLimit.

When I submit the first job with sbatch (with --gres=gpu:1), everything is fine and the job runs. When I submit a second job, it is queued as pending with reason (MaxGRESPerAccount) instead of being rejected at submission, so DenyOnLimit does not seem to work.

I checked this on Slurm 19.05.3-2 and 19.05.4-1.
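
For reference, the setup described above could be reproduced roughly as follows. This is a minimal sketch, not the reporter's actual commands: the QOS name gpu_limited is hypothetical, the submitting user is assumed to have access to that QOS, and limit enforcement is assumed to be enabled (AccountingStorageEnforce including limits in slurm.conf).

```shell
# Hypothetical QOS name; the ticket does not say what the real QOS is called.
sacctmgr -i add qos gpu_limited

# The limit and flag exactly as quoted in the report.
sacctmgr -i modify qos gpu_limited set MaxTRESPerAccount=gres/gpu=1 Flags=DenyOnLimit

# Confirm what was stored.
sacctmgr show qos gpu_limited format=Name,Flags,MaxTRESPA

# First job: expected to start (one GPU, within the limit).
sbatch --qos=gpu_limited --gres=gpu:1 --wrap="sleep 600"

# Second job: per the report, this one is accepted and left pending.
sbatch --qos=gpu_limited --gres=gpu:1 --wrap="sleep 600"

# Show job ID, QOS, state, and pending reason.
squeue --format="%.10i %.12q %.10T %r"
```

One point that may be relevant: the sacctmgr man page describes DenyOnLimit as rejecting jobs that do not conform to the limits as stand-alone jobs. A second job asking for a single GPU fits a one-GPU limit when considered on its own, so it may simply pend rather than be rejected.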

My second question: what is the best way to guarantee GPU cards to the users who bought them?
Say a user bought 2 GPU cards for the cluster, and I need to guarantee that this user always gets immediate access to these cards, even if another job must be preempted.
As I see it, the solution is to give the user a QOS with preemption rights and cap it at 2 GPUs; otherwise they could preempt all running jobs on the whole cluster.
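
The approach outlined above might look something like the sketch below. All names are assumptions for illustration: gpu_owner is a hypothetical QOS, alice a hypothetical user, and normal is assumed to be the QOS that other jobs run under. It also assumes QOS-based preemption is enabled in slurm.conf; the PreemptMode value is a site decision.

```shell
# Assumed slurm.conf settings (not taken from the ticket):
#   PreemptType=preempt/qos
#   PreemptMode=REQUEUE

# Hypothetical QOS for users who bought GPUs.
sacctmgr -i add qos gpu_owner

# Allow it to preempt jobs in the assumed "normal" QOS,
# and cap each user at 2 GPUs under this QOS.
sacctmgr -i modify qos gpu_owner set Preempt=normal MaxTRESPerUser=gres/gpu=2

# Grant the hypothetical user access to the QOS.
sacctmgr -i modify user name=alice set qos+=gpu_owner

# The owner then submits with that QOS.
sbatch --qos=gpu_owner --gres=gpu:2 --wrap="sleep 600"
```

This caps how many GPUs the owner can hold through the preempting QOS, but it does not by itself tie the QOS to the specific cards that were purchased.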

I hope you can help with this issue.

Thanks.

Regards,
Ilya

Comment 1 Jacob Jenson 2019-12-09 09:55:06 MST

Ilya,

In reference to both questions, the functionality you want can be accomplished with Slurm. You need to make configuration changes and changes in how jobs are submitted. 

SchedMD has a commercial support team that can tell you the specific changes you will need to make. However, before you can engage with the support team your site will need to purchase a support contract. Can you please tell me who we should talk with at your site regarding purchasing a Slurm support contract? 

Thank you,
Jacob

Comment 2 Ilya Draigor 2019-12-09 10:09:09 MST

Hi Jacob,
Thanks for the reply.
I installed and configured Slurm myself, so you can talk with me.

Thanks