Ticket 2008 - Make job submission denied if memory requested > MemSpecLimit
Summary: Make job submission denied if memory requested > MemSpecLimit
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 14.11.9
Hardware: Linux
OS: Linux
Severity: 5 - Enhancement
Assignee: David Bigagli
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2015-10-04 21:30 MDT by Akmal Madzlan
Modified: 2015-10-05 21:44 MDT
CC: 2 users

See Also:
Site: DownUnder GeoSolutions
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments

Description Akmal Madzlan 2015-10-04 21:30:44 MDT
I have a node with MemSpecLimit=8000, and when I submit a job that requests 15258 MB, the job gets submitted and runs on that node.

I don't think it makes sense to allow a job that requests more memory than MemSpecLimit if we're going to kill it early anyway.

I know MaxMemPerNode can be used to achieve this, but I would like to set it per node instead of per partition.

[akmalm@kud13 ~]$ scontrol show job 3259009
JobId=3259009 JobName=test
   UserId=akmalm(1419) GroupId=prod(2000)
   Priority=100 Nice=0 Account=(null) QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:10 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2015-10-02T09:57:01 EligibleTime=2015-10-02T09:57:01
   StartTime=2015-10-02T09:57:05 EndTime=2015-10-02T10:57:05
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=kud13 AllocNode:Sid=kud13:1843
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=kud13
   BatchHost=kud13
   NumNodes=1 NumCPUs=8 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=15258M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/d/home/akmalm/rj.test.G9oHo8
   WorkDir=/d/home/akmalm
   Comment=/d/home/akmalm/test.job
   StdErr=/d/home/akmalm/logs/test.o3259009
   StdIn=/dev/null
   StdOut=/d/home/akmalm/logs/test.o3259009

[akmalm@kud13 ~]$ scontrol show node kud13
NodeName=kud13 Arch=x86_64 CoresPerSocket=2
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.18 Features=localdisk
   Gres=(null)
   NodeAddr=kud13 NodeHostName=kud13 Version=14.11
   OS=Linux RealMemory=15947 AllocMem=0 Sockets=2 Boards=1
   MemSpecLimit=8000
   State=IDLE ThreadsPerCore=2 TmpDisk=674393 Weight=1
   BootTime=2015-09-07T09:56:22 SlurmdStartTime=2015-10-02T09:43:05
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
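To illustrate the per-node vs. per-partition distinction the reporter raises: MaxMemPerNode is a partition-level parameter in slurm.conf, so a per-job memory cap set that way applies to every node in the partition. A minimal sketch using the node shown above (the partition name and the 7947 value, RealMemory minus MemSpecLimit, are illustrative choices, not from this ticket):

```
# slurm.conf sketch
# MemSpecLimit reserves memory for the node daemons; it is not a job limit.
NodeName=kud13 CPUs=8 RealMemory=15947 MemSpecLimit=8000

# MaxMemPerNode caps per-job memory, but only at partition granularity.
PartitionName=kud13 Nodes=kud13 MaxMemPerNode=7947 Default=YES
```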
Comment 1 David Bigagli 2015-10-04 23:17:55 MDT
Hi,
   do you really mean MemSpecLimit? That is a limit for the compute node
daemons, slurmd and slurmstepd.

David
Comment 2 Akmal Madzlan 2015-10-04 23:28:03 MDT
Yeah, I'm talking about MemSpecLimit.
When MemSpecLimit=8000, jobs that use more than 8000 MB will be killed, right?

So I don't see the point of allowing the submission of a job that requests more memory than that.
Comment 3 David Bigagli 2015-10-05 00:58:01 MDT
This is the documentation about the parameter. It is not a job limit.

MemSpecLimit
              Limit on combined real memory allocation for compute node
              daemons (slurmd, slurmstepd), in megabytes.

More information can be found here: http://slurm.schedmd.com/core_spec.html.

David
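To sketch the distinction David describes: MemSpecLimit reserves memory for the slurmd/slurmstepd daemons themselves and does not constrain what jobs may request. A hedged slurm.conf fragment, assuming the cgroup-based enforcement described on the core_spec page linked above:

```
# slurm.conf sketch: reserve 8000 MB of kud13's memory for the Slurm daemons.
# Enforcement assumes a task plugin that can constrain the daemons, e.g.:
# TaskPlugin=task/cgroup
NodeName=kud13 RealMemory=15947 MemSpecLimit=8000

# Jobs may still request up to RealMemory; a separate job limit
# (e.g. partition MaxMemPerNode, or a QOS TRES limit) is needed
# to cap what jobs can ask for.
```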
Comment 4 Akmal Madzlan 2015-10-05 15:38:07 MDT
Right.
So can I make this a feature request?
Comment 5 Moe Jette 2015-10-05 15:59:52 MDT
(In reply to Akmal Madzlan from comment #4)
> Right.
> So can I make this a feature request?

It is already in Slurm v15.08. You will need to use the MaxTRESPerJob parameter for memory. More information here:

http://slurm.schedmd.com/SLUG15/TRES.pdf
http://slurm.schedmd.com/resource_limits.html
http://slurm.schedmd.com/sacctmgr.html
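A hedged sketch of the QOS-based approach Moe mentions (Slurm 15.08 or later). The QOS name and the 8000 MB value are illustrative, and enforcement assumes AccountingStorageEnforce in slurm.conf includes limits and qos; check the sacctmgr man page for the exact option spelling in your version:

```
# Cap per-job memory at 8000 MB on the 'normal' QOS.
sacctmgr modify qos normal set MaxTRESPerJob=mem=8000

# Verify the limit was applied.
sacctmgr show qos normal
```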
Comment 6 Akmal Madzlan 2015-10-05 21:44:15 MDT
Alright.
Thanks Moe