Ticket 2008 - Make job submission denied if memory requested > MemSpecLimit
Summary: Make job submission denied if memory requested > MemSpecLimit
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 14.11.9
Hardware: Linux
OS: Linux
Severity: 5 - Enhancement
Assignee: David Bigagli
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2015-10-04 21:30 MDT by Akmal Madzlan
Modified: 2015-10-05 21:44 MDT
CC: 2 users

See Also:
Site: DownUnder GeoSolutions
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments

Description Akmal Madzlan 2015-10-04 21:30:44 MDT
I have a node with MemSpecLimit=8000, and when I submit a job that requests 15258 MB, the job gets submitted and runs on that node.

I don't think it makes sense to allow a job that requests more memory than MemSpecLimit if we're going to kill it early anyway.

I know MaxMemPerNode can be used to achieve this, but I would like to set it per node instead of per partition.

[akmalm@kud13 ~]$ scontrol show job 3259009
JobId=3259009 JobName=test
   UserId=akmalm(1419) GroupId=prod(2000)
   Priority=100 Nice=0 Account=(null) QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:10 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2015-10-02T09:57:01 EligibleTime=2015-10-02T09:57:01
   StartTime=2015-10-02T09:57:05 EndTime=2015-10-02T10:57:05
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=kud13 AllocNode:Sid=kud13:1843
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=kud13
   BatchHost=kud13
   NumNodes=1 NumCPUs=8 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=15258M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/d/home/akmalm/rj.test.G9oHo8
   WorkDir=/d/home/akmalm
   Comment=/d/home/akmalm/test.job
   StdErr=/d/home/akmalm/logs/test.o3259009
   StdIn=/dev/null
   StdOut=/d/home/akmalm/logs/test.o3259009

[akmalm@kud13 ~]$ scontrol show node kud13
NodeName=kud13 Arch=x86_64 CoresPerSocket=2
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.18 Features=localdisk
   Gres=(null)
   NodeAddr=kud13 NodeHostName=kud13 Version=14.11
   OS=Linux RealMemory=15947 AllocMem=0 Sockets=2 Boards=1
   MemSpecLimit=8000
   State=IDLE ThreadsPerCore=2 TmpDisk=674393 Weight=1
   BootTime=2015-09-07T09:56:22 SlurmdStartTime=2015-10-02T09:43:05
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
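To illustrate the per-node vs. per-partition distinction the reporter raises: MaxMemPerNode is a partition-level parameter in slurm.conf, so a per-job memory cap set that way applies to every node in the partition. A minimal sketch using the node shown above (the partition name and the 7947 value, RealMemory minus MemSpecLimit, are illustrative choices, not from this ticket):

```
# slurm.conf sketch
# MemSpecLimit reserves memory for the node daemons; it is not a job limit.
NodeName=kud13 CPUs=8 RealMemory=15947 MemSpecLimit=8000

# MaxMemPerNode caps per-job memory, but only at partition granularity.
PartitionName=kud13 Nodes=kud13 MaxMemPerNode=7947 Default=YES
```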
Comment 1 David Bigagli 2015-10-04 23:17:55 MDT
Hi,
   do you really mean MemSpecLimit? That is a limit for the compute node
daemons, slurmd and slurmstepd.

David
Comment 2 Akmal Madzlan 2015-10-04 23:28:03 MDT
Yeah, I'm talking about MemSpecLimit.
When MemSpecLimit=8000, jobs that use more than 8000 MB will be killed, right?

So I don't see the point of allowing the submission of a job that requests more memory than that.
Comment 3 David Bigagli 2015-10-05 00:58:01 MDT
This is the documentation about the parameter. It is not a job limit.

MemSpecLimit
              Limit on combined real memory allocation for compute node
              daemons (slurmd, slurmstepd), in megabytes.

More information can be found here: http://slurm.schedmd.com/core_spec.html.

David
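To sketch the distinction David describes: MemSpecLimit reserves memory for the slurmd/slurmstepd daemons themselves and does not constrain what jobs may request. A hedged slurm.conf fragment, assuming the cgroup-based enforcement described on the core_spec page linked above:

```
# slurm.conf sketch: reserve 8000 MB of kud13's memory for the Slurm daemons.
# Enforcement assumes a task plugin that can constrain the daemons, e.g.:
# TaskPlugin=task/cgroup
NodeName=kud13 RealMemory=15947 MemSpecLimit=8000

# Jobs may still request up to RealMemory; a separate job limit
# (e.g. partition MaxMemPerNode, or a QOS TRES limit) is needed
# to cap what jobs can ask for.
```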
Comment 4 Akmal Madzlan 2015-10-05 15:38:07 MDT
Right.
So can I make this a feature request?
Comment 5 Moe Jette 2015-10-05 15:59:52 MDT
(In reply to Akmal Madzlan from comment #4)
> Right.
> So can I make this a feature request?

It is already in Slurm v15.08. You will need to use the MaxTRESPerJob parameter for memory. More information here:

http://slurm.schedmd.com/SLUG15/TRES.pdf
http://slurm.schedmd.com/resource_limits.html
http://slurm.schedmd.com/sacctmgr.html
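A hedged sketch of the QOS-based approach Moe mentions (Slurm 15.08 or later). The QOS name and the 8000 MB value are illustrative, and enforcement assumes AccountingStorageEnforce in slurm.conf includes limits and qos; check the sacctmgr man page for the exact option spelling in your version:

```
# Cap per-job memory at 8000 MB on the 'normal' QOS.
sacctmgr modify qos normal set MaxTRESPerJob=mem=8000

# Verify the limit was applied.
sacctmgr show qos normal
```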
Comment 6 Akmal Madzlan 2015-10-05 21:44:15 MDT
Alright.
Thanks Moe