Summary: | OverMemoryKill enforces only step memory limit, not total usage | |
---|---|---|---|
Product: | Slurm | Reporter: | CSC sysadmins <csc-slurm-tickets> |
Component: | Limits | Assignee: | Nate Rini <nate> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | 4 - Minor Issue | Priority: | --- |
Version: | 19.05.6 | Hardware: | Linux |
OS: | Linux | | |
Site: | CSC - IT Center for Science | | |
Version Fixed: | 20.02.4, 20.11 | | |
Description
CSC sysadmins
2020-05-05 06:03:04 MDT

Comment #1 (Nate Rini):

Tommi,

Looking into what Slurm should be doing.

--Nate

Comment #2 (Tommi Tervo, CSC sysadmins):

(In reply to Nate Rini from comment #1)
> Looking into what Slurm should be doing.

Hi,

The only reliable solution that comes to my mind is to combine the extern step and the running job step PSS and verify that the total stays under the limit. Or do you mean the case where --mem-per-cpu is set and an extern step is also consuming memory?

Comment #3 (Nate Rini):

(In reply to Tommi Tervo from comment #2)

After consulting internally about how OverMemoryKill works, we decided that this is a documentation issue (updated here: https://github.com/SchedMD/slurm/commit/b82d7c29f4fabea702dba3b08e9581e450c4f064). OverMemoryKill is not recommended due to its inherent limitations; instead we suggest using cgroups with 'ConstrainRAMSpace=yes', which limits memory on a per-job/step basis.

> Only reliable solution that comes to my mind is to combine the extern step
> and running job step PSS and verify that it's under the limit?

Each step/task (process tree) in a job forks a new slurmstepd instance, which would have to communicate with the lead slurmd instance in order to enforce a limit for the whole job. None of the required RPCs or functionality currently exist to implement this with OverMemoryKill. Extern steps and MPI jobs also fork secondary task instances, which likewise enforce limits only against a single process tree and slurmstepd instance, further complicating matters.

> Or do you mean the case where --mem-per-cpu is set and an extern step is
> also consuming memory?

Memory limits are set per job, and can also be enforced per step/task when using cgroups with 'ConstrainRAMSpace=yes', thanks to the built-in hierarchy of cgroups in the Linux kernel. There is currently no plan to implement this for OverMemoryKill, since we no longer suggest that sites use it.

I'm closing this ticket; please reply to this ticket if you have any questions and we can continue from there.

Thanks,
--Nate
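For reference, the cgroup-based enforcement recommended in this ticket is configured roughly as follows. This is a minimal sketch using stock Slurm option names; exact file layout, cgroup plugin version, and defaults vary by Slurm release and site policy:

```
# slurm.conf: use the cgroup task plugin so each job/step
# is placed in its own cgroup hierarchy
TaskPlugin=task/cgroup

# cgroup.conf: constrain each job/step to its allocated memory
ConstrainRAMSpace=yes
```

By contrast, the discouraged polling-based approach is enabled with `JobAcctGatherParams=OverMemoryKill` in slurm.conf; as described above, it only checks each step's process tree independently and cannot enforce a whole-job total.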