Ticket 2748

Summary: Add config option to enable/disable kmem constrain. Patch included.
Product: Slurm Reporter: Maciej Pawlik <maciej.pawlik.xml>
Component: ContributionsAssignee: Tim Wickberg <tim>
Status: RESOLVED FIXED QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: Tomasz.Abramczyk
Version: 15.08.8   
Hardware: Linux   
OS: Linux   
Site: -Other- Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed: 17.02-pre0 / 3b5befc9e8565
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---
Attachments: kmem-cg-fix.patch
ConstrainKmemSpace

Description Maciej Pawlik 2016-05-18 23:44:40 MDT
Created attachment 3114 [details]
kmem-cg-fix.patch

This patch adds option 'ConstrainKmemSpace' in cgrup.conf. ConstrainKmemSpace controls if cgroup param: memory.kmem.limit_in_bytes is set during job execution.

Disabling kmem limit is useful when using Lustre file system. Lustre client uses large buffers allocated in kernel memory space, those buffers are accounted to kernel memory used by the cgroup. Unsuspecting user might exceed kernel memory limit simply by reading files. The "plain ram" limit is not affected by this option.

If this patch makes it to the source, please give credit for the patch to:
Lukasz Flis <l.flis@cyfronet.pl> ACC Cyfronet AGH
Jacek Budzowski <j.budzowski@cyfronet.pl> ACC Cyfronet AGH
Comment 1 Tim Wickberg 2016-05-19 02:19:46 MDT
If I'm reading it correctly, the patch would change the default behavior for task/cgroup - if ConstrainKmemSpace is not in the cgroup.conf then it defaults to off, which would be an issue for backwards compatibility.

There may be a better name for the flag - I don't really like the idea of having one of the Constrain options default to on, especially as this option does not operate independently of ConstrainRAMSpace. I'll mull this over a bit more, but if you have any thoughts on this please let me know.

- Tim
Comment 2 Maciej Pawlik 2016-05-27 02:15:24 MDT
Dear Tim,

Yes, this change might be seen as 'braking backwards compatibility', but from my point of view it fixes unexpected behaviour introduced here:
https://bugs.schedmd.com/show_bug.cgi?id=1657
with this commit:
https://github.com/SchedMD/slurm/commit/2475d7b7d4551554e9369ede962176f52d244187
those changes tie kmem limit with user's memory limit, and new functionality (potentially breaking older jobs) to an existing config option.

We need kmem limit tunable, because kmem usage is rarely considered by users, and depends heavily on the system setup. By 'tunable' I mean at least an on/off switch as proposed in the attached patch.
Ideally, from my point of view, would be to have a separate parameter 'kmem limit', which could be set to an exact value or a percent of user's ram usage declaration. This way we would get rid of the implicit and rigid tie between mem and kmem limits.
Comment 3 Tim Wickberg 2016-05-27 02:30:03 MDT
(In reply to Maciej Pawlik from comment #2)
> Dear Tim,
> 
> Yes, this change might be seen as 'braking backwards compatibility', but
> from my point of view it fixes unexpected behaviour introduced here:
> https://bugs.schedmd.com/show_bug.cgi?id=1657
> with this commit:
> https://github.com/SchedMD/slurm/commit/
> 2475d7b7d4551554e9369ede962176f52d244187
> those changes tie kmem limit with user's memory limit, and new functionality
> (potentially breaking older jobs) to an existing config option.

It's been established behavior since 15.08, so we won't change that lightly. I'm weighting the different trade-offs here, and think that a ConstrainKmemSpace that defaults to yes (and requires ConstrainRAMSpace=yes) may be okay.

> We need kmem limit tunable, because kmem usage is rarely considered by
> users, and depends heavily on the system setup. By 'tunable' I mean at least
> an on/off switch as proposed in the attached patch.
> Ideally, from my point of view, would be to have a separate parameter 'kmem
> limit', which could be set to an exact value or a percent of user's ram
> usage declaration. This way we would get rid of the implicit and rigid tie
> between mem and kmem limits.

It would make sense to add that, similar to how AllowedRAMSpace/AllowedSwapSpace work - AllowedKmemSpace=<num>. 

Would you be interesting in working that up as a further patch?
Comment 4 Tomasz.Abramczyk 2016-07-08 05:10:45 MDT
Created attachment 3290 [details]
ConstrainKmemSpace

This patch adds Your sugestions and updates manpage.

Could You write down what do You think about this implementation?

- Tomasz
Comment 7 Tim Wickberg 2016-07-08 13:33:31 MDT
I've fixed a few minor mistakes, and committed this to master (will become 17.02) as commit 3b5befc9e8.

I changed it to default to 'yes' as I'd previously requested; you patch also was missing any documentation which I added in as commit c3a7532f2.

Thanks for working through this, I think the end result nicely compliments the existing plugin.

- Tim
Comment 9 Tomasz.Abramczyk 2016-07-11 02:51:39 MDT
File: src/common/xcgroup_read_config.c
Function: _clear_slurm_cgroup_conf
constrain_kmem_space = false

Is it correct? 

- Tomasz