| Summary: | Add config option to enable/disable kmem constrain. Patch included. | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Maciej Pawlik <maciej.pawlik.xml> |
| Component: | Contributions | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | Tomasz.Abramczyk |
| Version: | 15.08.8 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | 17.02-pre0 / 3b5befc9e8565 | |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: |
kmem-cg-fix.patch
ConstrainKmemSpace |
||
|
Description
Maciej Pawlik
2016-05-18 23:44:40 MDT
If I'm reading it correctly, the patch would change the default behavior for task/cgroup - if ConstrainKmemSpace is not in the cgroup.conf then it defaults to off, which would be an issue for backwards compatibility. There may be a better name for the flag - I don't really like the idea of having one of the Constrain options default to on, especially as this option does not operate independently of ConstrainRAMSpace. I'll mull this over a bit more, but if you have any thoughts on this please let me know. - Tim Dear Tim, Yes, this change might be seen as 'braking backwards compatibility', but from my point of view it fixes unexpected behaviour introduced here: https://bugs.schedmd.com/show_bug.cgi?id=1657 with this commit: https://github.com/SchedMD/slurm/commit/2475d7b7d4551554e9369ede962176f52d244187 those changes tie kmem limit with user's memory limit, and new functionality (potentially breaking older jobs) to an existing config option. We need kmem limit tunable, because kmem usage is rarely considered by users, and depends heavily on the system setup. By 'tunable' I mean at least an on/off switch as proposed in the attached patch. Ideally, from my point of view, would be to have a separate parameter 'kmem limit', which could be set to an exact value or a percent of user's ram usage declaration. This way we would get rid of the implicit and rigid tie between mem and kmem limits. (In reply to Maciej Pawlik from comment #2) > Dear Tim, > > Yes, this change might be seen as 'braking backwards compatibility', but > from my point of view it fixes unexpected behaviour introduced here: > https://bugs.schedmd.com/show_bug.cgi?id=1657 > with this commit: > https://github.com/SchedMD/slurm/commit/ > 2475d7b7d4551554e9369ede962176f52d244187 > those changes tie kmem limit with user's memory limit, and new functionality > (potentially breaking older jobs) to an existing config option. It's been established behavior since 15.08, so we won't change that lightly. I'm weighting the different trade-offs here, and think that a ConstrainKmemSpace that defaults to yes (and requires ConstrainRAMSpace=yes) may be okay. > We need kmem limit tunable, because kmem usage is rarely considered by > users, and depends heavily on the system setup. By 'tunable' I mean at least > an on/off switch as proposed in the attached patch. > Ideally, from my point of view, would be to have a separate parameter 'kmem > limit', which could be set to an exact value or a percent of user's ram > usage declaration. This way we would get rid of the implicit and rigid tie > between mem and kmem limits. It would make sense to add that, similar to how AllowedRAMSpace/AllowedSwapSpace work - AllowedKmemSpace=<num>. Would you be interesting in working that up as a further patch? Created attachment 3290 [details]
ConstrainKmemSpace
This patch adds Your sugestions and updates manpage.
Could You write down what do You think about this implementation?
- Tomasz
I've fixed a few minor mistakes, and committed this to master (will become 17.02) as commit 3b5befc9e8. I changed it to default to 'yes' as I'd previously requested; you patch also was missing any documentation which I added in as commit c3a7532f2. Thanks for working through this, I think the end result nicely compliments the existing plugin. - Tim File: src/common/xcgroup_read_config.c Function: _clear_slurm_cgroup_conf constrain_kmem_space = false Is it correct? - Tomasz |