| Summary: | cgroup v2 swap limit problem | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Lloyd Brown <lloyd_brown> |
| Component: | slurmd | Assignee: | Felip Moll <felip.moll> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 22.05.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=17849 | ||
| Site: | BYU - Brigham Young University | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | 23.02.6 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | cgroup v2 fix to swap limits | ||
Description
Lloyd Brown
2023-07-19 10:02:42 MDT
Hi Lloyd,

You are right. We knew about this but skipped this detail. The behavior of cgroup/v1 is indeed different from cgroup/v2.

In cgroup/v1 there's one limit for RSS, and one limit for RSS+SWAP.
In cgroup/v2 there's one limit for RSS, and one limit for SWAP.

In cgroup/v1 there's Swappiness.
In cgroup/v2 swappiness does not exist.

In cgroup/v1 AllowedSwapSpace=0 means that RAM+Swap will be limited to AllowedRAMSpace.
In cgroup/v2 AllowedSwapSpace=0 means that the job cannot use swap.

We should:

a) Document these differences, and how it all works, in cgroup.conf and possibly in the cgroups web page.

b) Apply a solution like your patch. But what I would really like is to modify the task/cgroup memory plugin so that it does not do its calculations assuming the plugins will have memory+swap interfaces, but just calculates the Swap and the Memory separately. Then I'd like the specific plugins, like cgroup/v1, to use these values as needed. In that case, cgroup/v2 wouldn't need any modification. Your patch is correct, but it modifies cgroup/v2 and leaves particularities of the v1-specific logic in the task/cgroup memory plugin.

The functions to modify in b) are:

task_cgroup_memory.c: swap_limit_in_bytes() and _memcg_initialize()
cgroup_v1.c: cgroup_p_constrain_set()

Does my reasoning make sense to you? I can work on this patch, but if you feel comfortable and want to dedicate time I can wait for your contribution.

Felip,

TBH, it sounds like you have a better understanding of the differences than I do. I'm content using my simple patch for the time being, if you want to take the time to do a better, more thorough/correct job. I'm afraid that once the immediate problem is handled, I have enough other tasks that have to take higher priority, that I'll probably leave the details up to you.

Lloyd

(In reply to Lloyd Brown from comment #3)
> Felip,
>
> TBH, it sounds like you have a better understanding of the differences than
> I do. I'm content using my simple patch for the time being, if you want to
> take the time to do a better, more thorough/correct job. I'm afraid that
> once the immediate problem is handled, I have enough other tasks that have
> to take higher priority, that I'll probably leave the details up to you.
>
> Lloyd

Great, that's not a problem. Your patch is functionally correct, so you can use it in the meantime. I will add this to my queue and work on it asap.

Thanks Lloyd.

Hi Lloyd,

I included the idea of your fix in the upcoming 23.02.6 release. I think it must be fixed in this release, with bigger changes made in master later if needed; I will open an internal bug for that. I put your name on the commit.

Thanks for reporting.

commit ec54427c0b12e4a21309426b2c877d38efad28a6
Author: Lloyd Brown <lloyd_brown@byu.edu>
AuthorDate: Fri Oct 6 14:53:14 2023 +0200
Commit: Felip Moll <felip.moll@schedmd.com>
CommitDate: Fri Oct 6 14:57:19 2023 +0200

Fix incorrect memory.swap.max setting in cgroup/v2

In cgroup/v2 there are two independent limits, memory.max and memory.swap.max. In cgroup/v1 the swap limit was memory+swap, so it needed to be set to memory.limit_in_bytes + a percentage, so AllowedSwapSpace=5 implied setting memory.memsw_limit_in_bytes to 105% of memory.limit_in_bytes. In cgroup/v2 we need to separate these values, and just make memory.swap.max to be a 5% of memory.max, not 105%.

Bug 17233
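To make the arithmetic in the commit message concrete, here is a minimal sketch in C of the two calculations. It is not the Slurm code: the helper names `pct_of`, `v1_memsw_limit` and `v2_swap_max` are hypothetical, and the percentage is taken as a whole number for exact integer math, which is a simplification.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only. These helpers are hypothetical and are not
 * functions from the Slurm source tree.
 */

/* Floor of (bytes * pct / 100), split to avoid overflow for large byte counts. */
uint64_t pct_of(uint64_t bytes, uint64_t pct)
{
	return (bytes / 100) * pct + ((bytes % 100) * pct) / 100;
}

/*
 * cgroup/v1: memory.memsw.limit_in_bytes is a combined RAM + swap limit,
 * so the swap allowance is added on top of the memory limit.
 */
uint64_t v1_memsw_limit(uint64_t mem_limit, uint64_t allowed_swap_pct)
{
	uint64_t swap = pct_of(mem_limit, allowed_swap_pct);

	assert(mem_limit <= UINT64_MAX - swap); /* guard against wrap-around */
	return mem_limit + swap;
}

/*
 * cgroup/v2: memory.swap.max limits swap only, independently of memory.max,
 * so it is just the swap allowance by itself.
 */
uint64_t v2_swap_max(uint64_t mem_limit, uint64_t allowed_swap_pct)
{
	return pct_of(mem_limit, allowed_swap_pct);
}
```

With a 1 GiB memory limit (1073741824 bytes) and AllowedSwapSpace=5, the v1 helper yields 1127428915 bytes (105% of the memory limit) while the v2 helper yields 53687091 bytes (5%). With AllowedSwapSpace=0 the v1 value equals the memory limit and the v2 value is 0, which matches the "job cannot use swap" behavior described in the thread.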