| Summary: | How to disable slurmd memory cgroup use | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | GSK-ONYX-SLURM <slurm-support> |
| Component: | slurmd | Assignee: | Marshall Garey <marshall> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | | |
| Version: | 17.11.7 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | GSK | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave Sites: | --- | Cray Sites: | --- |
| DS9 Clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC Sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | ? | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
GSK-ONYX-SLURM
2018-07-30 05:48:49 MDT
See bug 5082. All you have to do is set ConstrainKmemSpace=No in cgroup.conf, and you won't hit the issue anymore. But to recover the memory you have to restart the node.

So to confirm, you are saying that I do not need to change SelectTypeParameters from CR_CPU_Memory to CR_CPU or set ConstrainRAMSpace=No; I just need to set ConstrainKmemSpace=No. Is that correct?

Thanks,
Mark

Yes, that's correct. ConstrainKmemSpace=No is the only change - don't make any of the other changes that you made. But you have to restart the node in order to reclaim the memory that was leaked. Can you confirm that solves the problems you're experiencing?

Ok, I will unwind the other changes, put the Kmem fix in place, and get the server rebooted. I'll come back to you in the next day or so once we've had a chance to test this.

Thanks,
Mark

Sounds good. Just for your information, ConstrainKmemSpace=No is the default in 18.08. We made that change because of this very bug that you are experiencing (and several others have already hit).

Have you had any more problems?

Closing as resolved/duplicate of bug 5082.

*** This ticket has been marked as a duplicate of ticket 5082 ***
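For reference, a minimal sketch of the configuration change discussed above. The only edit is ConstrainKmemSpace=no in cgroup.conf; the file paths and the other lines shown (CgroupAutomount, ConstrainRAMSpace, SelectTypeParameters) are assumptions included only to illustrate what should be left unchanged, not a copy of the site's actual files.

```
# /etc/slurm/cgroup.conf  (path assumed; adjust to the local install)
CgroupAutomount=yes        # assumed existing setting, left unchanged
ConstrainRAMSpace=yes      # leave as-is; no need to set this to no
ConstrainKmemSpace=no      # the one change that avoids the kmem cgroup leak

# /etc/slurm/slurm.conf  (relevant line only, left unchanged)
SelectTypeParameters=CR_CPU_Memory
```

Per the discussion above, the affected node still has to be rebooted after the change to reclaim the kernel memory that was already leaked, and ConstrainKmemSpace defaults to No starting with Slurm 18.08.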