Summary: | jobacct_gather/cgroup scales usage by tasks | ||
---|---|---|---|
Product: | Slurm | Reporter: | Martins Innus <minnus> |
Component: | Accounting | Assignee: | Dominik Bartkiewicz <bart> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | da |
Version: | 17.02.3 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | University of Buffalo (SUNY) | Alineos Sites: | --- |
Description
Martins Innus
2017-06-14 10:38:16 MDT
And I should mention that I have applied attachment 4185 from https://bugs.schedmd.com/show_bug.cgi?id=3531 to get cgroup accounting working at all. Before applying that patch, we saw the same "0" values for memory as reported in that bug.

Yeah, that patch no longer seems like the right way to fix this. Sorry for the confusion. I'll do some more testing on a stock 17.02 and try to come up with a better bug report.

Hi,
I will try to improve this patch or find another solution for bug 3531.
Dominik

OK, thanks! I don't have a complete handle on it yet, but my best guess is a race condition when running all of:

JobAcctGatherType = jobacct_gather/cgroup
ProctrackType = proctrack/cgroup
TaskPlugin = task/cgroup

With stock 17.02.03, when running those plugins with multiple tasks on a node, some PIDs get put in the task cgroup and some get put in the step cgroup. I believe that is the root cause.
Martins

This was solved with a different patch in 3531.
*** This ticket has been marked as a duplicate of ticket 3531 ***

Great, thanks Danny!
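For anyone wanting to check the hypothesis above (some PIDs attached to a task cgroup, others left in the step cgroup), here is a minimal sketch. It is not part of Slurm. It assumes a cgroup v1 hierarchy in which each step directory holds a `cgroup.procs` file plus child directories named `task_*`, each with its own `cgroup.procs`; the helper names `read_pids` and `split_step_pids` are made up for illustration.

```python
from pathlib import Path


def read_pids(procs_file):
    """Return the set of PIDs listed in a cgroup.procs file (one per line)."""
    procs_file = Path(procs_file)
    if not procs_file.is_file():
        return set()
    return {int(tok) for tok in procs_file.read_text().split()}


def split_step_pids(step_dir):
    """Map each cgroup under a step directory to the PIDs attached to it.

    The key "step" holds PIDs attached directly to the step cgroup.
    Every child directory named task_* (an assumed naming scheme) gets
    its own key, e.g. "task_0".
    """
    step_dir = Path(step_dir)
    result = {"step": read_pids(step_dir / "cgroup.procs")}
    for child in sorted(step_dir.glob("task_*")):
        if child.is_dir():
            result[child.name] = read_pids(child / "cgroup.procs")
    return result
```

Pointed at a step directory (under the assumed layout, something like `/sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>/step_<stepid>`), a non-empty `"step"` entry alongside populated `task_*` entries would be consistent with the split described above. In cgroup v1 a PID appears only in the `cgroup.procs` of the cgroup it is attached to, not in its ancestors.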