| Summary: | Unable to find bug # 2643 | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Jenny Williams <jennyw> |
| Component: | Other | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | adam.huffman |
| Version: | 16.05.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | UNC | Slinky Site: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | --- |
Description
Jenny Williams
2017-03-31 10:44:48 MDT
The bug is marked private; briefly, it has to do with a memory leak in the cgroup subsystems. It was fixed with commit 85ab952adf26 in 16.05.7 and later.

As a follow on, is this error from slurmd logs also related ?

slurmd.log-20170320:[2017-03-16T10:24:17.707] _run_prolog: prolog with lock for job 2731931 ran for 0 seconds
slurmd.log-20170320:[2017-03-16T10:24:17.707] Launching batch job 2731931 for UID 237264
slurmd.log-20170320:[2017-03-16T10:24:17.757] [2731931] error: task/cgroup: unable to add task[pid=196105] to memory cg '(null)'
slurmd.log-20170320:[2017-03-16T10:24:18.059] [2731931] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 0
slurmd.log-20170320:[2017-03-16T10:24:18.060] [2731931] done with job

(In reply to Jenny Williams from comment #2)
> As a follow on, is this error from slurmd logs also related ?
>
>
> slurmd.log-20170320:[2017-03-16T10:24:17.707] _run_prolog: prolog with lock
> for job 2731931 ran for 0 seconds
> slurmd.log-20170320:[2017-03-16T10:24:17.707] Launching batch job 2731931
> for UID 237264
> slurmd.log-20170320:[2017-03-16T10:24:17.757] [2731931] error: task/cgroup:
> unable to add task[pid=196105] to memory cg '(null)'
> slurmd.log-20170320:[2017-03-16T10:24:18.059] [2731931] sending
> REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 0
> slurmd.log-20170320:[2017-03-16T10:24:18.060] [2731931] done with job

I don't believe that's directly related; but IIRC that may have been addressed by a separate fix to the cgroups subsystem. I'd expect that to go away with 16.05.10 if you get a chance to upgrade.

Is there anything else I can help answer on this?

> I don't believe that's directly related; but IIRC that may have been
> addressed by a separate fix to the cgroups subsystem. I'd expect that to go
> away with 16.05.10 if you get a chance to upgrade.
>
> Is there anything else I can help answer on this?
Marking resolved/infogiven. If you're still seeing issues after an upgrade please reopen, or file a new bug, and we'll be happy to help.
- Tim
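
To check whether the task/cgroup error quoted above still shows up after an upgrade, the slurmd logs can be scanned for it. The following is only a minimal sketch: the regular expression is modelled on the single error line in this report, so other Slurm versions may word the message differently, and the function name `find_cgroup_errors` is hypothetical rather than part of any Slurm tooling.

```python
import re

# Pattern modelled on the one error line quoted in this report (an assumption:
# other Slurm versions may format the message differently).
CGROUP_ERROR = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] \[(?P<job_id>\d+)\] error: task/cgroup: "
    r"unable to add task\[pid=(?P<pid>\d+)\] to memory cg '(?P<cgroup>[^']*)'"
)


def find_cgroup_errors(lines):
    """Return (timestamp, job_id, pid, cgroup) for each matching log line."""
    hits = []
    for line in lines:
        match = CGROUP_ERROR.search(line)
        if match:
            hits.append(
                (
                    match["timestamp"],
                    int(match["job_id"]),
                    int(match["pid"]),
                    match["cgroup"],
                )
            )
    return hits


# Log lines copied from this report, including the grep filename prefix.
sample = [
    "slurmd.log-20170320:[2017-03-16T10:24:17.707] Launching batch job 2731931 for UID 237264",
    "slurmd.log-20170320:[2017-03-16T10:24:17.757] [2731931] error: task/cgroup: "
    "unable to add task[pid=196105] to memory cg '(null)'",
    "slurmd.log-20170320:[2017-03-16T10:24:18.060] [2731931] done with job",
]

for hit in find_cgroup_errors(sample):
    print(hit)
```

To scan a real log, pass an open file object instead of `sample`; the function only iterates over lines, so it works on either.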