We are researching the error

SLUB: Unable to allocate memory on node -1 (gfp=0x8020)

We found a reference to this error here:
https://github.com/docker/docker/issues/27576

which in turn references this bug:
https://bugs.schedmd.com/show_bug.cgi?id=2846

I am unable to find bug 2846, nor is there any hit on the terms slub allocate unable ... Could the details of that bug be forwarded to me? They might be helpful to us.

I appreciate the assistance.

Virginia (Jenny) Williams
UNC Chapel Hill
The bug is marked private; briefly, it has to do with a memory leak in the cgroup subsystems. It was fixed with commit 85ab952adf26 in 16.05.7 and later.
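For anyone checking a node against that fix, here is a minimal sketch of the version comparison. It assumes the installed version is available as a dotted string such as "16.05.7" (possibly with a leading "slurm "); the function names are hypothetical and only the 16.05.7 threshold comes from the comment above.

```python
import re

# Threshold taken from the comment above: fixed in 16.05.7 and later.
FIXED_IN = (16, 5, 7)


def parse_slurm_version(text):
    """Return (major, minor, micro) from a string such as 'slurm 16.05.7'.

    Assumption: the version appears as three dot-separated integers.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no dotted version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_cgroup_leak_fix(text):
    """True if the given version is at or past the release carrying the fix."""
    return parse_slurm_version(text) >= FIXED_IN
```

Comparing integer tuples rather than raw strings matters here, since "16.05.10" sorts before "16.05.7" as text.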
As a follow-on, is this error from the slurmd logs also related?

slurmd.log-20170320:[2017-03-16T10:24:17.707] _run_prolog: prolog with lock for job 2731931 ran for 0 seconds
slurmd.log-20170320:[2017-03-16T10:24:17.707] Launching batch job 2731931 for UID 237264
slurmd.log-20170320:[2017-03-16T10:24:17.757] [2731931] error: task/cgroup: unable to add task[pid=196105] to memory cg '(null)'
slurmd.log-20170320:[2017-03-16T10:24:18.059] [2731931] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 0
slurmd.log-20170320:[2017-03-16T10:24:18.060] [2731931] done with job
(In reply to Jenny Williams from comment #2)
> As a follow on, is this error from slurmd logs also related ?
>
> slurmd.log-20170320:[2017-03-16T10:24:17.707] _run_prolog: prolog with lock
> for job 2731931 ran for 0 seconds
> slurmd.log-20170320:[2017-03-16T10:24:17.707] Launching batch job 2731931
> for UID 237264
> slurmd.log-20170320:[2017-03-16T10:24:17.757] [2731931] error: task/cgroup:
> unable to add task[pid=196105] to memory cg '(null)'
> slurmd.log-20170320:[2017-03-16T10:24:18.059] [2731931] sending
> REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 0
> slurmd.log-20170320:[2017-03-16T10:24:18.060] [2731931] done with job

I don't believe that's directly related; but IIRC that may have been addressed by a separate fix to the cgroups subsystem. I'd expect that to go away with 16.05.10 if you get a chance to upgrade.

Is there anything else I can help answer on this?
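To check whether the task/cgroup error quoted above still shows up after an upgrade, a rough sketch like the following could scan slurmd log lines for it. The regular expression is inferred only from the single log line in this ticket, so it is an assumption about the format rather than a definition of it; the class and function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class CgroupAddFailure(NamedTuple):
    timestamp: str
    job_id: int
    pid: int
    cgroup: str


# Pattern modelled on the one example line in this ticket (an assumption):
# [<timestamp>] [<jobid>] error: task/cgroup: unable to add task[pid=N] to memory cg '<name>'
_PATTERN = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[(?P<job>\d+)\] error: task/cgroup: "
    r"unable to add task\[pid=(?P<pid>\d+)\] to memory cg '(?P<cg>[^']*)'"
)


def find_cgroup_add_failures(lines: Iterable[str]) -> Iterator[CgroupAddFailure]:
    """Yield one record per log line that matches the task/cgroup error."""
    for line in lines:
        match = _PATTERN.search(line)
        if match:
            yield CgroupAddFailure(
                timestamp=match.group("ts"),
                job_id=int(match.group("job")),
                pid=int(match.group("pid")),
                cgroup=match.group("cg"),
            )
```

Since the function accepts any iterable of strings, an open log file can be passed to it directly, and the pattern uses `search` so a grep-style filename prefix on each line does not get in the way.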
> I don't believe that's directly related; but IIRC that may have been
> addressed by a separate fix to the cgroups subsystem. I'd expect that to go
> away with 16.05.10 if you get a chance to upgrade.
>
> Is there anything else I can help answer on this?

Marking resolved/infogiven. If you're still seeing issues after an upgrade please reopen, or file a new bug, and we'll be happy to help.

- Tim