Created attachment 9618
Change xcgroup.c to use notify_on_release rather than release_agent as the flag for successful cgroup creation.

Hi,

If I'm reading the sources correctly, src/slurmd/common/xcgroup.c uses the presence of the "release_agent" file in a cgroup to determine whether that cgroup has been created successfully. That appears to be the only use made of the file.

For various reasons, I would like to run slurmd inside an LXC container and use cgroup functionality to manage access to a pair of GPUs. However, LXC hides the release_agent file, so slurmd won't start in this case. If I point the test at the "notify_on_release" file (which *is* present inside LXC), everything works.

If my guess is right (that release_agent is only used as a test of successful cgroup creation), then the attached change would be sufficient to allow slurmd to run inside LXC.

I'm attaching a "git diff" against the current version of Slurm. I hope it's appropriate and that you'll accept it.
I wonder whether bug https://bugs.schedmd.com/show_bug.cgi?id=5626 is related. The reporter doesn't say whether slurmd is running inside a container, but the symptoms are very similar.