Summary: | Need assistance determining a sane path for XDG_RUNTIME_DIR, pam_systemd, and SLURM | ||
---|---|---|---|
Product: | Slurm | Reporter: | Ryan Novosielski <novosirj> |
Component: | Configuration | Assignee: | Felip Moll <felip.moll> |
Status: | RESOLVED INFOGIVEN | QA Contact: | |
Severity: | 3 - Medium Impact | ||
Priority: | --- | ||
Version: | 17.11.7 | ||
Hardware: | Linux | ||
OS: | Linux | ||
See Also: | https://bugs.schedmd.com/show_bug.cgi?id=5920 | ||
Site: | Rutgers | Linux Distro: | CentOS
Machine Name: | amarel | Version Fixed: |
Target Release: | --- | DevPrio: | ---
Description
Ryan Novosielski
2019-02-19 10:29:39 MST
OS is CentOS 7.5, FYI.

Felip Moll

(In reply to Ryan Novosielski from comment #1)
> OS is CentOS 7.5, FYI.

Hi Ryan,

We are aware of the situation and are currently discussing it internally to find a fix. There is a proposal to change the design of pam_slurm_adopt.so and split the module in two: one part for PAM account handling and one for session setup.

What happens now is that, as you found, pam_systemd.so sets up the session by doing the following:

a) Create /run/user/$UID and set XDG_RUNTIME_DIR.
b) Set XDG_SESSION_ID.
c) Create a new systemd scope unit for the session.
d) Remove /run/user/$UID on exit.

In step c) it creates a new scope, meaning that the process will eventually be moved into a new cgroup. This conflicts with pam_slurm_adopt, where we adopt the ssh sessions into the Slurm cgroup, because pam_systemd.so runs after pam_slurm_adopt.so.

I am now marking your bug as a duplicate of bug 5920; I encourage you to follow that one to keep track of the status. For the moment, the workaround is to set/unset the required directories and variables in a prolog/epilog script. Please comment on 5920 if you have further feedback.

*** This ticket has been marked as a duplicate of ticket 5920 ***

Ryan Novosielski

Felip, I have another question. The question in this bug was really about the use of pam_systemd in the "slurm" PAM service, not "ssh" or "system-auth" or another service. It didn't immediately occur to me that pam_slurm_adopt, conversely, would be in the "ssh" PAM service. My new question is the following: it seems like there is no real potential for negative interaction created by adding "pam_systemd" to the SLURM service, as the conflict is with "pam_slurm_adopt", and that module is used in a different service.

Felip Moll

> My new question is the following: it seems like there is no real potential
> for negative interaction created by adding "pam_systemd" to the SLURM
> service, as the conflict is with "pam_slurm_adopt", and that module is used
> in a different service.

That's an interesting question. I need to read through the PAM code and do a couple of tests before responding properly, because I am not sure how systemd will affect the current cgroups when the module is called. I'm also afraid that systemd will steal processes back if you have both pam_systemd and pam_slurm_adopt enabled.

Let me check everything and I will come back to you.
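The prolog/epilog workaround Felip mentions above can be sketched roughly as follows, assuming the common pattern of a root-run Prolog/Epilog plus a TaskProlog that exports the variable into the job environment. The script paths, the squeue-based cleanup check, and the exact cleanup policy are illustrative assumptions, not anything prescribed in this ticket.

```bash
# /etc/slurm/prolog.sh -- hypothetical path; runs as root (Prolog= in slurm.conf).
# Create the per-user runtime directory that pam_systemd would normally create.
RUNDIR="/run/user/${SLURM_JOB_UID}"
mkdir -p "$RUNDIR"
chown "${SLURM_JOB_UID}" "$RUNDIR"
chmod 700 "$RUNDIR"
exit 0
```

```bash
# /etc/slurm/task_prolog.sh -- hypothetical path; runs as the job user (TaskProlog=).
# Slurm adds any "export NAME=value" line printed on stdout to the task environment.
echo "export XDG_RUNTIME_DIR=/run/user/$(id -u)"
```

```bash
# /etc/slurm/epilog.sh -- hypothetical path; runs as root (Epilog= in slurm.conf).
# Remove the runtime directory only when the user has no other jobs on this node.
OTHER_JOBS=$(squeue -h -o %i -u "${SLURM_JOB_USER}" -w "$(hostname -s)" | grep -vx "${SLURM_JOB_ID}")
if [ -z "$OTHER_JOBS" ]; then
    rm -rf "/run/user/${SLURM_JOB_UID}"
fi
exit 0
```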
Ryan Novosielski

Thanks, I appreciate it. I'm going to run a test to compare /proc/self/cgroup on a node where we have placed pam_systemd into the slurm service and on one where we haven't, after 15 minutes, to see if there is any difference.

Node with pam_systemd in the slurm service:

[novosirj@amarel-test1 ~]$ srun --reservation=pam_systemd -t 30:00 --pty bash -i
srun: job 95830766 queued and waiting for resources
srun: job 95830766 has been allocated resources
[novosirj@slepner060 ~]$ sleep 900; cat /proc/self/cgroup
11:cpuset:/slurm/uid_109366/job_95830766/step_0
10:cpuacct,cpu:/
9:blkio:/
8:freezer:/slurm/uid_109366/job_95830766/step_0
7:pids:/
6:hugetlb:/
5:devices:/
4:memory:/slurm/uid_109366/job_95830766/step_0
3:perf_event:/
2:net_prio,net_cls:/
1:name=systemd:/user.slice/user-109366.slice/session-c20.scope

Node without pam_systemd in the slurm service:

[novosirj@amarel-test2 ~]$ srun --pty -t 30:00 bash -i
srun: job 95830770 queued and waiting for resources
srun: job 95830770 has been allocated resources
[novosirj@node009 ~]$ sleep 900; cat /proc/self/cgroup
11:memory:/slurm/uid_109366/job_95830770/step_0
10:freezer:/slurm/uid_109366/job_95830770/step_0
9:cpuset:/slurm/uid_109366/job_95830770/step_0
8:blkio:/
7:net_prio,net_cls:/
6:cpuacct,cpu:/
5:perf_event:/
4:devices:/
3:pids:/
2:hugetlb:/
1:name=systemd:/system.slice/slurmd.service

There does seem to be a difference. Whether this matters is another question.

Ryan Novosielski

Just checking to see if there's any update. Thank you!

Felip Moll

(In reply to Ryan Novosielski from comment #7)
> Just checking to see if there's any update. Thank you!

The following cgroups:

11:memory:/slurm/uid_109366/job_95830770/step_0
10:freezer:/slurm/uid_109366/job_95830770/step_0
9:cpuset:/slurm/uid_109366/job_95830770/step_0

are equal in both cases. I guess there is no problem if you enable this, but take into account that pam_slurm_adopt is affected by the problem, so adding that module may not be possible with your modifications. Also ensure the slurmd service unit has Delegate=yes, otherwise cgroups created by slurmd may be modified by systemd, e.g. when reloading or restarting a service.

Ryan Novosielski

Thank you very much. Implementing pam_slurm_adopt is a slightly longer term goal and this is a good workaround for a more pressing need. We’ll keep it in mind.

Felip Moll

(In reply to Ryan Novosielski from comment #9)
> Thank you very much. Implementing pam_slurm_adopt is a slightly longer term
> goal and this is a good workaround for a more pressing need. We’ll keep it
> in mind.

OK Ryan, if it is fine with you, I will close this bug and keep track of the pam_systemd and pam_slurm_adopt issue in bug 5920.

Ryan Novosielski

That's fine by me; thanks again.

Felip Moll

Thanks, closing as infogiven.
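For reference, the setup Ryan describes and Felip signed off on above might look roughly like the sketch below: pam_systemd added to the PAM stack that slurmd consults when UsePAM=1 is set in slurm.conf. The module list and the assumption that the default service name "slurm" is in use are illustrative; this is not the actual Rutgers configuration.

```bash
# Hypothetical example: write a PAM stack for the "slurm" service that includes
# pam_systemd, which creates /run/user/$UID, sets XDG_RUNTIME_DIR, and opens a
# per-user systemd scope for the session.
cat > /etc/pam.d/slurm <<'EOF'
account   required   pam_unix.so
session   required   pam_limits.so
session   optional   pam_systemd.so
EOF
```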
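And for Felip's note about delegation, a typical way to ensure the slurmd unit has Delegate=yes is a systemd drop-in override; the drop-in file name below is an arbitrary choice for this sketch.

```bash
# Add a drop-in so slurmd's cgroups are delegated to it and are not rewritten by
# systemd on a daemon-reload or when services are restarted.
mkdir -p /etc/systemd/system/slurmd.service.d
cat > /etc/systemd/system/slurmd.service.d/delegate.conf <<'EOF'
[Service]
Delegate=yes
EOF
systemctl daemon-reload
systemctl restart slurmd
```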