Summary: | Need assistance determining a sane path for XDG_RUNTIME_DIR, pam_systemd, and SLURM | ||
---|---|---|---|
Product: | Slurm | Reporter: | Ryan Novosielski <novosirj> |
Component: | Configuration | Assignee: | Felip Moll <felip.moll> |
Status: | RESOLVED INFOGIVEN | QA Contact: | |
Severity: | 3 - Medium Impact | ||
Priority: | --- | ||
Version: | 17.11.7 | ||
Hardware: | Linux | ||
OS: | Linux | ||
See Also: | https://bugs.schedmd.com/show_bug.cgi?id=5920 | ||
Site: | Rutgers | Linux Distro: | CentOS
Machine Name: | amarel | Version Fixed: |
Target Release: | --- | DevPrio: | ---
Description
Ryan Novosielski
2019-02-19 10:29:39 MST
OS is CentOS 7.5, FYI.

Felip Moll

(In reply to Ryan Novosielski from comment #1)
> OS is CentOS 7.5, FYI.

Hi Ryan,

We are aware of the situation and are currently discussing it internally to find a fix. There is a proposal to change the design of pam_slurm_adopt.so and split the module in two: one part for PAM account handling and one for session setup.

What happens now is that, as you found, pam_systemd.so sets up the session by doing the following:

a) Create /run/user/$UID and set XDG_RUNTIME_DIR.
b) Set XDG_SESSION_ID.
c) Create a new systemd scope unit for the session.
d) Remove /run/user/$UID on exit.

In step c) it creates a new scope, meaning that the process will eventually be moved into a new cgroup. This conflicts with pam_slurm_adopt, where we adopt the ssh sessions into the Slurm cgroup, because pam_systemd.so runs after pam_slurm_adopt.so.

I am now marking your bug as a duplicate of bug 5920; I encourage you to follow that one to keep track of the status. For the moment, the workaround is to set/unset the required directories and variables in a prolog/epilog script. Please comment on 5920 if you have further feedback.

*** This ticket has been marked as a duplicate of ticket 5920 ***

Ryan Novosielski

Felip, I have another question. The question in this bug was really about the use of pam_systemd in the "slurm" PAM service, not "ssh" or "system-auth" or another service. It didn't immediately occur to me that pam_slurm_adopt, conversely, would be in the "ssh" PAM service. My new question is the following: it seems like there is no real potential for negative interaction created by adding "pam_systemd" to the SLURM service, as the conflict is with "pam_slurm_adopt", and that module is used in a different service.

Felip Moll

> My new question is the following: it seems like there is no real potential
> for negative interaction created by adding "pam_systemd" to the SLURM
> service, as the conflict is with "pam_slurm_adopt", and that module is used
> in a different service.

That's an interesting question. I need to read through the PAM code and do a couple of tests before responding properly, because I am not sure how systemd will affect the current cgroups when the module is called. I'm also afraid that systemd will steal processes back if you have both pam_systemd and pam_slurm_adopt enabled.

Let me check everything and I will come back to you.
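The prolog/epilog workaround Felip mentions above can be sketched roughly as follows, assuming the common pattern of a root-run Prolog/Epilog plus a TaskProlog that exports the variable into the job environment. The script paths, the squeue-based cleanup check, and the exact cleanup policy are illustrative assumptions, not anything prescribed in this ticket.

```bash
# /etc/slurm/prolog.sh -- hypothetical path; runs as root (Prolog= in slurm.conf).
# Create the per-user runtime directory that pam_systemd would normally create.
RUNDIR="/run/user/${SLURM_JOB_UID}"
mkdir -p "$RUNDIR"
chown "${SLURM_JOB_UID}" "$RUNDIR"
chmod 700 "$RUNDIR"
exit 0
```

```bash
# /etc/slurm/task_prolog.sh -- hypothetical path; runs as the job user (TaskProlog=).
# Slurm adds any "export NAME=value" line printed on stdout to the task environment.
echo "export XDG_RUNTIME_DIR=/run/user/$(id -u)"
```

```bash
# /etc/slurm/epilog.sh -- hypothetical path; runs as root (Epilog= in slurm.conf).
# Remove the runtime directory only when the user has no other jobs on this node.
OTHER_JOBS=$(squeue -h -o %i -u "${SLURM_JOB_USER}" -w "$(hostname -s)" | grep -vx "${SLURM_JOB_ID}")
if [ -z "$OTHER_JOBS" ]; then
    rm -rf "/run/user/${SLURM_JOB_UID}"
fi
exit 0
```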
Ryan Novosielski

Thanks, I appreciate it. I'm going to run a test to compare /proc/self/cgroup on a node where we have placed pam_systemd into the slurm service and on one where we haven't, after 15 minutes, to see if there is any difference.

Node with pam_systemd in the slurm service:

[novosirj@amarel-test1 ~]$ srun --reservation=pam_systemd -t 30:00 --pty bash -i
srun: job 95830766 queued and waiting for resources
srun: job 95830766 has been allocated resources
[novosirj@slepner060 ~]$ sleep 900; cat /proc/self/cgroup
11:cpuset:/slurm/uid_109366/job_95830766/step_0
10:cpuacct,cpu:/
9:blkio:/
8:freezer:/slurm/uid_109366/job_95830766/step_0
7:pids:/
6:hugetlb:/
5:devices:/
4:memory:/slurm/uid_109366/job_95830766/step_0
3:perf_event:/
2:net_prio,net_cls:/
1:name=systemd:/user.slice/user-109366.slice/session-c20.scope

Node without pam_systemd in the slurm service:

[novosirj@amarel-test2 ~]$ srun --pty -t 30:00 bash -i
srun: job 95830770 queued and waiting for resources
srun: job 95830770 has been allocated resources
[novosirj@node009 ~]$ sleep 900; cat /proc/self/cgroup
11:memory:/slurm/uid_109366/job_95830770/step_0
10:freezer:/slurm/uid_109366/job_95830770/step_0
9:cpuset:/slurm/uid_109366/job_95830770/step_0
8:blkio:/
7:net_prio,net_cls:/
6:cpuacct,cpu:/
5:perf_event:/
4:devices:/
3:pids:/
2:hugetlb:/
1:name=systemd:/system.slice/slurmd.service

There does seem to be a difference. Whether this matters is another question.

Ryan Novosielski

Just checking to see if there's any update. Thank you!

Felip Moll

(In reply to Ryan Novosielski from comment #7)
> Just checking to see if there's any update. Thank you!

The following cgroups:

11:memory:/slurm/uid_109366/job_95830770/step_0
10:freezer:/slurm/uid_109366/job_95830770/step_0
9:cpuset:/slurm/uid_109366/job_95830770/step_0

are equal in both cases. I guess there is no problem if you enable this, but take into account that pam_slurm_adopt is affected by the problem, so adding that module may not be possible with your modifications. Also ensure the slurmd service unit has Delegate=yes, otherwise cgroups created by slurmd may be modified by systemd, e.g. when reloading or restarting a service.

Ryan Novosielski

Thank you very much. Implementing pam_slurm_adopt is a slightly longer term goal and this is a good workaround for a more pressing need. We’ll keep it in mind.

Felip Moll

(In reply to Ryan Novosielski from comment #9)
> Thank you very much. Implementing pam_slurm_adopt is a slightly longer term
> goal and this is a good workaround for a more pressing need. We’ll keep it
> in mind.

OK Ryan, if it is fine with you, I will close this bug and keep track of the pam_systemd and pam_slurm_adopt issue in bug 5920.

Ryan Novosielski

That's fine by me; thanks again.

Felip Moll

Thanks, closing as infogiven.
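For reference, the setup Ryan describes and Felip signed off on above might look roughly like the sketch below: pam_systemd added to the PAM stack that slurmd consults when UsePAM=1 is set in slurm.conf. The module list and the assumption that the default service name "slurm" is in use are illustrative; this is not the actual Rutgers configuration.

```bash
# Hypothetical example: write a PAM stack for the "slurm" service that includes
# pam_systemd, which creates /run/user/$UID, sets XDG_RUNTIME_DIR, and opens a
# per-user systemd scope for the session.
cat > /etc/pam.d/slurm <<'EOF'
account   required   pam_unix.so
session   required   pam_limits.so
session   optional   pam_systemd.so
EOF
```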
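And for Felip's note about delegation, a typical way to ensure the slurmd unit has Delegate=yes is a systemd drop-in override; the drop-in file name below is an arbitrary choice for this sketch.

```bash
# Add a drop-in so slurmd's cgroups are delegated to it and are not rewritten by
# systemd on a daemon-reload or when services are restarted.
mkdir -p /etc/systemd/system/slurmd.service.d
cat > /etc/systemd/system/slurmd.service.d/delegate.conf <<'EOF'
[Service]
Delegate=yes
EOF
systemctl daemon-reload
systemctl restart slurmd
```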