Description
Karl Kornel
2018-04-04 19:44:16 MDT
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Ignoring obsolete CgroupReleaseAgentDir option.
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Ignoring obsolete SchedulerPort option.
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Munge authentication plugin loaded
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: Connection by user akkornel: user has only one job 223952
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: _adopt_process: trying to get 223952.4294967295 to adopt 3864
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Leaving stepd_add_extern_pid
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Leaving stepd_get_x11_display
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: Process 3864 adopted into job 223952

Totally unrelated, but can I suggest you clean up your slurm.conf (SchedulerPort) and cgroup.conf (CgroupReleaseAgentDir) a bit and remove those obsolete settings?

> The above output is what clued me in to the possible problem. Debian's packages build SLURM with --enable-multiple-slurmd. I found that when the pam_slurm_adopt code was first added (in https://bugs.schedmd.com/attachment.cgi?id=2336&action=diff), the original code (in _adopt_process) did hard-code "/slurm/uid_%u…" as the cgroup path. But the code has changed so much since then that I'm not able to trace things now.

That shouldn't be a problem. Your logs indicate it is trying to adopt the process, which implies it did manage to identify the correct slurmd. I use multiple-slurmd builds on a lot of systems; pam_slurm_adopt works fine this way, as long as it was also built with --enable-multiple-slurmd.

> Just to confirm, we have set PrologFlags=contain in slurm.conf.
> We also tried with pam_systemd.so removed (as per bug #4098) and that did not have any effect.

I'm pretty sure that's the issue though... the /system.slice/ references there imply pam_systemd is involved. Can you walk me through the modifications you've made under /etc/pam.d/, and make sure to comment out any references to pam_systemd?

The obsolete options have been removed. We assumed pam_systemd was the issue at first, too, but removing it did not seem to change behavior. The only change to /etc/pam.d/sshd we made was to add a line for pam_slurm_adopt (replacing an existing line for pam_slurm): `account required /lib/security/pam_slurm_adopt.so`. The sshd file does not have a line for pam_systemd but includes common-account, common-session, and common-password; of these, only common-session contains a line for pam_systemd, `session optional pam_systemd.so`, which we commented out.

% grep systemd /etc/pam.d/sshd
% grep @ /etc/pam.d/sshd
@include common-account
@include common-session
@include common-password
% grep systemd /etc/pam.d/common-*
/etc/pam.d/common-session:#session optional pam_systemd.so
% grep @ /etc/pam.d/common-*
%

I can attach copies of these files, if necessary.

Any update or advice on this issue, or any further information we can provide? It doesn't look like pam_systemd is in the stack, but perhaps systemd is getting involved elsewhere?

Running through this again, I think your original theory about --enable-multiple-slurmd having affected this was correct. Can you try adding the nodename=$whateverthenodename option on the pam_slurm_adopt line in the configs? It does look like leaving that off may be preventing it from finding the correct location in the hierarchy.

As an aside - we don't usually recommend using the Debian packages for this (and other) reasons. You may want to look into locally installing them if possible at some point.
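A sketch of what that suggested change might look like on the pam_slurm_adopt line in /etc/pam.d/sshd; the node name here is illustrative, and the value would need to match the NodeName the local slurmd registers as:

```
account required /lib/security/pam_slurm_adopt.so nodename=wheat05
```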
The 17.11.2 release is also before the most recent security update for SlurmDBD - if you haven't looked at patching the SlurmDBD at least, I would recommend you do so ASAP.

- Tim

I wasn't able to find documentation on the nodename option; is it maybe node_name, instead? I've tried both without success:

Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug3: cpu_freq_verify_def: Performance set
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug4: found jobid = 243802, stepid = 4294967295
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug4: found jobid = 243802, stepid = 0
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: Munge authentication plugin loaded
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug3: Success.
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: Connection by user epoch: user has only one job 243802
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: _adopt_process: trying to get 243802.4294967295 to adopt 6455
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: Leaving stepd_add_extern_pid
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: Leaving stepd_get_x11_display
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: Process 6455 adopted into job 243802

The logs indicate success for the target process... Is there anything that would prevent a child of that process from inheriting the parent's cgroup?

(In reply to aseishas from comment #5)
> I wasn't able to find documentation on the nodename option; is it maybe node_name, instead? I've tried both without success:

It isn't documented, as that option is only relevant with --enable-multiple-slurmd.
And the --enable-multiple-slurmd option is not recommended for use on production systems. I understand the Debian packages are convenient, but please understand that we (as SchedMD) do not control them, or what flags they may have chosen to enable. They've shipped with a weird and broken mix of flags in the past - while --enable-multiple-slurmd is not harmful in this instance, it is still not recommended.

> The logs indicate success for the target process... Is there anything that would prevent a child of that process from inheriting the parent's cgroup?

Can you attach the slurm.conf and cgroup.conf for the system?

Created attachment 6642 [details]
slurm.conf
Created attachment 6643 [details]
cgroup.conf
Attached. I can try experimenting with a clean build tomorrow, to see if I can confirm or rule out an issue with the Debian packages in particular.

Thanks. Those look fine. Can you paste /etc/pam.d/sshd, and any included files (likely common-account at a minimum), into here?

Created attachment 6645 [details]
/etc/pam.d/sshd
Created attachment 6646 [details]
/etc/pam.d/common-account
Created attachment 6647 [details]
/etc/pam.d/common-session
Created attachment 6648 [details]
/etc/pam.d/common-password
This needs to be nodename, not node_name:

account required /lib/security/pam_slurm_adopt.so node_name=wheat09 log_level=debug5

I'd also suggest you move that after the @include common-account line. The rest of that all looks okay. If you make those changes on a node, can you see if there is anything interesting captured in syslog when a user logs in to the node through SSH?

Sorry about node_name vs. nodename... I've made those two changes. Here is the whole connection attempt:

2018-04-17T16:48:30.634409-07:00 wheat05 sshd[682]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
2018-04-17T16:48:30.636286-07:00 wheat05 sshd[684]: pam_succeed_if(sshd:auth): requirement "uid eq 0" not met by user "epoch"
2018-04-17T16:48:32.542455-07:00 wheat05 sshd[684]: Successful Duo login for 'epoch' from 171.64.18.17
2018-04-17T16:48:32.542831-07:00 wheat05 sshd[684]: pam_listfile(sshd:account): Refused user epoch for service sshd
2018-04-17T16:48:32.543047-07:00 wheat05 sshd[684]: pam_listfile(sshd:account): Refused user epoch for service sshd
2018-04-17T16:48:32.543258-07:00 wheat05 pam_slurm_adopt[684]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
2018-04-17T16:48:32.543680-07:00 wheat05 pam_slurm_adopt[684]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
2018-04-17T16:48:32.545404-07:00 wheat05 pam_slurm_adopt[684]: debug3: cpu_freq_verify_def: Performance set
2018-04-17T16:48:32.545734-07:00 wheat05 pam_slurm_adopt[684]: debug: Ignoring obsolete SchedulerPort option.
2018-04-17T16:48:32.545950-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 242164, stepid = 4294967295
2018-04-17T16:48:32.546166-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244148, stepid = 4294967294
2018-04-17T16:48:32.546373-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244150, stepid = 4294967295
2018-04-17T16:48:32.546573-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244149, stepid = 0
2018-04-17T16:48:32.546923-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244150, stepid = 0
2018-04-17T16:48:32.547129-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244147, stepid = 4294967295
2018-04-17T16:48:32.547327-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244146, stepid = 4294967295
2018-04-17T16:48:32.547531-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 242164, stepid = 4294967294
2018-04-17T16:48:32.547730-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244149, stepid = 4294967295
2018-04-17T16:48:32.547931-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244149, stepid = 4294967294
2018-04-17T16:48:32.548133-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244148, stepid = 0
2018-04-17T16:48:32.548331-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244148, stepid = 4294967295
2018-04-17T16:48:32.548538-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244147, stepid = 0
2018-04-17T16:48:32.548738-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244146, stepid = 0
2018-04-17T16:48:32.548939-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244763, stepid = 4294967295
2018-04-17T16:48:32.549133-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244146, stepid = 4294967294
2018-04-17T16:48:32.549327-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244763, stepid = 0
2018-04-17T16:48:32.549528-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244147, stepid = 4294967294
2018-04-17T16:48:32.549727-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244150, stepid = 4294967294
2018-04-17T16:48:32.549931-07:00 wheat05 pam_slurm_adopt[684]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
2018-04-17T16:48:32.550124-07:00 wheat05 pam_slurm_adopt[684]: debug: Munge authentication plugin loaded
2018-04-17T16:48:32.550318-07:00 wheat05 pam_slurm_adopt[684]: debug3: Success.
2018-04-17T16:48:32.551875-07:00 wheat05 pam_slurm_adopt[684]: Connection by user epoch: user has only one job 244763
2018-04-17T16:48:32.552131-07:00 wheat05 pam_slurm_adopt[684]: debug: _adopt_process: trying to get 244763.4294967295 to adopt 684
2018-04-17T16:48:32.553401-07:00 wheat05 pam_slurm_adopt[684]: debug: Leaving stepd_add_extern_pid
2018-04-17T16:48:32.553611-07:00 wheat05 pam_slurm_adopt[684]: debug: Leaving stepd_get_x11_display
2018-04-17T16:48:32.553792-07:00 wheat05 pam_slurm_adopt[684]: Process 684 adopted into job 244763
2018-04-17T16:48:32.554967-07:00 wheat05 sshd[682]: Accepted keyboard-interactive/pam for epoch from 171.64.18.17 port 50771 ssh2
2018-04-17T16:48:32.557891-07:00 wheat05 sshd[682]: pam_unix(sshd:session): session opened for user epoch by (uid=0)

(In reply to aseishas from comment #17)
> Sorry about node_name vs. nodename... I've made those two changes. Here is the whole connection attempt:

Sorry for not following back up sooner. That all looks normal at this point - are you still seeing connection issues? And have you had a chance to switch away from the Debian packages yet? If not, we do expect to release 17.11.6 tomorrow, although none of the recent fixes are likely to address this yet.

- Tim

Sorry for the delay in responding... We have not been able to move away from the Debian packages yet. I was able to test a new build of 17.11.6, however, with multiple-slurmd disabled.
Unfortunately, there was no change in behavior:

May 23 12:14:46 oat05 sshd[11619]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
May 23 12:14:46 oat05 sshd[11621]: pam_succeed_if(sshd:auth): requirement "uid eq 0" not met by user "epoch"
May 23 12:14:48 oat05 sshd[11621]: Successful Duo login for 'epoch' from 171.64.18.17
May 23 12:14:48 oat05 sshd[11621]: pam_listfile(sshd:account): Refused user epoch for service sshd
May 23 12:14:48 oat05 sshd[11621]: pam_listfile(sshd:account): Refused user epoch for service sshd
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug3: cpu_freq_verify_def: Performance set
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug4: found jobid = 341622, stepid = 0
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug4: found jobid = 341622, stepid = 4294967295
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: Munge authentication plugin loaded
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug3: Success.
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: Connection by user epoch: user has only one job 341622
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: _adopt_process: trying to get 341622.4294967295 to adopt 11621
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: Leaving stepd_add_extern_pid
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: Leaving stepd_get_x11_display
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: Process 11621 adopted into job 341622
May 23 12:14:48 oat05 sshd[11619]: Accepted keyboard-interactive/pam for epoch from 171.64.18.17 port 51806 ssh2
May 23 12:14:48 oat05 sshd[11619]: pam_unix(sshd:session): session opened for user epoch by (uid=0)
May 23 12:14:48 oat05 systemd-logind[2017]: New session 21 of user epoch.
May 23 12:14:48 oat05 systemd: pam_unix(systemd-user:session): session opened for user epoch by (uid=0)

But, from that session:

epoch@oat05 ~ % cat /proc/self/cgroup
12:blkio:/user.slice
11:rdma:/
10:hugetlb:/
9:memory:/user.slice
8:freezer:/
7:net_cls,net_prio:/
6:pids:/user.slice/user-23267.slice
5:devices:/user.slice
4:perf_event:/
3:cpuset:/
2:cpu,cpuacct:/user.slice
1:name=systemd:/user.slice/user-23267.slice/session-21.scope

Sorry, my test node was misconfigured!
After adopting the correct configuration, however, the behavior is still unchanged:

May 23 14:09:23 oat05 sshd[13917]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
May 23 14:09:23 oat05 sshd[13919]: pam_succeed_if(sshd:auth): requirement "uid eq 0" not met by user "epoch"
May 23 14:09:25 oat05 sshd[13919]: Successful Duo login for 'epoch' from 171.64.18.17
May 23 14:09:25 oat05 sshd[13919]: pam_listfile(sshd:account): Refused user epoch for service sshd
May 23 14:09:25 oat05 sshd[13919]: pam_listfile(sshd:account): Refused user epoch for service sshd
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug3: cpu_freq_verify_def: Performance set
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug4: found jobid = 341789, stepid = 0
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug4: found jobid = 341789, stepid = 4294967295
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: Munge authentication plugin loaded
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug3: Success.
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: Connection by user epoch: user has only one job 341789
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: _adopt_process: trying to get 341789.4294967295 to adopt 13919
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: Leaving stepd_add_extern_pid
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: Leaving stepd_get_x11_display
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: Process 13919 adopted into job 341789
May 23 14:09:25 oat05 sshd[13917]: Accepted keyboard-interactive/pam for epoch from 171.64.18.17 port 52188 ssh2
May 23 14:09:25 oat05 sshd[13917]: pam_unix(sshd:session): session opened for user epoch by (uid=0)

And:

epoch@oat05 ~ % cat /proc/self/cgroup
12:blkio:/system.slice/ssh.service
11:rdma:/
10:hugetlb:/
9:memory:/system.slice/ssh.service
8:freezer:/
7:net_cls,net_prio:/
6:pids:/system.slice/ssh.service
5:devices:/system.slice/ssh.service
4:perf_event:/
3:cpuset:/
2:cpu,cpuacct:/system.slice/ssh.service
1:name=systemd:/system.slice/ssh.service

I still don't see an obvious issue, but I'm hoping that my colleague Marshall may be able to spot something I've missed here. I'm going to turn this over to him to follow up further; my apologies for not getting to the bottom of this yet.

- Tim

Thanks for the update... Please let me know if there's any more information we can provide.

Thanks, I'll let you know. I'm working on setting up pam_slurm_adopt on a compute node with Ubuntu 16.04 to try to reproduce the issue.

I've got it set up with Slurm 17.11.2, Ubuntu 16.04. I can get access to the node with "sufficient" (when the pam_slurm_adopt module fails, it moves on to the next one), but not with "required" (when the pam_slurm_adopt module fails, it exits and rejects my login attempt). But you're using "required", so that's not it. I haven't been able to reproduce your exact situation. Can you re-upload your most recent /etc/pam.d/sshd file (since you've made changes to it) and mark the old one obsolete?
And is slurm.conf on your test node the exact same as the slurm.conf you've uploaded here?

Created attachment 7216 [details]
/etc/pam.d/sshd
Created attachment 7217 [details]
slurm.conf
I've attached the current /etc/pam.d/sshd and /etc/slurm-llnl/slurm.conf. I don't think either has changed substantially... I had made changes to sshd while troubleshooting (adding nodename=), but these hosts are now running 17.11.6 (with the patch provided for bug 5038) with multiple-slurmd disabled, so that change has been reverted.

I still see that @include common-account is after pam_slurm_adopt. It should be before pam_slurm_adopt.

(In reply to aseishas from comment #34)
> I've attached the current /etc/pam.d/sshd and /etc/slurm-llnl/slurm.conf. I don't think either has changed substantially... I had made changes to sshd while troubleshooting (adding nodename=), but these hosts are now running 17.11.6 (with the patch provided for bug 5038) with multiple-slurmd disabled, so that change has been reverted.

Is your node hostname the same as the NodeName defined in slurm.conf?

Hello, have you gotten a chance to make the change in comment 35? (Put common-account before pam_slurm_adopt)

Sorry for the delay... I was able to restore that change and test this morning. I'm still seeing behavior like that in comment 22: auth logs indicate the adoption succeeded, but the session doesn't end up in a Slurm cgroup. I do notice now that there is a small change in behavior after hoisting common-account, however; instead of the session ending up in a user.slice it ends up in system.slice/ssh.service. I'll attach the corrected /etc/pam.d/sshd.

Created attachment 7262 [details]
/etc/pam.d/sshd
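For reference, the ordering being asked for here would make the account section of /etc/pam.d/sshd look roughly like this (a sketch, not the actual attached file):

```
# Run the distribution's standard account checks first...
@include common-account
# ...then gate SSH access on the user having a job on this node.
account required /lib/security/pam_slurm_adopt.so
```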
Just to double check a couple of things:

* Have you seen this error message in the syslog/pam log?

  Jul 13 12:54:26 voyager2 pam_slurm_adopt[22370]: error: Error obtaining local step information.

* Does the NodeName in slurm.conf match the node's hostname? If not, then you should use the nodename parameter for pam_slurm_adopt.

No, we haven't seen "Error obtaining local step information" at all (in all the logs I've seen, pam_slurm_adopt reports that the adoption has succeeded). The NodeName is the same, but I've tried adding the option anyway; the result is still the same as in comment 22.

(In reply to aseishas from comment #47)
> No, we haven't seen "Error obtaining local step information" at all (in all the logs I've seen, pam_slurm_adopt reports that the adoption has succeeded). The NodeName is the same, but I've tried adding the option anyway; the result is still the same as in comment 22.

Alright, I didn't think so but was just checking. Since the NodeName is the same as the hostname, you don't need it. Only if it's different.

Well, I haven't succeeded in reproducing this yet (I've been testing on Ubuntu 16.04 with Slurm 17.11.2 and .7). I'll keep working on it.

I've been able to reproduce (kind of) your situation by adding in these lines that you have:

account [success=1 default=ignore] pam_listfile.so item=user sense=allow onerr=succeed file=/etc/security/rcadmins_root
...
# Bypass the pam_slurm check for certain people.
account [success=1 default=ignore] pam_listfile.so item=user sense=allow file=/etc/security/rcadmins_all onerr=fail

But only for the users allowed in those files. For other users, pam_slurm_adopt works as expected. I suspect you are one of the users allowed through without pam_slurm_adopt? Can you check if a user not in one of those files is restricted as expected with pam_slurm_adopt?

Normally I am listed in that file, but I remove myself for testing... pam_slurm_adopt is definitely running according to the logs, and even reports successful adoption.

Well, it was a long shot. I saw your logs showing successful adoption, I just wasn't sure if something else was messed up or not.

Can you set SlurmdDebug to debug2 on a node, reconfigure or restart slurmd on the node so the change is propagated, and ssh in as a user and cat /proc/self/cgroup as you've done before? Then can you upload the slurmd log file? I'm looking for certain log statements, including "_handle_add_extern_pid_internal". Assuming that's there and you still have the same problems, I'll give you a small patch to apply (a sleep) so you can verify that the process actually gets added to the cgroup. Then if we can verify that, we know that the process is being taken out again by something that isn't Slurm.

Created attachment 7484 [details]
/var/log/slurm-llnl/slurmd
I've attached the slurmd log; below are the results of `cat /proc/self/cgroup` and the auth.log record for the session.

% cat /proc/self/cgroup
12:perf_event:/
11:freezer:/
10:pids:/system.slice/ssh.service
9:memory:/system.slice/ssh.service
8:cpuset:/
7:net_cls,net_prio:/
6:hugetlb:/
5:devices:/system.slice/ssh.service
4:cpu,cpuacct:/system.slice/ssh.service
3:rdma:/
2:blkio:/system.slice/ssh.service
1:name=systemd:/system.slice/ssh.service

# /var/log/auth.log
Aug 1 11:53:26 oat08 sshd[24083]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
Aug 1 11:53:26 oat08 sshd[24085]: pam_succeed_if(sshd:auth): requirement "uid eq 0" not met by user "epoch"
Aug 1 11:53:29 oat08 sshd[24085]: Successful Duo login for 'epoch' from 171.64.18.17
Aug 1 11:53:29 oat08 sshd[24085]: pam_listfile(sshd:account): Refused user epoch for service sshd
Aug 1 11:53:29 oat08 sshd[24085]: pam_listfile(sshd:account): Refused user epoch for service sshd
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug3: cpu_freq_verify_def: Performance set
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug4: found jobid = 1190329, stepid = 4294967295
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug4: found jobid = 1190329, stepid = 0
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: Munge authentication plugin loaded
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug3: Success.
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: Connection by user epoch: user has only one job 1190329
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: _adopt_process: trying to get 1190329.4294967295 to adopt 24085
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: Leaving stepd_add_extern_pid
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: Leaving stepd_get_x11_display
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: Process 24085 adopted into job 1190329
Aug 1 11:53:29 oat08 sshd[24083]: Accepted keyboard-interactive/pam for epoch from 171.64.18.17 port 64103 ssh2
Aug 1 11:53:29 oat08 sshd[24083]: pam_unix(sshd:session): session opened for user epoch by (uid=0)
Aug 1 11:53:42 oat08 sshd[24113]: Received disconnect from 171.64.18.17 port 64103:11: disconnected by user
Aug 1 11:53:42 oat08 sshd[24113]: Disconnected from 171.64.18.17 port 64103
Aug 1 11:53:42 oat08 sshd[24083]: pam_unix(sshd:session): session closed for user epoch

Thanks, that's helpful. It looks like the process is getting added to the extern cgroups.

This patch inserts a sleep right after the process gets added to the cgroups. It will stall the PAM stack from returning from pam_slurm_adopt for a minute, to give you time to verify that the PID did get properly added to the cgroup. Can you have an ssh session ready on a node, then apply this patch, recompile, and run another job on that node? You shouldn't have to restart slurmd after applying the patch and recompiling, since this is executed by the stepd, so the new slurmstepd binary will be executed when you run the new job.

Then go to /sys/fs/cgroup/freezer/slurm/uid_<your uid>/job_<your job id>/step_extern and run cat cgroup.procs, and you'll see the PID of the extern sleep process. Then, SSH in, and it will stall for a minute. Then, run cat cgroup.procs again and you *should* see another PID. If that's there, you know that process was correctly adopted into the extern step cgroup, and that something else is removing it later.
After a minute (you can reduce the sleep to less than 60 seconds if you want), the function will return, and you'll be on the node. You can check /proc/self/cgroup again and also cgroup.procs in the extern step cgroup directory.

diff --git a/src/slurmd/slurmstepd/req.c b/src/slurmd/slurmstepd/req.c
index efd18f7647..100a3d9227 100644
--- a/src/slurmd/slurmstepd/req.c
+++ b/src/slurmd/slurmstepd/req.c
@@ -1333,6 +1333,8 @@ static int _handle_add_extern_pid_internal(stepd_step_rec_t *job, pid_t pid)
 		return SLURM_FAILURE;
 	}

+	sleep(60);
+
 	/* spawn a thread that will wait on the pid given */
 	slurm_thread_create_detached(NULL, _wait_extern_pid, extern_pid);

Have you had a chance to try comment 57?

Sorry for the delay... It definitely looks like something is removing it:

% cat cgroup.procs
3195
% cat cgroup.procs
3195
3285
% cat cgroup.procs
3195

That's good to know that pam_slurm_adopt is working correctly. I can't find anything wrong in your PAM stack. Can you investigate the possibility that systemd is interfering somehow, either in or out of your PAM stack, and removing the PID from the cgroup after pam_slurm_adopt places the PID in the cgroup? I'm also looking at that, but I haven't found anything yet.

Can you remove /lib*/security/pam_systemd.so - maybe just move it to a temporary directory? Then can you re-run the test? I don't think that pam_systemd.so is being used anywhere in your PAM stack, but this would be a definitive test.

Do you think that systemd-logind might be causing the problem? Do you think you could disable it temporarily and test that out? (I'd try comment 62 first.)

Just a ping - have you had a chance to try comment 62 (first) and then comment 63?

I was finally able to test these... Unfortunately, neither moving pam_systemd.so nor disabling systemd-logind made any difference.

I have one more idea to try to eliminate any possibility that anything in the PAM stack is interfering. For a single test node, can you comment out all lines of the PAM stack except for pam_slurm_adopt and run your test again? If the session is correctly inside of the job extern cgroup, then we know something in the PAM stack is causing the problem, and you can begin uncommenting lines and see who the culprit is. If the session is still not inside of the job extern cgroup, then we know something else outside of the PAM stack is the problem (probably systemd). Just make sure you maintain your ssh session on the node so you don't get locked out of the node.

I also fixed another bug in pam_slurm_adopt where action_adopt_failure wasn't working (patch under review). But that's unrelated to your bug.

Have you had a chance to try the latest test? (Comment 69)

I'm closing this as timedout for now. Please reopen when you have a chance to perform the test.

I'm sorry again for the delay. Even stripping /etc/pam.d/sshd back to just a bare pam_slurm_adopt.so line doesn't seem to work; the session ends up in system.slice/ssh.service, as before.

Can you upload the log files (slurmd, syslog and pam) for this latest attempt? I can't reproduce it, even on Ubuntu 16.04, Slurm 17.11.02, using your slurm.conf, and with only pam_slurm_adopt.so in sshd.

I've done some more careful testing, and I was wrong: reducing /etc/pam.d/sshd to just the pam_slurm_adopt module does substantially resolve the issue. It looks like only some resources are contained by Slurm, however (e.g., cpuset, but not memory):

$ cat /proc/self/cgroup
12:hugetlb:/
11:pids:/user.slice/user-23267.slice
10:perf_event:/
9:cpuset:/slurm/uid_23267/job_1314815/step_extern
8:freezer:/slurm/uid_23267/job_1314815/step_extern
7:blkio:/user.slice
6:cpu,cpuacct:/user.slice
5:memory:/user.slice
4:devices:/user.slice
3:net_cls,net_prio:/
2:rdma:/
1:name=systemd:/user.slice/user-23267.slice/session-c2.scope

Is that the expected behavior? See below for the /var/log/auth.log trace of this session, and I will attach a copy of slurmd.log.
I'll continue looking for the particular module that causes the session to escape. Oct 23 11:58:19 oat07 sshd[20088]: Connection from 171.64.18.17 port 63131 on 171.67.51.17 port 22 Oct 23 11:58:19 oat07 sshd[20088]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok) Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug3: cpu_freq_verify_def: Performance set Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug4: found jobid = 1314815, stepid = 0 Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug4: found jobid = 1314815, stepid = 4294967295 Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: Munge authentication plugin loaded Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug3: Success. Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: Connection by user epoch: user has only one job 1314815 Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: _adopt_process: trying to get 1314815.4294967295 to adopt 20088 Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: Leaving stepd_add_extern_pid Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: Leaving stepd_get_x11_display Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: Process 20088 adopted into job 1314815 Oct 23 11:58:19 oat07 sshd[20088]: Accepted gssapi-with-mic for epoch from 171.64.18.17 port 63131 ssh2 Oct 23 11:58:19 oat07 sshd[20088]: pam_unix(sshd:session): session opened for user epoch by (uid=0) Oct 23 11:58:19 oat07 systemd-logind[2263]: New session c2 of user epoch. 
Oct 23 11:58:19 oat07 systemd: pam_unix(systemd-user:session): session opened for user epoch by (uid=0)
Oct 23 11:58:19 oat07 sshd[20088]: User child is on pid 20103
Oct 23 11:58:19 oat07 sshd[20103]: Starting session: shell on pts/3 for epoch from 171.64.18.17 port 63131 id 0
Oct 23 11:58:42 oat07 systemd-logind[2263]: Removed session c2.
Oct 23 11:58:42 oat07 systemd: pam_unix(systemd-user:session): session closed for user epoch

Created attachment 8083 [details]
/var/log/slurm-llnl/slurmd
There was a thread on the slurm-users list recently where someone had the same problem. I know you've removed all references to pam_systemd.so, but it looks like you may also need to mask and disable systemd-logind, which fixed that problem for them:

systemctl stop systemd-logind
systemctl mask systemd-logind

Does this solve the issue? This is the slurm-users thread: https://groups.google.com/forum/#!topic/slurm-users/3vZUoZ_DXJI

Stopping and masking systemd-logind does ensure that the session remains fully contained. Thank you!

I think we also have a theory as to why processes are not contained with our configuration in the first place, even though we have not been able to identify a particular culprit module. We use Duo for two-factor authentication, which requires keyboard-interactive auth to be enabled in the sshd configuration. When interactive authentication runs, it looks like sshd spawns a subprocess for the purpose, and it's this subprocess that ends up contained by pam_slurm_adopt. The subprocess then dies, and the user's shell ends up a child of the original, uncontained parent process instead. If keyboard-interactive authentication is disabled, it looks like a single sshd process is responsible for the entire authentication flow...
Without keyboard-interactive (one PID):

Oct 24 13:10:10 oat07 sshd[17395]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
Oct 24 13:10:10 oat07 sshd[17395]: pam_listfile(sshd:account): Refused user epoch for service sshd
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug3: cpu_freq_verify_def: Performance set
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug4: found jobid = 1315105, stepid = 0
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug4: found jobid = 1315105, stepid = 4294967295
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: Munge authentication plugin loaded
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug3: Success.
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: Connection by user epoch: user has only one job 1315105
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: _adopt_process: trying to get 1315105.4294967295 to adopt 17395
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: Leaving stepd_add_extern_pid
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: Leaving stepd_get_x11_display
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: Process 17395 adopted into job 1315105
Oct 24 13:10:10 oat07 sshd[17395]: Accepted gssapi-with-mic for epoch from 171.64.18.17 port 56511 ssh2
Oct 24 13:10:10 oat07 sshd[17395]: pam_unix(sshd:session): session opened for user epoch by (uid=0)
Oct 24 13:10:10 oat07 sshd[17395]: pam_systemd(sshd:session): Failed to create session: Unit systemd-logind.service is masked.
With keyboard-interactive (two PIDs):

Oct 24 13:12:25 oat07 sshd[17654]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
Oct 24 13:12:25 oat07 sshd[17656]: pam_succeed_if(sshd:auth): requirement "uid eq 0" not met by user "epoch"
Oct 24 13:12:27 oat07 sshd[17656]: Successful Duo login for 'epoch' from 171.64.18.17
Oct 24 13:12:27 oat07 sshd[17656]: pam_listfile(sshd:account): Refused user epoch for service sshd
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug3: cpu_freq_verify_def: Performance set
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug4: found jobid = 1315105, stepid = 0
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug4: found jobid = 1315105, stepid = 4294967295
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: Munge authentication plugin loaded
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug3: Success.
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: Connection by user epoch: user has only one job 1315105
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: _adopt_process: trying to get 1315105.4294967295 to adopt 17656
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: Leaving stepd_add_extern_pid
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: Leaving stepd_get_x11_display
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: Process 17656 adopted into job 1315105
Oct 24 13:12:27 oat07 sshd[17654]: Accepted keyboard-interactive/pam for epoch from 171.64.18.17 port 56523 ssh2
Oct 24 13:12:27 oat07 sshd[17654]: pam_unix(sshd:session): session opened for user epoch by (uid=0)
Oct 24 13:12:27 oat07 sshd[17654]: pam_systemd(sshd:session): Failed to create session: Unit systemd-logind.service is masked.

That's a great catch. I think you're right about the child PID being adopted rather than the parent. I'll look into this more.

Hello! I've been working with Adam on this (outside of the ticket), and I just thought: would it be possible to make pam_slurm_adopt work as both an account module and a session module, with the adoption code moved to the session module? What I mean is, it seems to me that pam_slurm_adopt does two things: check whether the user has a running job, and then adopt their SSH connection into the job's cgroup. If the "adopt their SSH connection into the job's cgroup" function is moved into a session call (instead of an account call), then I think there would be a couple of benefits:

1. It would avoid all of the child process craziness. At the end of comment 79, Adam's syslog output shows that the PAM session modules (search for "sshd:session") are all running in the parent process. So, doing the adoption as part of PAM session setup means the parent process gets put into the cgroup.

2. pam_systemd is also a session module.
pam_slurm_adopt's session entry could be placed after pam_systemd, so that if pam_systemd messes with a user's cgroups, pam_slurm_adopt could immediately go in and undo that. (Of course, if systemd-logind messes with a session's cgroups outside of pam_systemd, then we wouldn't be able to catch that, so #2 is not guaranteed!)

Hi Karl,

There's already a discussion about this going on in bug 5920 (and on the slurm-users list). The issue is still with systemd: we recently ran into a situation with systemctl daemon-reload with Delegate=yes where systemd stole back the cgroups. We've fixed that one, but believe there are other cases in which systemd will steal back the cgroups. We're still discussing this internally.

Another option you have is to adopt the parent as well as the child. My thought was to add code to adopt the parent, as long as its PID is not 1 (the init process) and the UID of the parent matches the UID of the child. Unfortunately, there's no system call to do that, so it would involve adding a bit more code to parse /proc/<pid>/status. I'm not sure that we'd add this to the repo (we need to discuss it more), but it *should* (I think) work as a local patch.
Something similar to this (WARNING - this compiles, but I haven't tested it, since I don't have something to replicate what you're seeing):

$ git diff
diff --git a/contribs/pam_slurm_adopt/pam_slurm_adopt.c b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
index 12cc4bd4b9..0d8099549b 100644
--- a/contribs/pam_slurm_adopt/pam_slurm_adopt.c
+++ b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
@@ -117,7 +117,8 @@ static int _adopt_process(pam_handle_t *pamh, pid_t pid, step_loc_t *stepd)
 {
 	int fd;
 	uint16_t protocol_version;
-	int rc;
+	int rc, rc2;
+	pid_t ppid;
 
 	if (!stepd)
 		return -1;
@@ -132,9 +133,12 @@ static int _adopt_process(pam_handle_t *pamh, pid_t pid, step_loc_t *stepd)
 		return -1;
 	}
 
+	ppid = getppid();
+
 	rc = stepd_add_extern_pid(fd, stepd->protocol_version, pid);
+	rc2 = stepd_add_extern_pid(fd, stepd->protocol_version, ppid);
 
-	if ((rc == PAM_SUCCESS) && !opts.disable_x11) {
+	if ((rc == PAM_SUCCESS) && (rc2 == PAM_SUCCESS) && !opts.disable_x11) {
 		int display;
 
 		display = stepd_get_x11_display(fd, stepd->protocol_version);

I suggest you find an appropriate workaround (either your option or the adopt-the-parent-pid option here) and do that locally. I'm not sure if the child process craziness is a bug fix or a feature request, so I need to explore that more myself. I'd like to move the discussion about pam_slurm_adopt in both account and session to bug 5920. Does that sound good to you?

I am out of the office through Monday, November 5, and will respond to your message when I return. If you need immediate assistance please contact: srcc-support@stanford.edu
--
Adam Seishas
Research Computing Specialist
Stanford | University IT
aseishas@stanford.edu | 650.725.7490

Hi, I'd just like to follow up on my previous comment (comment 83). Also, we are looking into the possibility of not disabling pam_systemd at bug 5920. Thanks!

We're following 5920 now.

Sounds good. I'm closing this ticket as resolved/infogiven for now.
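For illustration, the safety check described above (adopt the parent only if its PID is not 1 and its UID matches the child's) would need the /proc/<pid>/status parsing mentioned in the comment. A hypothetical sketch of just that check, in Python for brevity rather than the C a real patch would use:

```python
# Hypothetical sketch of the proposed check before adopting a parent
# process: only adopt when the parent is not init (PID 1) and its real
# UID matches the child's. Parses /proc/<pid>/status-format text.

def status_field(status_text, field):
    """Return the first value on a 'Field:' line, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return line.split()[1]
    return None

def should_adopt_parent(child_status, parent_status):
    """True only if the child's PPid is not 1 and real UIDs match."""
    ppid = status_field(child_status, "PPid")
    if ppid is None or int(ppid) == 1:
        return False  # parent is init (or unknown); never adopt it
    child_uid = status_field(child_status, "Uid")
    parent_uid = status_field(parent_status, "Uid")
    return child_uid is not None and child_uid == parent_uid

# Example using status-file text modeled on the two-PID sshd case above:
child = "Name:\tsshd\nPPid:\t17654\nUid:\t23267\t23267\t23267\t23267\n"
parent = "Name:\tsshd\nPPid:\t1234\nUid:\t23267\t23267\t23267\t23267\n"
print(should_adopt_parent(child, parent))
# → True
```

In a real pam_slurm_adopt patch this logic would live in C next to the getppid() call in the diff above, reading /proc/<ppid>/status directly; the field names (PPid, Uid) are the ones documented in proc(5).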
If 5920 doesn't resolve your situation, then feel free to re-open this ticket or open a new one. I've filed another internal bug to (1) document the current need to mask and stop systemd-logind, and (2) further discuss the problems surrounding child processes during authentication. I'm not sure if that's unique to two-factor authentication that uses interactive sessions or not. It's possible that we just document this problem for now until we find an appropriate solution.