Description
Karl Kornel
2018-04-04 19:44:16 MDT
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Ignoring obsolete CgroupReleaseAgentDir option.
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Ignoring obsolete SchedulerPort option.
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Munge authentication plugin loaded
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: Connection by user akkornel: user has only one job 223952
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: _adopt_process: trying to get 223952.4294967295 to adopt 3864
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Leaving stepd_add_extern_pid
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: debug: Leaving stepd_get_x11_display
> Apr 4 17:07:51 wheat05 pam_slurm_adopt[3864]: Process 3864 adopted into job 223952

Totally unrelated, but can I suggest you clean up your slurm.conf (SchedulerPort) and cgroup.conf (CgroupReleaseAgentDir) a bit and remove those obsolete settings?

> The above output is what clued me in to the possible problem. Debian's packages build SLURM with --enable-multiple-slurmd. I found that when the pam_slurm_adopt code was first added (in https://bugs.schedmd.com/attachment.cgi?id=2336&action=diff), the original code (in _adopt_process) did hard-code "/slurm/uid_%u…" as the cgroup path. But the code has changed so much since then that I'm not able to trace things now.

That shouldn't be a problem. Your logs indicate it is trying to adopt the process, which implies it did manage to identify the correct slurmd. I use multiple-slurmd builds on a lot of systems; pam_slurm_adopt works fine this way, as long as it was also built with --enable-multiple-slurmd.

> Just to confirm, we have set PrologFlags=contain in slurm.conf.
> We also tried with pam_systemd.so removed (as per bug #4098) and that did not have any effect.

I'm pretty sure that's the issue though... the /system.slice/ references there imply pam_systemd is involved. Can you walk me through the modifications you've made under /etc/pam.d/, and make sure to comment out any references to pam_systemd?

The obsolete options have been removed. We assumed pam_systemd was the issue at first, too, but removing it did not seem to change behavior. The only change to /etc/pam.d/sshd we made was to add a line for pam_slurm_adopt (replacing an existing line for pam_slurm): `account required /lib/security/pam_slurm_adopt.so`. The sshd file does not have a line for pam_systemd but includes common-account, common-session, and common-password; of these, only common-session contains a line for pam_systemd, `session optional pam_systemd.so`, which we commented out.

% grep systemd /etc/pam.d/sshd
% grep @ /etc/pam.d/sshd
@include common-account
@include common-session
@include common-password
% grep systemd /etc/pam.d/common-*
/etc/pam.d/common-session:#session optional pam_systemd.so
% grep @ /etc/pam.d/common-*
%

I can attach copies of these files, if necessary.

Any update or advice on this issue, or any further information we can provide? It doesn't look like pam_systemd is in the stack, but perhaps systemd is getting involved elsewhere?

Running through this again, I think your original theory about --enable-multiple-slurmd having affected this was correct. Can you try adding the nodename=$whateverthenodename option on the pam_slurm_adopt line in the configs? It does look like leaving that off may be preventing it from finding the correct location in the hierarchy.

As an aside - we don't usually recommend using the Debian packages for this (and other) reasons. You may want to look into locally installing them if possible at some point.
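A sketch of what that suggested change might look like on the pam_slurm_adopt line in /etc/pam.d/sshd; the node name here is illustrative, and the value would need to match the NodeName the local slurmd registers as:

```
account required /lib/security/pam_slurm_adopt.so nodename=wheat05
```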
The 17.11.2 release is also before the most recent security update for SlurmDBD - if you haven't looked at patching the SlurmDBD at least, I would recommend you do so ASAP.

- Tim

I wasn't able to find documentation on the nodename option; is it maybe node_name, instead? I've tried both without success:

Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug3: cpu_freq_verify_def: Performance set
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug4: found jobid = 243802, stepid = 4294967295
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug4: found jobid = 243802, stepid = 0
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: Munge authentication plugin loaded
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug3: Success.
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: Connection by user epoch: user has only one job 243802
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: _adopt_process: trying to get 243802.4294967295 to adopt 6455
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: Leaving stepd_add_extern_pid
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: debug: Leaving stepd_get_x11_display
Apr 16 16:36:08 wheat09 pam_slurm_adopt[6455]: Process 6455 adopted into job 243802

The logs indicate success for the target process... Is there anything that would prevent a child of that process from inheriting the parent's cgroup?

(In reply to aseishas from comment #5)
> I wasn't able to find documentation on the nodename option; is it maybe node_name, instead? I've tried both without success:

It isn't documented, as that option is only relevant with --enable-multiple-slurmd.
And the --enable-multiple-slurmd option is not recommended for use on production systems. I understand the Debian packages are convenient, but please understand that we (as SchedMD) do not control them, or what flags they may have chosen to enable. They've shipped with a weird and broken mix of flags in the past - while --enable-multiple-slurmd is not harmful in this instance, it is still not recommended.

> The logs indicate success for the target process... Is there anything that would prevent a child of that process from inheriting the parent's cgroup?

Can you attach the slurm.conf and cgroup.conf for the system?

Created attachment 6642 [details]
slurm.conf
Created attachment 6643 [details]
cgroup.conf
Attached. I can try experimenting with a clean build tomorrow, to see if I can confirm or rule out an issue with the Debian packages in particular.

Thanks. Those look fine. Can you paste /etc/pam.d/sshd, and any included files (likely common-account at a minimum), into here?

Created attachment 6645 [details]
/etc/pam.d/sshd
Created attachment 6646 [details]
/etc/pam.d/common-account
Created attachment 6647 [details]
/etc/pam.d/common-session
Created attachment 6648 [details]
/etc/pam.d/common-password
This needs to be nodename, not node_name:

account required /lib/security/pam_slurm_adopt.so node_name=wheat09 log_level=debug5

I'd also suggest you move that after the @include common-account line. The rest of that all looks okay. If you make those changes on a node, can you see if there is anything interesting captured in syslog when a user logs in to the node through SSH?

Sorry about node_name vs. nodename... I've made those two changes. Here is the whole connection attempt:

2018-04-17T16:48:30.634409-07:00 wheat05 sshd[682]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
2018-04-17T16:48:30.636286-07:00 wheat05 sshd[684]: pam_succeed_if(sshd:auth): requirement "uid eq 0" not met by user "epoch"
2018-04-17T16:48:32.542455-07:00 wheat05 sshd[684]: Successful Duo login for 'epoch' from 171.64.18.17
2018-04-17T16:48:32.542831-07:00 wheat05 sshd[684]: pam_listfile(sshd:account): Refused user epoch for service sshd
2018-04-17T16:48:32.543047-07:00 wheat05 sshd[684]: pam_listfile(sshd:account): Refused user epoch for service sshd
2018-04-17T16:48:32.543258-07:00 wheat05 pam_slurm_adopt[684]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
2018-04-17T16:48:32.543680-07:00 wheat05 pam_slurm_adopt[684]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
2018-04-17T16:48:32.545404-07:00 wheat05 pam_slurm_adopt[684]: debug3: cpu_freq_verify_def: Performance set
2018-04-17T16:48:32.545734-07:00 wheat05 pam_slurm_adopt[684]: debug: Ignoring obsolete SchedulerPort option.
2018-04-17T16:48:32.545950-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 242164, stepid = 4294967295
2018-04-17T16:48:32.546166-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244148, stepid = 4294967294
2018-04-17T16:48:32.546373-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244150, stepid = 4294967295
2018-04-17T16:48:32.546573-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244149, stepid = 0
2018-04-17T16:48:32.546923-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244150, stepid = 0
2018-04-17T16:48:32.547129-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244147, stepid = 4294967295
2018-04-17T16:48:32.547327-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244146, stepid = 4294967295
2018-04-17T16:48:32.547531-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 242164, stepid = 4294967294
2018-04-17T16:48:32.547730-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244149, stepid = 4294967295
2018-04-17T16:48:32.547931-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244149, stepid = 4294967294
2018-04-17T16:48:32.548133-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244148, stepid = 0
2018-04-17T16:48:32.548331-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244148, stepid = 4294967295
2018-04-17T16:48:32.548538-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244147, stepid = 0
2018-04-17T16:48:32.548738-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244146, stepid = 0
2018-04-17T16:48:32.548939-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244763, stepid = 4294967295
2018-04-17T16:48:32.549133-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244146, stepid = 4294967294
2018-04-17T16:48:32.549327-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244763, stepid = 0
2018-04-17T16:48:32.549528-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244147, stepid = 4294967294
2018-04-17T16:48:32.549727-07:00 wheat05 pam_slurm_adopt[684]: debug4: found jobid = 244150, stepid = 4294967294
2018-04-17T16:48:32.549931-07:00 wheat05 pam_slurm_adopt[684]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
2018-04-17T16:48:32.550124-07:00 wheat05 pam_slurm_adopt[684]: debug: Munge authentication plugin loaded
2018-04-17T16:48:32.550318-07:00 wheat05 pam_slurm_adopt[684]: debug3: Success.
2018-04-17T16:48:32.551875-07:00 wheat05 pam_slurm_adopt[684]: Connection by user epoch: user has only one job 244763
2018-04-17T16:48:32.552131-07:00 wheat05 pam_slurm_adopt[684]: debug: _adopt_process: trying to get 244763.4294967295 to adopt 684
2018-04-17T16:48:32.553401-07:00 wheat05 pam_slurm_adopt[684]: debug: Leaving stepd_add_extern_pid
2018-04-17T16:48:32.553611-07:00 wheat05 pam_slurm_adopt[684]: debug: Leaving stepd_get_x11_display
2018-04-17T16:48:32.553792-07:00 wheat05 pam_slurm_adopt[684]: Process 684 adopted into job 244763
2018-04-17T16:48:32.554967-07:00 wheat05 sshd[682]: Accepted keyboard-interactive/pam for epoch from 171.64.18.17 port 50771 ssh2
2018-04-17T16:48:32.557891-07:00 wheat05 sshd[682]: pam_unix(sshd:session): session opened for user epoch by (uid=0)

(In reply to aseishas from comment #17)
> Sorry about node_name vs. nodename... I've made those two changes. Here is the whole connection attempt:

Sorry for not following back up sooner. That all looks normal at this point - are you still seeing connection issues? And have you had a chance to switch away from the Debian packages yet? If not, we do expect to release 17.11.6 tomorrow, although none of the recent fixes are likely to address this yet.

- Tim

Sorry for the delay in responding... We have not been able to move away from the Debian packages yet. I was able to test a new build of 17.11.6, however, with multiple-slurmd disabled.
Unfortunately, there was no change in behavior:

May 23 12:14:46 oat05 sshd[11619]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
May 23 12:14:46 oat05 sshd[11621]: pam_succeed_if(sshd:auth): requirement "uid eq 0" not met by user "epoch"
May 23 12:14:48 oat05 sshd[11621]: Successful Duo login for 'epoch' from 171.64.18.17
May 23 12:14:48 oat05 sshd[11621]: pam_listfile(sshd:account): Refused user epoch for service sshd
May 23 12:14:48 oat05 sshd[11621]: pam_listfile(sshd:account): Refused user epoch for service sshd
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug3: cpu_freq_verify_def: Performance set
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug4: found jobid = 341622, stepid = 0
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug4: found jobid = 341622, stepid = 4294967295
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: Munge authentication plugin loaded
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug3: Success.
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: Connection by user epoch: user has only one job 341622
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: _adopt_process: trying to get 341622.4294967295 to adopt 11621
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: Leaving stepd_add_extern_pid
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: debug: Leaving stepd_get_x11_display
May 23 12:14:48 oat05 pam_slurm_adopt[11621]: Process 11621 adopted into job 341622
May 23 12:14:48 oat05 sshd[11619]: Accepted keyboard-interactive/pam for epoch from 171.64.18.17 port 51806 ssh2
May 23 12:14:48 oat05 sshd[11619]: pam_unix(sshd:session): session opened for user epoch by (uid=0)
May 23 12:14:48 oat05 systemd-logind[2017]: New session 21 of user epoch.
May 23 12:14:48 oat05 systemd: pam_unix(systemd-user:session): session opened for user epoch by (uid=0)

But, from that session:

epoch@oat05 ~ % cat /proc/self/cgroup
12:blkio:/user.slice
11:rdma:/
10:hugetlb:/
9:memory:/user.slice
8:freezer:/
7:net_cls,net_prio:/
6:pids:/user.slice/user-23267.slice
5:devices:/user.slice
4:perf_event:/
3:cpuset:/
2:cpu,cpuacct:/user.slice
1:name=systemd:/user.slice/user-23267.slice/session-21.scope

Sorry, my test node was misconfigured!
After adopting the correct configuration, however, the behavior is still unchanged:

May 23 14:09:23 oat05 sshd[13917]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
May 23 14:09:23 oat05 sshd[13919]: pam_succeed_if(sshd:auth): requirement "uid eq 0" not met by user "epoch"
May 23 14:09:25 oat05 sshd[13919]: Successful Duo login for 'epoch' from 171.64.18.17
May 23 14:09:25 oat05 sshd[13919]: pam_listfile(sshd:account): Refused user epoch for service sshd
May 23 14:09:25 oat05 sshd[13919]: pam_listfile(sshd:account): Refused user epoch for service sshd
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug3: cpu_freq_verify_def: Performance set
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug4: found jobid = 341789, stepid = 0
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug4: found jobid = 341789, stepid = 4294967295
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: Munge authentication plugin loaded
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug3: Success.
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: Connection by user epoch: user has only one job 341789
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: _adopt_process: trying to get 341789.4294967295 to adopt 13919
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: Leaving stepd_add_extern_pid
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: debug: Leaving stepd_get_x11_display
May 23 14:09:25 oat05 pam_slurm_adopt[13919]: Process 13919 adopted into job 341789
May 23 14:09:25 oat05 sshd[13917]: Accepted keyboard-interactive/pam for epoch from 171.64.18.17 port 52188 ssh2
May 23 14:09:25 oat05 sshd[13917]: pam_unix(sshd:session): session opened for user epoch by (uid=0)

And:

epoch@oat05 ~ % cat /proc/self/cgroup
12:blkio:/system.slice/ssh.service
11:rdma:/
10:hugetlb:/
9:memory:/system.slice/ssh.service
8:freezer:/
7:net_cls,net_prio:/
6:pids:/system.slice/ssh.service
5:devices:/system.slice/ssh.service
4:perf_event:/
3:cpuset:/
2:cpu,cpuacct:/system.slice/ssh.service
1:name=systemd:/system.slice/ssh.service

I still don't see an obvious issue, but I'm hoping that my colleague Marshall may be able to spot something I've missed here. I'm going to turn this over to him to follow up further; my apologies for not getting to the bottom of this yet.

- Tim

Thanks for the update... Please let me know if there's any more information we can provide.

Thanks, I'll let you know. I'm working on setting up pam_slurm_adopt on a compute node with Ubuntu 16.04 to try to reproduce the issue.

I've got it set up with Slurm 17.11.2, Ubuntu 16.04. I can get access to the node with "sufficient" (when the pam_slurm_adopt module fails, it moves on to the next one), but not with "required" (when the pam_slurm_adopt module fails, it exits and rejects my login attempt). But you're using "required", so that's not it. I haven't been able to reproduce your exact situation. Can you re-upload your most recent /etc/pam.d/sshd file (since you've made changes to it) and mark the old one obsolete?
And is slurm.conf on your test node the exact same as the slurm.conf you've uploaded here?

Created attachment 7216 [details]
/etc/pam.d/sshd
Created attachment 7217 [details]
slurm.conf
I've attached the current /etc/pam.d/sshd and /etc/slurm-llnl/slurm.conf. I don't think either has changed substantially... I had made changes to sshd while troubleshooting (adding nodename=), but these hosts are now running 17.11.6 (with the patch provided for bug 5038) with multiple-slurmd disabled, so that change has been reverted.

I still see that @include common-account is after pam_slurm_adopt. It should be before pam_slurm_adopt.

(In reply to aseishas from comment #34)
> I've attached the current /etc/pam.d/sshd and /etc/slurm-llnl/slurm.conf. I don't think either has changed substantially... I had made changes to sshd while troubleshooting (adding nodename=), but these hosts are now running 17.11.6 (with the patch provided for bug 5038) with multiple-slurmd disabled, so that change has been reverted.

Is your node hostname the same as the NodeName defined in slurm.conf?

Hello, have you gotten a chance to make the change in comment 35? (Put common-account before pam_slurm_adopt)

Sorry for the delay... I was able to restore that change and test this morning. I'm still seeing behavior like that in comment 22: auth logs indicate the adoption succeeded, but the session doesn't end up in a Slurm cgroup. I do notice now that there is a small change in behavior after hoisting common-account, however; instead of the session ending up in a user.slice it ends up in system.slice/ssh.service. I'll attach the corrected /etc/pam.d/sshd.

Created attachment 7262 [details]
/etc/pam.d/sshd
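For reference, the ordering being asked for here would make the account section of /etc/pam.d/sshd look roughly like this (a sketch, not the actual attached file):

```
# Run the distribution's standard account checks first...
@include common-account
# ...then gate SSH access on the user having a job on this node.
account required /lib/security/pam_slurm_adopt.so
```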
Just to double check a couple of things:

* Have you seen this error message in the syslog/pam log?

  Jul 13 12:54:26 voyager2 pam_slurm_adopt[22370]: error: Error obtaining local step information.

* Does the NodeName in slurm.conf match the node's hostname? If not, then you should use the nodename parameter for pam_slurm_adopt.

No, we haven't seen "Error obtaining local step information" at all (in all the logs I've seen, pam_slurm_adopt reports that the adoption has succeeded). The NodeName is the same, but I've tried adding the option anyway; the result is still the same as in comment 22.

(In reply to aseishas from comment #47)
> No, we haven't seen "Error obtaining local step information" at all (in all the logs I've seen, pam_slurm_adopt reports that the adoption has succeeded). The NodeName is the same, but I've tried adding the option anyway; the result is still the same as in comment 22.

Alright, I didn't think so but was just checking. Since the NodeName is the same as the hostname, you don't need it. Only if it's different.

Well, I haven't succeeded in reproducing this yet (I've been testing on Ubuntu 16.04 with Slurm 17.11.2 and .7). I'll keep working on it.

I've been able to reproduce (kind of) your situation by adding in these lines that you have:

account [success=1 default=ignore] pam_listfile.so item=user sense=allow onerr=succeed file=/etc/security/rcadmins_root
...
# Bypass the pam_slurm check for certain people.
account [success=1 default=ignore] pam_listfile.so item=user sense=allow file=/etc/security/rcadmins_all onerr=fail

But only for the users allowed in those files. For other users, pam_slurm_adopt works as expected. I suspect you are one of the users allowed through without pam_slurm_adopt? Can you check if a user not in one of those files is restricted as expected with pam_slurm_adopt?

Normally I am listed in that file, but I remove myself for testing... pam_slurm_adopt is definitely running according to the logs, and even reports successful adoption.

Well, it was a long shot. I saw your logs showing successful adoption, I just wasn't sure if something else was messed up or not.

Can you set SlurmdDebug to debug2 on a node, reconfigure or restart slurmd on the node so the change is propagated, and ssh in as a user and cat /proc/self/cgroup as you've done before? Then can you upload the slurmd log file? I'm looking for certain log statements, including "_handle_add_extern_pid_internal". Assuming that's there and you still have the same problems, I'll give you a small patch to apply (a sleep) so you can verify that the process actually gets added to the cgroup. Then if we can verify that, we know that the process is being taken out again by something that isn't Slurm.

Created attachment 7484 [details]
/var/log/slurm-llnl/slurmd
I've attached the slurmd log; below are the results of `cat /proc/self/cgroup` and the auth.log record for the session.

% cat /proc/self/cgroup
12:perf_event:/
11:freezer:/
10:pids:/system.slice/ssh.service
9:memory:/system.slice/ssh.service
8:cpuset:/
7:net_cls,net_prio:/
6:hugetlb:/
5:devices:/system.slice/ssh.service
4:cpu,cpuacct:/system.slice/ssh.service
3:rdma:/
2:blkio:/system.slice/ssh.service
1:name=systemd:/system.slice/ssh.service

# /var/log/auth.log
Aug 1 11:53:26 oat08 sshd[24083]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
Aug 1 11:53:26 oat08 sshd[24085]: pam_succeed_if(sshd:auth): requirement "uid eq 0" not met by user "epoch"
Aug 1 11:53:29 oat08 sshd[24085]: Successful Duo login for 'epoch' from 171.64.18.17
Aug 1 11:53:29 oat08 sshd[24085]: pam_listfile(sshd:account): Refused user epoch for service sshd
Aug 1 11:53:29 oat08 sshd[24085]: pam_listfile(sshd:account): Refused user epoch for service sshd
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug3: cpu_freq_verify_def: Performance set
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug4: found jobid = 1190329, stepid = 4294967295
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug4: found jobid = 1190329, stepid = 0
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: Munge authentication plugin loaded
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug3: Success.
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: Connection by user epoch: user has only one job 1190329
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: _adopt_process: trying to get 1190329.4294967295 to adopt 24085
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: Leaving stepd_add_extern_pid
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: debug: Leaving stepd_get_x11_display
Aug 1 11:53:29 oat08 pam_slurm_adopt[24085]: Process 24085 adopted into job 1190329
Aug 1 11:53:29 oat08 sshd[24083]: Accepted keyboard-interactive/pam for epoch from 171.64.18.17 port 64103 ssh2
Aug 1 11:53:29 oat08 sshd[24083]: pam_unix(sshd:session): session opened for user epoch by (uid=0)
Aug 1 11:53:42 oat08 sshd[24113]: Received disconnect from 171.64.18.17 port 64103:11: disconnected by user
Aug 1 11:53:42 oat08 sshd[24113]: Disconnected from 171.64.18.17 port 64103
Aug 1 11:53:42 oat08 sshd[24083]: pam_unix(sshd:session): session closed for user epoch

Thanks, that's helpful. It looks like the process is getting added to the extern cgroups.

This patch inserts a sleep right after the process gets added to the cgroups. It will stall the PAM stack from returning from pam_slurm_adopt for a minute, to give you time to verify that the PID did get properly added to the cgroup. Can you have an ssh session ready on a node, then apply this patch, recompile, and run another job on that node? You shouldn't have to restart slurmd after applying the patch and recompiling, since this is executed by the stepd, so the new slurmstepd binary will be executed when you run the new job.

Then go to /sys/fs/cgroup/freezer/slurm/uid_<your uid>/job_<your job id>/step_extern and run cat cgroup.procs, and you'll see the PID of the extern sleep process. Then, SSH in, and it will stall for a minute. Then, run cat cgroup.procs again and you *should* see another PID. If that's there, you know that process was correctly adopted into the extern step cgroup, and that something else is removing it later.
After a minute (you can reduce the sleep to less than 60 seconds if you want), the function will return, and you'll be on the node. You can check /proc/self/cgroup again and also cgroup.procs in the extern step cgroup directory.

diff --git a/src/slurmd/slurmstepd/req.c b/src/slurmd/slurmstepd/req.c
index efd18f7647..100a3d9227 100644
--- a/src/slurmd/slurmstepd/req.c
+++ b/src/slurmd/slurmstepd/req.c
@@ -1333,6 +1333,8 @@ static int _handle_add_extern_pid_internal(stepd_step_rec_t *job, pid_t pid)
 		return SLURM_FAILURE;
 	}

+	sleep(60);
+
 	/* spawn a thread that will wait on the pid given */
 	slurm_thread_create_detached(NULL, _wait_extern_pid, extern_pid);

Have you had a chance to try comment 57?

Sorry for the delay... It definitely looks like something is removing it:

% cat cgroup.procs
3195
% cat cgroup.procs
3195
3285
% cat cgroup.procs
3195

That's good to know that pam_slurm_adopt is working correctly. I can't find anything wrong in your PAM stack. Can you investigate the possibility that systemd is interfering somehow, either in or out of your PAM stack, and removing the PID from the cgroup after pam_slurm_adopt places the PID in the cgroup? I'm also looking at that, but I haven't found anything yet.

Can you remove /lib*/security/pam_systemd.so - maybe just move it to a temporary directory? Then can you re-run the test? I don't think that pam_systemd.so is being used anywhere in your PAM stack, but this would be a definitive test.

Do you think that systemd-logind might be causing the problem? Do you think you could disable it temporarily and test that out? (I'd try comment 62 first.)

Just a ping - have you had a chance to try comment 62 (first) and then comment 63?

I was finally able to test these... Unfortunately, neither moving pam_systemd.so nor disabling systemd-logind made any difference.

I have one more idea to try to eliminate any possibility that anything in the PAM stack is interfering. For a single test node, can you comment out all lines of the PAM stack except for pam_slurm_adopt and run your test again? If the session is correctly inside of the job extern cgroup, then we know something in the PAM stack is causing the problem, and you can begin uncommenting lines and see who the culprit is. If the session is still not inside of the job extern cgroup, then we know something else outside of the PAM stack is the problem (probably systemd). Just make sure you maintain your ssh session on the node so you don't get locked out of the node.

I also fixed another bug in pam_slurm_adopt where action_adopt_failure wasn't working (patch under review). But that's unrelated to your bug.

Have you had a chance to try the latest test? (Comment 69)

I'm closing this as timedout for now. Please reopen when you have a chance to perform the test.

I'm sorry again for the delay. Even stripping /etc/pam.d/sshd back to just a bare pam_slurm_adopt.so line doesn't seem to work; the session ends up in system.slice/ssh.service, as before.

Can you upload the log files (slurmd, syslog and pam) for this latest attempt? I can't reproduce it, even on Ubuntu 16.04, Slurm 17.11.02, using your slurm.conf, and with only pam_slurm_adopt.so in sshd.

I've done some more careful testing, and I was wrong: reducing /etc/pam.d/sshd to just the pam_slurm_adopt module does substantially resolve the issue. It looks like only some resources are contained by Slurm, however (e.g., cpuset, but not memory):

$ cat /proc/self/cgroup
12:hugetlb:/
11:pids:/user.slice/user-23267.slice
10:perf_event:/
9:cpuset:/slurm/uid_23267/job_1314815/step_extern
8:freezer:/slurm/uid_23267/job_1314815/step_extern
7:blkio:/user.slice
6:cpu,cpuacct:/user.slice
5:memory:/user.slice
4:devices:/user.slice
3:net_cls,net_prio:/
2:rdma:/
1:name=systemd:/user.slice/user-23267.slice/session-c2.scope

Is that the expected behavior? See below for the /var/log/auth.log trace of this session, and I will attach a copy of slurmd.log.
I'll continue looking for the particular module that causes the session to escape. Oct 23 11:58:19 oat07 sshd[20088]: Connection from 171.64.18.17 port 63131 on 171.67.51.17 port 22 Oct 23 11:58:19 oat07 sshd[20088]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok) Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug3: cpu_freq_verify_def: Performance set Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug4: found jobid = 1314815, stepid = 0 Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug4: found jobid = 1314815, stepid = 4294967295 Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: Munge authentication plugin loaded Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug3: Success. Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: Connection by user epoch: user has only one job 1314815 Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: _adopt_process: trying to get 1314815.4294967295 to adopt 20088 Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: Leaving stepd_add_extern_pid Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: debug: Leaving stepd_get_x11_display Oct 23 11:58:19 oat07 pam_slurm_adopt[20088]: Process 20088 adopted into job 1314815 Oct 23 11:58:19 oat07 sshd[20088]: Accepted gssapi-with-mic for epoch from 171.64.18.17 port 63131 ssh2 Oct 23 11:58:19 oat07 sshd[20088]: pam_unix(sshd:session): session opened for user epoch by (uid=0) Oct 23 11:58:19 oat07 systemd-logind[2263]: New session c2 of user epoch. 
Oct 23 11:58:19 oat07 systemd: pam_unix(systemd-user:session): session opened for user epoch by (uid=0)
Oct 23 11:58:19 oat07 sshd[20088]: User child is on pid 20103
Oct 23 11:58:19 oat07 sshd[20103]: Starting session: shell on pts/3 for epoch from 171.64.18.17 port 63131 id 0
Oct 23 11:58:42 oat07 systemd-logind[2263]: Removed session c2.
Oct 23 11:58:42 oat07 systemd: pam_unix(systemd-user:session): session closed for user epoch

Created attachment 8083 [details]
/var/log/slurm-llnl/slurmd
There was a thread on the slurm-users list recently where someone had the same problem. I know you've removed all references to pam_systemd.so, but it looks like you may also need to mask and disable systemd-logind, which fixed that problem for them:

systemctl stop systemd-logind
systemctl mask systemd-logind

Does this solve the issue? This is the slurm-users thread: https://groups.google.com/forum/#!topic/slurm-users/3vZUoZ_DXJI

Stopping and masking systemd-logind does ensure that the session remains fully contained. Thank you!

I think we also have a theory as to why processes are not contained with our configuration in the first place, even though we have not been able to identify a particular culprit module. We use Duo for two-factor authentication, which requires keyboard-interactive auth to be enabled in the sshd configuration. When interactive authentication runs, it looks like sshd spawns a subprocess for the purpose, and it's this subprocess that ends up contained by pam_slurm_adopt. The subprocess then dies, and the user's shell ends up a child of the original, uncontained parent process instead. If keyboard-interactive authentication is disabled, it looks like a single sshd process is responsible for the entire authentication flow...
Without keyboard-interactive (one PID):

Oct 24 13:10:10 oat07 sshd[17395]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
Oct 24 13:10:10 oat07 sshd[17395]: pam_listfile(sshd:account): Refused user epoch for service sshd
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug3: cpu_freq_verify_def: Performance set
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug4: found jobid = 1315105, stepid = 0
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug4: found jobid = 1315105, stepid = 4294967295
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: Munge authentication plugin loaded
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug3: Success.
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: Connection by user epoch: user has only one job 1315105
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: _adopt_process: trying to get 1315105.4294967295 to adopt 17395
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: Leaving stepd_add_extern_pid
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: debug: Leaving stepd_get_x11_display
Oct 24 13:10:10 oat07 pam_slurm_adopt[17395]: Process 17395 adopted into job 1315105
Oct 24 13:10:10 oat07 sshd[17395]: Accepted gssapi-with-mic for epoch from 171.64.18.17 port 56511 ssh2
Oct 24 13:10:10 oat07 sshd[17395]: pam_unix(sshd:session): session opened for user epoch by (uid=0)
Oct 24 13:10:10 oat07 sshd[17395]: pam_systemd(sshd:session): Failed to create session: Unit systemd-logind.service is masked.
With keyboard-interactive (two PIDs):

Oct 24 13:12:25 oat07 sshd[17654]: Authorized to epoch, krb5 principal epoch@stanford.edu (krb5_kuserok)
Oct 24 13:12:25 oat07 sshd[17656]: pam_succeed_if(sshd:auth): requirement "uid eq 0" not met by user "epoch"
Oct 24 13:12:27 oat07 sshd[17656]: Successful Duo login for 'epoch' from 171.64.18.17
Oct 24 13:12:27 oat07 sshd[17656]: pam_listfile(sshd:account): Refused user epoch for service sshd
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: Reading cgroup.conf file /etc/slurm-llnl/cgroup.conf
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: Reading slurm.conf file: /etc/slurm-llnl/slurm.conf
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug3: cpu_freq_verify_def: Performance set
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug4: found jobid = 1315105, stepid = 0
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug4: found jobid = 1315105, stepid = 4294967295
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: Munge authentication plugin loaded
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug3: Success.
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: Connection by user epoch: user has only one job 1315105
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: _adopt_process: trying to get 1315105.4294967295 to adopt 17656
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: Leaving stepd_add_extern_pid
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: debug: Leaving stepd_get_x11_display
Oct 24 13:12:27 oat07 pam_slurm_adopt[17656]: Process 17656 adopted into job 1315105
Oct 24 13:12:27 oat07 sshd[17654]: Accepted keyboard-interactive/pam for epoch from 171.64.18.17 port 56523 ssh2
Oct 24 13:12:27 oat07 sshd[17654]: pam_unix(sshd:session): session opened for user epoch by (uid=0)
Oct 24 13:12:27 oat07 sshd[17654]: pam_systemd(sshd:session): Failed to create session: Unit systemd-logind.service is masked.

That's a great catch. I think you're right about the child PID being adopted rather than the parent. I'll look into this more.

Hello! I've been working with Adam on this (outside of the ticket), and I just thought: would it be possible to make pam_slurm_adopt work as both an account module and a session module, with the adoption code moved to the session module? What I mean is, it seems to me that pam_slurm_adopt does two things: check whether the user has a running job, and then adopt their SSH connection into the job's cgroup. If the "adopt their SSH connection into the job's cgroup" function is moved into a session call (instead of an account call), then I think there would be a couple of benefits:

1. It would avoid all of the child process craziness. At the end of comment 79, Adam's syslog output shows that the PAM session modules (search for "sshd:session") are all running in the parent process. So, doing the adoption as part of PAM session setup means the parent process gets put into the cgroup.

2. pam_systemd is also a session module.
pam_slurm_adopt's session entry could be placed after pam_systemd, so that if pam_systemd messes with a user's cgroups, pam_slurm_adopt could immediately go in and undo that. (Of course, if systemd-logind messes with a session's cgroups outside of pam_systemd, then we wouldn't be able to catch that, so #2 is not guaranteed!)

Hi Karl,

There's already a discussion about this going on in bug 5920 (and on the slurm-users list). The issue is still with systemd: we recently ran into a situation with systemctl daemon-reload with Delegate=yes where systemd stole back the cgroups. We've fixed that one, but believe there are other cases in which systemd will steal back the cgroups. We're still discussing this internally.

Another option you have is to adopt the parent as well as the child. My thought was to add code to adopt the parent, as long as its PID is not 1 (the init process) and the UID of the parent matches the UID of the child. Unfortunately, there's no system call to do that, so it would involve adding a bit more code to parse /proc/<pid>/status. I'm not sure that we'd add this to the repo (we need to discuss it more), but it *should* (I think) work as a local patch.
Something similar to this (WARNING - this compiles, but I haven't tested it, since I don't have something to replicate what you're seeing):

$ git diff
diff --git a/contribs/pam_slurm_adopt/pam_slurm_adopt.c b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
index 12cc4bd4b9..0d8099549b 100644
--- a/contribs/pam_slurm_adopt/pam_slurm_adopt.c
+++ b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
@@ -117,7 +117,8 @@ static int _adopt_process(pam_handle_t *pamh, pid_t pid, step_loc_t *stepd)
 {
 	int fd;
 	uint16_t protocol_version;
-	int rc;
+	int rc, rc2;
+	pid_t ppid;
 
 	if (!stepd)
 		return -1;
@@ -132,9 +133,12 @@ static int _adopt_process(pam_handle_t *pamh, pid_t pid, step_loc_t *stepd)
 		return -1;
 	}
 
+	ppid = getppid();
+
 	rc = stepd_add_extern_pid(fd, stepd->protocol_version, pid);
+	rc2 = stepd_add_extern_pid(fd, stepd->protocol_version, ppid);
 
-	if ((rc == PAM_SUCCESS) && !opts.disable_x11) {
+	if ((rc == PAM_SUCCESS) && (rc2 == PAM_SUCCESS) && !opts.disable_x11) {
 		int display;
 
 		display = stepd_get_x11_display(fd, stepd->protocol_version);

I suggest you find an appropriate workaround (either your option or the adopt-the-parent-pid option here) and do that locally. I'm not sure if the child process craziness is a bug fix or a feature request, so I need to explore that more myself. I'd like to move the discussion about pam_slurm_adopt in both account and session to bug 5920. Does that sound good to you?

I am out of the office through Monday, November 5, and will respond to your message when I return. If you need immediate assistance please contact: srcc-support@stanford.edu
--
Adam Seishas
Research Computing Specialist
Stanford | University IT
aseishas@stanford.edu | 650.725.7490

Hi, I'd just like to follow up on my previous comment (comment 83). Also, we are looking into the possibility of not disabling pam_systemd at bug 5920. Thanks!

We're following 5920 now.

Sounds good. I'm closing this ticket as resolved/infogiven for now.
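For illustration, the safety check described above (adopt the parent only if its PID is not 1 and its UID matches the child's) would need the /proc/<pid>/status parsing mentioned in the comment. A hypothetical sketch of just that check, in Python for brevity rather than the C a real patch would use:

```python
# Hypothetical sketch of the proposed check before adopting a parent
# process: only adopt when the parent is not init (PID 1) and its real
# UID matches the child's. Parses /proc/<pid>/status-format text.

def status_field(status_text, field):
    """Return the first value on a 'Field:' line, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return line.split()[1]
    return None

def should_adopt_parent(child_status, parent_status):
    """True only if the child's PPid is not 1 and real UIDs match."""
    ppid = status_field(child_status, "PPid")
    if ppid is None or int(ppid) == 1:
        return False  # parent is init (or unknown); never adopt it
    child_uid = status_field(child_status, "Uid")
    parent_uid = status_field(parent_status, "Uid")
    return child_uid is not None and child_uid == parent_uid

# Example using status-file text modeled on the two-PID sshd case above:
child = "Name:\tsshd\nPPid:\t17654\nUid:\t23267\t23267\t23267\t23267\n"
parent = "Name:\tsshd\nPPid:\t1234\nUid:\t23267\t23267\t23267\t23267\n"
print(should_adopt_parent(child, parent))
# → True
```

In a real pam_slurm_adopt patch this logic would live in C next to the getppid() call in the diff above, reading /proc/<ppid>/status directly; the field names (PPid, Uid) are the ones documented in proc(5).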
If 5920 doesn't resolve your situation, then feel free to re-open this ticket or open a new one. I've filed another internal bug to (1) document the current need to mask and stop systemd-logind, and (2) further discuss the problems surrounding child processes during authentication. I'm not sure if that's unique to two-factor authentication that uses interactive sessions or not. It's possible that we just document this problem for now until we find an appropriate solution.