Summary: | When using pam_slurm_adopt in systemd, ssh is not containerized in any job container | ||
---|---|---|---|
Product: | Slurm | Reporter: | Felip Moll <felip.moll> |
Component: | Contributions | Assignee: | Director of Support <support> |
Status: | RESOLVED INFOGIVEN | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | alex |
Version: | 17.02.3 | ||
Hardware: | Linux | ||
OS: | Linux | ||
See Also: | https://bugs.schedmd.com/show_bug.cgi?id=5920 | ||
Site: | BSC-MN4 | Alineos Sites: | --- |
Attachments: | systemd cgroup tree |
Description
Felip Moll
2017-06-21 02:03:13 MDT
Created attachment 4799: systemd cgroup tree
This is the relevant log fragment with a single job running on node s01r1b02 while the same user connects to it through ssh. As you can see, access is denied by pam_access but then granted by pam_slurm_adopt:

```
2017-06-21T10:28:40.724896+02:00 s01r1b02 sshd[153840]: pam_access(sshd:account): access denied for user `bsc99968' from `10.2.8.230'
2017-06-21T10:28:40.738637+02:00 s01r1b02 pam_slurm_adopt[153840]: Connection by user bsc99968: user has only one job 3510
2017-06-21T10:28:40.752534+02:00 s01r1b02 pam_slurm_adopt[153840]: Process 153840 adopted into job 3510
2017-06-21T10:28:40.752826+02:00 s01r1b02 sshd[153840]: Accepted publickey for bsc99968 from 10.2.8.230 port 44690 ssh2: DSA SHA256:km+Gtd3ncSq+4UO6Y9ifepPBKcDmqw66aISFC0nK6Kg
2017-06-21T10:28:40.754282+02:00 s01r1b02 sshd[153840]: pam_unix(sshd:session): session opened for user bsc99968 by (uid=0)
2017-06-21T10:28:40.763083+02:00 s01r1b02 systemd[1]: Created slice User Slice of bsc99968.
2017-06-21T10:28:40.765170+02:00 s01r1b02 systemd[1]: Starting User Manager for UID 1109...
2017-06-21T10:28:40.767303+02:00 s01r1b02 systemd-logind[1972]: New session 813 of user bsc99968.
2017-06-21T10:28:40.768470+02:00 s01r1b02 systemd[1]: Started Session 813 of user bsc99968.
2017-06-21T10:28:40.780091+02:00 s01r1b02 systemd: pam_unix(systemd-user:session): session opened for user bsc99968 by (uid=0)
2017-06-21T10:28:40.810746+02:00 s01r1b02 systemd[153845]: Reached target Timers.
2017-06-21T10:28:40.811001+02:00 s01r1b02 systemd[153845]: Reached target Sockets.
2017-06-21T10:28:40.811191+02:00 s01r1b02 systemd[153845]: Reached target Paths.
2017-06-21T10:28:40.811396+02:00 s01r1b02 systemd[153845]: Reached target Basic System.
2017-06-21T10:28:40.811600+02:00 s01r1b02 systemd[153845]: Reached target Default.
2017-06-21T10:28:40.811866+02:00 s01r1b02 systemd[153845]: Startup finished in 22ms.
2017-06-21T10:28:40.812062+02:00 s01r1b02 systemd[1]: Started User Manager for UID 1109.
```
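Why a pam_access denial does not block the login follows from the PAM control flags in the sshd account stack: pam_access.so is `sufficient`, so its failure is ignored, and the decision falls to the `required` pam_slurm_adopt.so. The Python sketch below models only the `requisite`/`required`/`sufficient` subset of Linux-PAM semantics. The class and function names are hypothetical, and since the contents of common-account are not shown in this report, a single required pam_unix.so is assumed in its place.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    control: str    # "requisite", "required" or "sufficient" (simplified subset)
    succeeds: bool  # hypothetical outcome of this module for one connection


def run_account_stack(stack):
    """Evaluate a PAM stack with simplified Linux-PAM control-flag rules.

    Returns (granted, trace), where trace lists the modules actually consulted.
    """
    failed = False       # set once a 'required' module has failed
    any_success = False  # PAM needs at least one successful module
    trace = []
    for mod in stack:
        trace.append(mod.name)
        if mod.succeeds:
            any_success = True
            # sufficient: success ends the stack, unless a required module already failed
            if mod.control == "sufficient" and not failed:
                return True, trace
        elif mod.control == "requisite":
            return False, trace  # requisite: a failure aborts immediately
        elif mod.control == "required":
            failed = True        # required: remember the failure but keep going
        # a failing 'sufficient' module is simply ignored
    return (not failed and any_success), trace


if __name__ == "__main__":
    # Account stack from pam.d/sshd; common-account is ASSUMED to be one required pam_unix.so.
    stack = [
        Module("pam_nologin.so", "requisite", True),
        Module("pam_unix.so", "required", True),         # assumed content of common-account
        Module("pam_access.so", "sufficient", False),    # "access denied" in the log
        Module("pam_slurm_adopt.so", "required", True),  # "user has only one job 3510"
    ]
    granted, trace = run_account_stack(stack)
    print(granted, trace)  # True: pam_access failed, but pam_slurm_adopt succeeded
```

Setting pam_slurm_adopt.so to fail models a user with no job on the node, which is denied. Setting pam_access.so to succeed models a user allowed by access.conf. For that user the stack returns at pam_access, and pam_slurm_adopt is never consulted, so no adoption happens.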
After this, the cgroup tree looks as follows (the full tree is in the attachment). The relevant lines are:

```
Control group /:
-.slice
├─system.slice
...
│ ├─slurmd.service
│ │ ├─  2570 /usr/sbin/slurmd -M
│ │ ├─153820 slurmstepd: [3510.4294967295
│ │ ├─153824 sleep 1000000
│ │ ├─153827 slurmstepd: [3510.0]
│ │ └─153833 /usr/bin/sleep 3600
....
└─user.slice
  ....
  └─user-1109.slice
    ├─user@1109.service
    │ └─init.scope
    │   ├─153845 /usr/lib/systemd/systemd --user
    │   └─153851 (sd-pam)
    └─session-813.scope
      ├─153840 sshd: bsc99968 [priv
      ├─153854 sshd: bsc99968@pts/2
      └─153855 -bash
```

At this point, PAM is configured as follows.

pam.d/sshd:

```
#%PAM-1.0
auth requisite pam_nologin.so
auth include common-auth
account requisite pam_nologin.so
account include common-account
password include common-password
account sufficient pam_access.so
account required pam_slurm_adopt.so
session required pam_loginuid.so
session include common-session
session optional pam_lastlog.so silent noupdate showfailed
```

pam.d/common-session:

```
#%PAM-1.0
#
# This file is autogenerated by pam-config. All changes
# will be overwritten.
#
# Session-related modules common to all services
#
# This file is included from other service-specific PAM config files,
# and should contain a list of modules that define tasks to be performed
# at the start and end of sessions of *any* kind (both interactive and
# non-interactive
#
session required pam_limits.so
session required pam_unix.so try_first_pass
session optional pam_umask.so
session optional pam_systemd.so
session optional pam_env.so
```

We also tried commenting out pam_systemd.so in common-session, but the ssh session still did not end up in the job container:

```
2017-06-21T10:39:37.694830+02:00 s01r1b02 sshd[154135]: pam_access(sshd:account): access denied for user `bsc99968' from `10.2.8.230'
2017-06-21T10:39:37.709097+02:00 s01r1b02 pam_slurm_adopt[154135]: Connection by user bsc99968: user has only one job 3510
2017-06-21T10:39:37.724393+02:00 s01r1b02 pam_slurm_adopt[154135]: Process 154135 adopted into job 3510
2017-06-21T10:39:37.724622+02:00 s01r1b02 sshd[154135]: Accepted publickey for bsc99968 from 10.2.8.230 port 44962 ssh2: DSA SHA256:km+Gtd3ncSq+4UO6Y9ifepPBKcDmqw66aISFC0nK6Kg
2017-06-21T10:39:37.726256+02:00 s01r1b02 sshd[154135]: pam_unix(sshd:session): session opened for user bsc99968 by (uid=0)
```

```
Control group /:
-.slice
├─init.scope
│ └─1 /sbin/init
├─system.slice
...
│ ├─slurmd.service
│ │ ├─  2570 /usr/sbin/slurmd -M
│ │ ├─153820 slurmstepd: [3510.4294967295]
│ │ ├─153824 sleep 1000000
│ │ ├─153827 slurmstepd: [3510.0]
│ │ └─153833 /usr/bin/sleep 3600
...
│ ├─sshd.service
│ │ ├─  2762 /usr/sbin/sshd -D
│ │ ├─154135 sshd: bsc99968 [priv]
│ │ ├─154140 sshd: bsc99968@pts/1
│ │ └─154141 -bash
...
```

We have disabled pam_systemd in common-session, since it conflicted with the pam_slurm_adopt module, and now it works. Marking as resolved/infogiven.
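The trees above come from systemd-cgls, which draws the systemd-managed hierarchy. On a cgroup v1 node, the placement of a process in the individual controllers (freezer, cpuset, memory, ...) that Slurm uses can differ from what that tree shows. It can be read from /proc/<pid>/cgroup. The following is a minimal Python sketch under stated assumptions. Slurm's cgroup directories are assumed to follow a `.../job_<id>/step_<step>` pattern (for example /slurm/uid_1109/job_3510/step_extern for the external step, which appears as 3510.4294967295 in the trees). The helper names are hypothetical.

```python
import re

# ASSUMPTION: Slurm's cgroup v1 plugins name job/step directories like
# /slurm/uid_1109/job_3510/step_extern; adjust the pattern for other layouts.
SLURM_STEP_RE = re.compile(r"/job_(\d+)/step_([^/]+)")


def parse_proc_cgroup(text):
    """Map each controller (or named hierarchy) to the cgroup path of the process.

    Every line of /proc/<pid>/cgroup has the form hierarchy-ID:controller-list:path.
    A cgroup v2 line has an empty controller list and is stored under "".
    """
    placement = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        for ctrl in controllers.split(","):
            placement[ctrl] = path
    return placement


def slurm_steps(placement):
    """Return {controller: (job_id, step)} for controllers inside a Slurm step cgroup."""
    steps = {}
    for ctrl, path in placement.items():
        match = SLURM_STEP_RE.search(path)
        if match:
            steps[ctrl] = (int(match.group(1)), match.group(2))
    return steps


def report(pid="self"):
    """Print where process `pid` sits in each hierarchy (hypothetical helper)."""
    with open(f"/proc/{pid}/cgroup") as f:
        placement = parse_proc_cgroup(f.read())
    steps = slurm_steps(placement)
    for ctrl, path in sorted(placement.items()):
        tag = "job %d step %s" % steps[ctrl] if ctrl in steps else "not in a Slurm job"
        print(f"{ctrl or '(v2)':<14} {path}  [{tag}]")
```

For example, running `report(154135)` as root on s01r1b02 while the second session is open would list, controller by controller, whether the adopted sshd process sits under job 3510 or under sshd.service.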