Ticket 3129

Summary: Interactive jobs sometimes fail to start - openpty()
Product: Slurm
Reporter: Ben Matthews <matthews>
Component: slurmstepd
Assignee: Tim Wickberg <tim>
Status: RESOLVED INVALID
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: matthews
Version: 15.08.6
Hardware: Linux
OS: Linux
Site: UCAR
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---

Description Ben Matthews 2016-09-28 16:21:46 MDT
This happens:

-bash-4.2$ salloc -N1 -c 1 -p sandybridge srun -N1 --pty --preserve-env $SHELL
salloc: Granted job allocation 119772
srun: error: task 0 launch failed: Slurmd could not connect IO
srun: error: task 1 launch failed: Slurmd could not connect IO
salloc: Relinquishing job allocation 119772

This occurs when /etc/group contains an entry larger than ~400 bytes on RHEL 7.2:

-bash-4.2$ rpm -qa | grep glibc-2 | grep x86_64
glibc-2.17-106.el7_2.4.x86_64
compat-glibc-2.12-4.el7.centos.x86_64
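
To check whether a node is exposed, something like the following sketch could list the /etc/group entries that exceed the threshold. The function name and the 400-byte default are illustrative assumptions taken from the observation above, not a limit documented by glibc.

```python
def find_oversized_entries(group_text, limit=400):
    """Return (group_name, byte_length) for each entry longer than `limit` bytes.

    `limit` defaults to the ~400 bytes observed in this ticket (an assumption,
    not a documented glibc constant).
    """
    oversized = []
    for line in group_text.splitlines():
        # Skip blank lines and comments; they are not group entries.
        if not line or line.startswith("#"):
            continue
        size = len(line.encode("utf-8"))
        if size > limit:
            name = line.split(":", 1)[0]
            oversized.append((name, size))
    return oversized
```

Passing it the contents of /etc/group (for example open("/etc/group").read()) returns the names and sizes of the entries worth splitting.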

If I attach strace to slurmd (with -f, so it also follows slurmstepd), we can see roughly what's happening:

[pid 26158] stat("/dev/pts/1", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
[pid 26158] getuid()                    = 24712
[pid 26158] open("/etc/group", O_RDONLY|O_CLOEXEC) = 16
[pid 26158] fstat(16, {st_mode=S_IFREG|0644, st_size=25951, ...}) = 0
[pid 26158] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aff576f2000
[pid 26158] read(16, "[redacted -- bits of /etc/group]"..., 4096) = 4096
[pid 26158] read(16, "[redacted -- bits of /etc/group]"..., 4096) = 4096
[pid 26158] close(16)                   = 0
[pid 26158] munmap(0x2aff576f2000, 4096) = 0
[pid 26158] getgid()                    = 0
[pid 26158] chown("/dev/pts/1", 24712, 0) = -1 EPERM (Operation not permitted)
[pid 26158] close(15)                   = 0


Our authentication system works by pulling information from LDAP (among other places) and filling out /etc/passwd, /etc/group and /etc/shadow (and the Slurm DB). Due to a similar bug we saw on another system, entries in /etc/group are split to be no longer than 512 bytes. Reducing this constant to 400 bytes resolves the problem:

[root@hl02 ~]# cp group.good /etc/group
cp: overwrite ‘/etc/group’? y

-bash-4.2$ salloc -N1 -c 1 -p sandybridge srun -N1 --pty --preserve-env $SHELL
salloc: Granted job allocation 119774
bash-4.2$

Groups are split as follows:

grep mygroup /etc/group
mygroup:x:1000:list_of_users
mygroup-1:x:1000:more_users

This allows things like getent group mygroup to return the full list and filesystem permissions to work, without tripping this bug.
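
The splitting scheme described above could be sketched roughly as follows. This is not our actual provisioning code; the function name, its signature and the 400-byte default are assumptions made for illustration. Only the naming pattern (mygroup, mygroup-1, ... sharing one GID) comes from the example above.

```python
def split_group_entry(name, gid, members, max_bytes=400):
    """Split one group into several /etc/group lines of at most `max_bytes` each.

    The first line keeps the original name; later lines are named name-1,
    name-2, ... and reuse the same GID. A single member name that is longer
    than the limit on its own still gets its own (over-length) line.
    """
    lines = []
    current = []

    def prefix(index):
        chunk_name = name if index == 0 else f"{name}-{index}"
        return f"{chunk_name}:x:{gid}:"

    for member in members:
        candidate = prefix(len(lines)) + ",".join(current + [member])
        if current and len(candidate.encode("utf-8")) > max_bytes:
            # Adding this member would exceed the limit: flush and start a new line.
            lines.append(prefix(len(lines)) + ",".join(current))
            current = [member]
        else:
            current.append(member)

    # Flush whatever is left (also covers a group with no members).
    lines.append(prefix(len(lines)) + ",".join(current))
    return lines
```

For example, split_group_entry("mygroup", 1000, users) yields lines starting with mygroup:x:1000: and mygroup-1:x:1000:, each within the byte limit.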

I did some digging in libc, and I suspect that the culprit is the call to __getgrnam_r() inside grantpt(), but I didn't isolate the problem to a particular line of code. 

If you do what some other tools do and set up the pty directly instead of using openpty(), the chown can be avoided (on modern Linux) and this problem doesn't happen (not sure if this is a good idea to change or not).
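
As a rough illustration of setting up the pty directly on Linux, the sketch below opens /dev/ptmx, unlocks the slave with the TIOCSPTLCK ioctl, and looks up its number with TIOCGPTN, so grantpt() is never called. It is written in Python for brevity rather than in Slurm's C, the function names are hypothetical, and the ioctl request values are the ones used on x86 Linux (an assumption; some architectures differ). It relies on devpts assigning slave ownership and mode from its mount options.

```python
import fcntl
import os
import struct

# Linux ioctl request numbers as defined on x86 (assumption: other
# architectures may use different values).
TIOCGPTN = 0x80045430    # get the pty number of the slave
TIOCSPTLCK = 0x40045431  # lock/unlock the slave


def pts_path(number):
    """Path of the slave device for a given pty number."""
    return f"/dev/pts/{number}"


def open_pty_direct():
    """Open a master/slave pty pair without going through grantpt()."""
    master = os.open("/dev/ptmx", os.O_RDWR | os.O_NOCTTY)
    try:
        # Unlock the slave (equivalent of unlockpt()).
        fcntl.ioctl(master, TIOCSPTLCK, struct.pack("i", 0))
        # Ask the kernel which /dev/pts/N belongs to this master.
        raw = fcntl.ioctl(master, TIOCGPTN, struct.pack("I", 0))
        (number,) = struct.unpack("I", raw)
        slave = os.open(pts_path(number), os.O_RDWR | os.O_NOCTTY)
    except OSError:
        os.close(master)
        raise
    return master, slave
```

Because there is no grantpt() call, no group lookup and no chown attempt take place; whether the resulting permissions are acceptable depends on how devpts is mounted on the node.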
Comment 1 Ben Matthews 2016-09-28 16:26:54 MDT
I suppose this is also relevant. On the compute node:

[root@hl02 ~]# mount | grep pts
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
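
The gid= and mode= options are what the kernel applies to newly created slaves on this mount, so they are the values worth comparing against the chown that fails above. A small sketch for pulling them out of a mount line follows; the function name and the returned dictionary layout are hypothetical.

```python
import re


def parse_mount_line(line):
    """Parse one line of `mount` output into its fields and an options dict."""
    match = re.match(r"(\S+) on (\S+) type (\S+) \((.*)\)\s*$", line.strip())
    if match is None:
        raise ValueError(f"unrecognised mount line: {line!r}")
    device, mountpoint, fstype, raw_options = match.groups()
    options = {}
    for option in raw_options.split(","):
        key, sep, value = option.partition("=")
        # Flags such as "rw" have no value; record them as True.
        options[key] = value if sep else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options,
    }
```

On the line above this gives options["gid"] == "5" and options["mode"] == "620". GID 5 is conventionally the tty group on RHEL, though that mapping is an assumption worth confirming with getent group tty.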
Comment 2 Tim Wickberg 2017-11-13 10:43:26 MST
What OS is this?

It's not clear to me that this is our bug - if the openpty() call is failing that sounds like a libc issue...
Comment 3 Ben Matthews 2017-11-21 13:13:57 MST
I was playing with RHEL 7.x at the time, but it's reproducible on pretty much anything glibc-based, so it's really probably not your bug. I opened the ticket in case you'd like to implement a workaround and so that the very non-obvious set of symptoms is on the record.

I'll also note that Slurm seems to hit this with slightly smaller groups than most other things. I'm not really sure why that is.
Comment 4 Tim Wickberg 2017-12-05 22:10:49 MST
Ben -

I'm cleaning up some stuff, and I'm going to go ahead and mark this as resolved/invalid. I don't really want to reimplement a broken library call within Slurm just to dodge a glibc bug.

Feel free to reopen if you disagree, or propose a patch to replace openpty() with a Slurm-specific xopenpty() call.

- Tim