Summary: | X11 Forwarding: xauth timeout | ||
---|---|---|---|
Product: | Slurm | Reporter: | Ben Matthews <matthews> |
Component: | slurmstepd | Assignee: | Tim Wickberg <tim> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | alex, felip.moll, griznog, kaizaad |
Version: | 17.11.x | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | UCAR | Alineos Sites: | --- |
Atos/Eviden Sites: | --- | Confidential Site: | --- |
Coreweave sites: | --- | Cray Sites: | --- |
DS9 clusters: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA Site: | --- | NoveTech Sites: | --- |
Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Linux Distro: | --- |
Machine Name: | CLE Version: | ||
Version Fixed: | Target Release: | --- | |
DevPrio: | 3 - High | Emory-Cloud Sites: | --- |
Description
Ben Matthews
2017-11-22 15:07:05 MST
I believe you know where to patch the timeout if required? I'm moving this into an enhancement request. I should probably have added an X11Parameters configuration option to give us a place to change these default values, but that will need to wait until 18.08 at this point.

I found that this error is also raised when you have stale locks on .Xauthority* files in your home directory. The workaround is to remove the stale locks using the xauth '-b' option, or to remove these files directly.

    [slurm@moll0 ~]$ ls .Xauthority* -lah
    -rw------- 1 slurm slurm 0 7 des 17:08 .Xauthority
    -rw------- 1 slurm slurm 0 7 des 17:10 .Xauthority-c
    -rw------- 1 slurm slurm 0 7 des 17:01 .Xauthority-l

To detect whether this is the problem, an 'strace xauth' would show multiple EEXIST errors like this one:

    open("/nfs/home/slurm/.Xauthority-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EEXIST (File exists)

I'm seeing this today, although a week ago everything was working fine.

    [griznog@smsx10srw-srcf-d15-37 ~]$ srun --pty --x11 --time=1:00:00 xterm
    srun: error: run_command: xauth poll timeout @ 100 msec
    srun: error: x11_get_xauth: Could not retrieve magic cookie. Cannot use X11 forwarding.

We have $HOME on GPFS so I tried increasing the timeout, but even at 10 seconds I still get the same error. There doesn't seem to be an issue with .Xauthority and I can 'ssh -Y' to a node and X forwarding back works normally. Any other suggestions on how to get this to work again? I'm on 17.11.02.

Hey folks - I'm tagging this as a duplicate of the X11 catch-all bug 3647. As mentioned on there, 18.08 will have an X11Parameters option that gives us a place to add settings to change these timers. Also mentioned on there, I may add support for creating separate XAUTHORITY environment variables/files on the compute nodes, which should reduce contention on various filesystems for locking around ~/.Xauthority.

*** This ticket has been marked as a duplicate of ticket 3647 ***

This would be very helpful.
>
> Also mentioned on there, I may add support for creating separate XAUTHORITY
> environment variables/files on the compute nodes, which should reduce
> contention on various filesystem for locking around ~/.Xauthority.
>
> *** This bug has been marked as a duplicate of bug 3647 ***
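
The stale-lock condition described earlier in this thread (leftover `.Xauthority-c` and `.Xauthority-l` files next to `~/.Xauthority`) can be checked before launching `srun --x11`. The following is only a minimal sketch in Python, not part of Slurm or xauth: the function name `find_stale_xauth_locks` and the 60-second age threshold are assumptions chosen for illustration, and a lock is treated as "stale" purely on the basis of its modification time.

```python
import time
from pathlib import Path


def find_stale_xauth_locks(home, min_age_seconds=60.0, now=None):
    """Return .Xauthority lock files under `home` that look stale.

    Hypothetical helper: a lock is considered stale when its mtime is at
    least `min_age_seconds` old. The threshold is an assumption, not a
    value taken from xauth or Slurm.
    """
    now = time.time() if now is None else now
    stale = []
    # The two lock file suffixes shown in the 'ls' output in this ticket.
    for suffix in ("-c", "-l"):
        lock = Path(home) / f".Xauthority{suffix}"
        try:
            age = now - lock.stat().st_mtime
        except FileNotFoundError:
            continue  # no lock of this kind is present
        if age >= min_age_seconds:
            stale.append(lock)
    return stale


if __name__ == "__main__":
    for lock in find_stale_xauth_locks(Path.home()):
        print(f"possible stale lock: {lock}")
```

If the script reports any files, the workaround mentioned above applies: break the locks with the xauth '-b' option or remove the lock files by hand. The script itself only reports and never deletes anything.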