Summary: | X11 Forwarding Fails with FastX, succeeds with ssh -X | ||
---|---|---|---|
Product: | Slurm | Reporter: | Alex Mamach <alex.mamach> |
Component: | Configuration | Assignee: | Director of Support <support> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | ||
Version: | 18.08.5 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | Northwestern | Linux Distro: | RHEL |
Attachments: | slurm.conf file |
Upon further investigation, I believe this may be due to lines 92-96 in x11_util.c:

    if (display[0] == ':') {
        error("Cannot forward to local display. "
              "Can only use X11 forwarding with network displays.");
        exit(-1);
    }

If this is in fact the reason, is there any danger in us removing this check? I'm not entirely clear what it's attempting to protect or prevent, and maybe there's a better way for us to work around this than modifying the code.

Thanks again!

Hi Alex,

This is a copy and paste from https://bugs.schedmd.com/show_bug.cgi?id=6233:

Our X11 forwarding implementation cannot connect to unix sockets at this time; this is something we may look at in a future release. Two options:

- Use "ssh -X localhost", then run "srun --x11" within that SSH session. SSH itself will handle translation between a TCP socket that Slurm's implementation can use and the local unix socket.
- Disable our built-in integration, and use the SPANK X11 plugin instead. Due to differences in how it forwards traffic, it can accommodate use of a unix socket instead of a network socket.

We hope to address these limitations soon. We are actively looking into a possible solution for 19.05, and that work is being tracked through https://bugs.schedmd.com/show_bug.cgi?id=3647.

-Jason

Hi Alex,

I am resolving this issue for now. The work that we are doing for X11 is targeted for 19.05 via the following issue: https://bugs.schedmd.com/show_bug.cgi?id=3647

Please consult the release notes for the upcoming 19.05 release for the details once it has been officially released.

*** This ticket has been marked as a duplicate of ticket 3647 ***
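The first workaround above can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: `suggest_x11_command` is a hypothetical name, the helper prints a command line rather than running it, and passing `srun --x11` as a remote command to `ssh -X localhost` is an untested, non-interactive variant of the interactive session described in the reply.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): given a DISPLAY value and a program,
# print the command line suggested by the first workaround. Nothing is executed.
suggest_x11_command() {
    local display="$1"
    shift
    case "$display" in
        "")
            # No display at all: nothing can be forwarded.
            echo "DISPLAY is empty; X11 forwarding is not possible" >&2
            return 1
            ;;
        :*)
            # Leading ':' is the case the quoted check in x11_util.c rejects.
            # Assumption: going through ssh -X localhost yields a
            # localhost:N style display that srun --x11 accepts.
            echo "ssh -X localhost srun --x11 $*"
            ;;
        *)
            # host:N style display: srun --x11 can be used directly.
            echo "srun --x11 $*"
            ;;
    esac
}

suggest_x11_command ":103" xclock            # FastX-style display from the report
suggest_x11_command "localhost:11.0" xclock  # ssh -X style display from the report
```

Printing the command instead of executing it keeps the sketch safe to try on a login node; the account, time, and partition options from the original report would be added to the `srun` part as needed.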
Created attachment 9194: slurm.conf file

Hi,

We've been working on setting up X11 forwarding for our user applications. Our users typically connect to our cluster using either ssh -X or, more frequently, a remote display application called FastX.

When we attempt to use X11 forwarding in a job after connecting with ssh -X, for example

    srun -A myaccount --time 4:00 --partition=mypartition --x11 xclock

things work as expected. However, doing the same with FastX generates an error upon job submission:

    srun: error: Cannot forward to local display. Can only use X11 forwarding with network displays.

After looking into some previous bug reports regarding X11 forwarding, I saw mention that Slurm looks at both the DISPLAY and HOSTNAME variables for the --x11 option. When connecting with ssh -X, DISPLAY shows a value such as localhost:11.0. When using FastX, however, DISPLAY instead has a value like :103, without the localhost component.

Interestingly, connecting with FastX allows me to run a GUI application on the login nodes, as well as on any compute node I directly connect to with ssh -X, so the issue seems to be particular to Slurm's --x11 flag.

Do you have any thoughts as to what might be going on here? I've attached our slurm.conf file in the event it proves helpful.

Thank you!

Alex
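The difference between the two DISPLAY values in this report can be made visible with a short shell sketch. It assumes the usual `[host]:display[.screen]` layout and, like the error message, treats a value with an empty host part as a local display. `describe_display` is a hypothetical helper name, and the sketch deliberately ignores other local forms such as `unix:0`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm or FastX): split a DISPLAY value of the
# form [host]:display[.screen] and report whether the host part is empty.
# Simplification: the value is assumed to contain a colon.
describe_display() {
    local value="$1"
    local host="${value%%:*}"    # text before the first colon (may be empty)
    local rest="${value#*:}"     # text after the first colon
    local number="${rest%%.*}"   # display number, without the optional .screen
    local kind="network"
    if [ -z "$host" ]; then
        kind="local"             # leading ':' means no host, i.e. a local display
    fi
    printf 'host=%s display=%s kind=%s\n' "${host:-none}" "$number" "$kind"
}

describe_display "localhost:11.0"   # prints: host=localhost display=11 kind=network
describe_display ":103"             # prints: host=none display=103 kind=local
```

Running `describe_display "$DISPLAY"` inside an ssh -X session and inside a FastX session should show which kind of display each one provides before a job is submitted.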