Created attachment 9194: slurm.conf file

Hi,

We've been working on setting up X11 forwarding for our user applications. Our users typically connect to our cluster using either ssh -X or, more frequently, a remote display application called FastX.

When we use X11 forwarding in a job after connecting with ssh -X, for example

    srun -A myaccount --time 4:00 --partition=mypartition --x11 xclock

things work as expected. However, doing the same with FastX generates an error upon job submission:

    srun: error: Cannot forward to local display. Can only use X11 forwarding with network displays.

After looking into some previous bug reports regarding X11 forwarding, I saw mention that Slurm looks at both the DISPLAY and HOSTNAME variables for the --x11 option. When connecting with ssh -X, DISPLAY shows a value such as localhost:11.0. When using FastX, however, DISPLAY instead has a value like :103, which is missing the localhost component.

Interestingly, connecting with FastX allows me to run a GUI application on the login nodes, as well as on any compute node I connect to directly with ssh -X, so the issue seems to be particular to Slurm's --x11 flag.

Do you have any thoughts as to what might be going on here? I've attached our slurm.conf file in the event it proves helpful.

Thank you!
Alex
Upon further investigation, I believe this may be due to lines 92-96 in x11_util.c:

    if (display[0] == ':') {
        error("Cannot forward to local display. "
              "Can only use X11 forwarding with network displays.");
        exit(-1);
    }

If this is in fact the reason, is there any danger in us removing this check? I'm not entirely clear what it's attempting to protect or prevent, and maybe there's a better way for us to navigate than modifying the code.

Thanks again!
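To make the distinction concrete, here is a minimal standalone sketch of the test quoted above, applied to the two DISPLAY values seen in this report. It is not Slurm's code: the helper names display_is_local and display_host are hypothetical, and the parsing is simplified to splitting on the first colon.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Slurm): returns true when DISPLAY has
 * no host part, e.g. ":103". This mirrors the display[0] == ':' test
 * quoted from x11_util.c.
 */
static bool display_is_local(const char *display)
{
	return display != NULL && display[0] == ':';
}

/*
 * Hypothetical helper: copies the host part of DISPLAY (the text before
 * the first ':') into host. Returns false if there is no colon, the host
 * part is empty, or the buffer is too small.
 */
static bool display_host(const char *display, char *host, size_t host_len)
{
	const char *colon;
	size_t len;

	if (display == NULL || host == NULL || host_len == 0)
		return false;

	colon = strchr(display, ':');
	if (colon == NULL)
		return false;

	len = (size_t)(colon - display);
	if (len == 0 || len >= host_len)
		return false;

	memcpy(host, display, len);
	host[len] = '\0';
	return true;
}
```

With these helpers, the FastX value ":103" is classified as local and has no host part, while the ssh -X value "localhost:11.0" is not local and yields the host "localhost".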
Hi Alex,

This is a copy and paste from https://bugs.schedmd.com/show_bug.cgi?id=6233

Our X11 forwarding implementation cannot connect to unix sockets at this time; this is something we may look at in a future release.

Two options:

- Use "ssh -X localhost", then run "srun --x11" within that SSH session. SSH itself will handle translation between a TCP socket that Slurm's implementation can use and the local unix socket.

- Disable our built-in integration, and use the SPANK X11 plugin instead. Due to differences in how it forwards traffic, it can accommodate use of a unix socket instead of a network socket.

We hope to address these limitations soon. We are actively looking into a possible solution for 19.05, and that work is being tracked through https://bugs.schedmd.com/show_bug.cgi?id=3647.

-Jason
Hi Alex,

I am resolving this issue for now. The work that we are doing for X11 is targeted for 19.05 via the following issue:

https://bugs.schedmd.com/show_bug.cgi?id=3647

Please consult the release notes in the upcoming 19.05 for the details once we have officially released.

*** This ticket has been marked as a duplicate of ticket 3647 ***