Created attachment 7790: Verbose error log for issue

Hello,

I'm having an issue setting up X11 forwarding. Everything was compiled with the necessary libssh2 and libssh2-devel libraries, X11 forwarding is enabled with the PrologFlags=x11 switch, and srun itself doesn't complain, yet every graphical program throws an error about not being able to open the display. Digging further into the node's logs, I found this (the debug2-level log file is attached if you want to see the full output):

> [2018-09-09T16:52:20.042] _run_prolog: run job script took usec=9
> [2018-09-09T16:52:20.042] _run_prolog: prolog with lock for job 2064 ran for 0 seconds
> [2018-09-09T16:52:20.063] [2064.extern] error: _get_home: getpwuid_r(33560764):No error
> [2018-09-09T16:52:20.063] [2064.extern] error: could not find HOME in environment
> [2018-09-09T16:52:20.064] [2064.extern] error: x11 port forwarding setup failed
> [2018-09-09T16:52:20.065] [2064.extern] error: _spawn_job_container: failed retrieving x11 display value: No error
> [2018-09-09T16:52:20.095] [2064.extern] done with job
> [2018-09-09T16:52:20.157] launch task 2064.0 request from 33560764.2019@198.38.16.106 (port 181)
> [2018-09-09T16:52:20.159] error: could not get x11 forwarding display for job 2064 step 0, x11 forwarding disabled
> [2018-09-09T16:52:20.216] [2064.0] done with job

The part about not being able to find $HOME is strange, because the output of `srun -n1 -p compute2 -w borgw201 --x11 /usr/bin/env | grep HOME` (the same command that produced the error above) shows that $HOME is defined.

One thing of note about my setup: unlike in past reports of this error, users are not allowed to SSH to compute nodes. All authentication is done with Kerberos, so SSH keys don't exist. User home directories are also on a shared NFS drive. If this error is simply a symptom of not having SSH key-based authentication, and X11 forwarding isn't supported under any other setup, that would be good to know for certain.
I haven't had time yet, but I will try the latest version of Slurm (18.08.0) and close the issue if updating fixes it. If not, I will attach my slurm.conf next and can provide any other files requested.

Thanks,
-Jack Duvall
Created attachment 7791: slurm.conf
Update: 18.08.0 does not fix this issue for me. Nothing seems to have changed drastically in the logs either. New abbreviated log:

> [2018-09-09T18:40:31.983] _run_prolog: run job script took usec=169
> [2018-09-09T18:40:31.984] _run_prolog: prolog with lock for job 2067 ran for 0 seconds
> [2018-09-09T18:40:32.046] [2067.extern] error: _get_home: getpwuid_r(33560764):No error
> [2018-09-09T18:40:32.047] [2067.extern] error: could not find HOME in environment
> [2018-09-09T18:40:32.047] [2067.extern] error: x11 port forwarding setup failed
> [2018-09-09T18:40:32.065] [2067.extern] error: _spawn_job_container: failed retrieving x11 display value: No error
> [2018-09-09T18:40:32.065] [2067.extern] error: _spawn_job_container: failed retrieving x11 authority value: No error
> [2018-09-09T18:40:32.085] [2067.extern] done with job
> [2018-09-09T18:40:32.102] launch task 2067.0 request from UID:33560764 GID:2019 HOST:198.38.16.106 PORT:17035
> [2018-09-09T18:40:32.104] error: could not get x11 forwarding display for job 2067 step 0, x11 forwarding disabled
> [2018-09-09T18:40:33.843] [2067.0] done with job

The debug2 version of the log is attached.
Created attachment 7792: Verbose slurmd.log, 18.08 version