| Summary: | After upgrade to 17.02.2, spank-x11 plugin broken and nodes draining | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | HMS Research Computing <rc> |
| Component: | slurmd | Assignee: | Director of Support <support> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | 17.02.2 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Harvard Medical School | ||
Hello,

Make sure you recompile the x11 SPANK plugin. SPANK plugins that use the Slurm APIs need to be recompiled when upgrading Slurm to a new major release.

https://slurm.schedmd.com/spank.html

Hope that helps.

Regards,
Tim

Hello,

I just wanted to verify that this recommendation fixed the problem you were seeing. Is it okay to resolve this bug?

Thanks,
Tim

Hi Tim,

Yes, recompiling the plugin fixed the problem; you can close this ticket. Thanks for the help!

Raffaele

Resolving this bug.
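The rule of thumb in the reply above (rebuild SPANK plugins when moving to a new major release) can be sketched as a small pre-upgrade check. This is only an illustration, not a Slurm tool: the helper names `release_series` and `needs_plugin_rebuild` are hypothetical, and the sketch assumes the first two components of the version string (for example `17.02` in `17.02.2`) identify the major release.

```python
from typing import Tuple


def release_series(version: str) -> Tuple[int, int]:
    """Return the first two numeric components of a Slurm version string.

    Assumption: these two components identify the major release,
    e.g. '17.02.2' belongs to the 17.02 series.
    """
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected version string: {version!r}")
    return int(parts[0]), int(parts[1])


def needs_plugin_rebuild(old_version: str, new_version: str) -> bool:
    """True when the upgrade crosses a major release boundary."""
    return release_series(old_version) != release_series(new_version)


if __name__ == "__main__":
    # The upgrade described in this ticket crosses from 16.05 to 17.02.
    print(needs_plugin_rebuild("16.05.4", "17.02.2"))
    # A maintenance update within the same series does not.
    print(needs_plugin_rebuild("17.02.1", "17.02.2"))
```

For the upgrade in this ticket (16.05.4 to 17.02.2) the check flags a rebuild, which matches the fix that resolved the problem.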
Hi,

We recently upgraded to Slurm 17.02.2 from 16.05.4, and for some reason the x11 display isn't working. Previously we were able to run x11 using --x11 and --x11=batch.

**This is the output of job**

    sh: -c: line 0: syntax error near unexpected token `('
    sh: -c: line 0: `/usr/libexec/slurm-spank-x11 -u root -s "ssh" -o "" -f (null) -d localhost:11.0 -t compute-a-16-30.o2.rc.hms.harvard.edu -i 909796.4294967294 -cwg &'
    slurmstepd: error: x11: unable to get a DISPLAY value
    slurmstepd: error: spank: required plugin x11.so: user_init() failed with rc=-6
    slurmstepd: error: spank_user failed.
    slurmstepd: error: Unable to return to working directory
    slurmstepd: error: job_manager exiting abnormally, rc = 4020

The major issue is that this is setting nodes to the Drain state with reason "batch job complete failure".

**slurmd logs on one of the nodes**

    [2017-04-21T11:35:55.137] [873337] error: x11: unable to get a DISPLAY value
    [2017-04-21T11:35:55.138] [873337] error: spank: required plugin x11.so: user_init() failed with rc=-6
    [2017-04-21T11:35:55.139] [873337] error: spank_user failed.
    [2017-04-21T11:35:55.140] [873337] error: Unable to return to working directory
    [2017-04-21T11:35:55.170] [873337] error: job_manager exiting abnormally, rc = 4020

We don't know why the x11 plugin broke after upgrading Slurm.
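When many nodes drain at once, it can help to pull the failing plugin and job ID out of the slurmd logs automatically. The following is a minimal sketch that assumes log lines shaped exactly like the ones quoted above; the names `SpankFailure` and `find_spank_failures` are hypothetical, and the regular expression is derived only from this ticket's log excerpt, not from any documented log format.

```python
import re
from typing import Iterable, List, NamedTuple


class SpankFailure(NamedTuple):
    timestamp: str
    job_id: int
    plugin: str
    hook: str
    rc: int


# Pattern based on the log excerpt in this ticket (an assumption, not a spec).
_SPANK_ERROR = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+\[(?P<job>\d+)\]\s+error: spank: required plugin "
    r"(?P<plugin>\S+): (?P<hook>\w+)\(\) failed with rc=(?P<rc>-?\d+)"
)


def find_spank_failures(lines: Iterable[str]) -> List[SpankFailure]:
    """Collect required-plugin failures reported in slurmd log lines."""
    failures = []
    for line in lines:
        match = _SPANK_ERROR.search(line)
        if match:
            failures.append(
                SpankFailure(
                    timestamp=match.group("ts"),
                    job_id=int(match.group("job")),
                    plugin=match.group("plugin"),
                    hook=match.group("hook"),
                    rc=int(match.group("rc")),
                )
            )
    return failures


if __name__ == "__main__":
    sample = [
        "[2017-04-21T11:35:55.137] [873337] error: x11: unable to get a DISPLAY value",
        "[2017-04-21T11:35:55.138] [873337] error: spank: required plugin x11.so: "
        "user_init() failed with rc=-6",
        "[2017-04-21T11:35:55.139] [873337] error: spank_user failed.",
    ]
    for failure in find_spank_failures(sample):
        print(failure)
```

A repeated failure of the same required plugin across many jobs, as here with `x11.so`, points at the plugin itself rather than at individual jobs.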