Ticket 3730

Summary: After upgrading to 17.02.2, spank-x11 plugin is broken and nodes are draining
Product: Slurm
Reporter: HMS Research Computing <rc>
Component: slurmd
Assignee: Director of Support <support>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 3 - Medium Impact
Priority: ---
Version: 17.02.2
Hardware: Linux
OS: Linux
Site: Harvard Medical School

Description HMS Research Computing 2017-04-23 08:15:27 MDT
Hi,

We recently upgraded from Slurm 16.05.4 to 17.02.2, and since then X11 display forwarding isn't working.

Before the upgrade we were able to run X11 jobs using --x11 and --x11=batch.

**Output of the job**
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/usr/libexec/slurm-spank-x11 -u root -s "ssh" -o "" -f (null) -d localhost:11.0 -t compute-a-16-30.o2.rc.hms.harvard.edu -i 909796.4294967294 -cwg  &'
slurmstepd: error: x11: unable to get a DISPLAY value
slurmstepd: error: spank: required plugin x11.so: user_init() failed with rc=-6
slurmstepd: error: spank_user failed.
slurmstepd: error: Unable to return to working directory
slurmstepd: error: job_manager exiting abnormally, rc = 4020

The bigger problem is that this failure sets nodes to the DRAIN state with the reason "batch job complete failure".

**slurmd logs on one of the nodes**
[2017-04-21T11:35:55.137] [873337] error: x11: unable to get a DISPLAY value
[2017-04-21T11:35:55.138] [873337] error: spank: required plugin x11.so: user_init() failed with rc=-6
[2017-04-21T11:35:55.139] [873337] error: spank_user failed.
[2017-04-21T11:35:55.140] [873337] error: Unable to return to working directory
[2017-04-21T11:35:55.170] [873337] error: job_manager exiting abnormally, rc = 4020

We don't know why the x11 plugin broke after upgrading Slurm.
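A note on the first line of the job output: the shell rejects the helper command because it contains `-f (null)`, and unquoted parentheses are shell syntax. In C, passing a NULL pointer for a `%s` conversion is undefined behavior; glibc's printf family prints it as `(null)` in common cases. That suggests the plugin formatted its command line using a value it never obtained, plausibly because the plugin had been built against the older Slurm, since recompiling it resolved this ticket. This is an inference, not a confirmed diagnosis. The sketch below is hypothetical and not taken from slurm-spank-x11: `build_helper_cmd`, `has_single_quote`, and the argument names are illustrative, and the ticket does not say what each flag means. It refuses to build the command when a value is missing and single-quotes each argument so the shell treats it literally.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero if s contains a single quote, which the
 * simple quoting used below cannot represent safely. */
static int has_single_quote(const char *s)
{
    return strchr(s, '\'') != NULL;
}

/* Hypothetical helper, not the plugin's real code: format a command line
 * meant for `sh -c`, failing instead of letting a NULL reach "%s". */
static int build_helper_cmd(char *buf, size_t len, const char *helper,
                            const char *user, const char *file_arg,
                            const char *display)
{
    const char *args[] = { user, file_arg, display };
    size_t i;
    int n;

    assert(buf != NULL && len > 0); /* caller must supply a buffer */

    if (helper == NULL)
        return -1;
    for (i = 0; i < sizeof(args) / sizeof(args[0]); i++) {
        /* A missing value is an error, never a "(null)" argument. */
        if (args[i] == NULL || has_single_quote(args[i]))
            return -1;
    }

    /* Single quotes make the shell pass each value through literally,
     * so characters such as '(' are not parsed as shell syntax. */
    n = snprintf(buf, len, "%s -u '%s' -f '%s' -d '%s' &",
                 helper, user, file_arg, display);
    if (n < 0 || (size_t)n >= len)
        return -1; /* encoding error or truncated output */
    return 0;
}
```

Rejecting a missing value outright, rather than substituting a placeholder, surfaces the problem where the value was expected instead of as a shell syntax error later.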
Comment 3 Tim Shaw 2017-04-25 10:18:17 MDT
Hello,

Make sure you recompile the x11 SPANK plugin. SPANK plugins that use the Slurm APIs need to be recompiled when upgrading Slurm to a new major release.

https://slurm.schedmd.com/spank.html

Hope that helps.

Regards.

Tim
Comment 4 Tim Shaw 2017-05-01 16:13:14 MDT
Hello,

I just wanted to verify that this recommendation fixed the problem you were seeing. Is it okay to resolve this bug?

Thanks

Tim
Comment 5 HMS Research Computing 2017-05-02 12:38:00 MDT
Hi Tim,

Yes, recompiling the plugin fixed the problem. You can close this ticket.

Thanks for the help!

Raffaele
Comment 6 Tim Shaw 2017-05-02 12:41:12 MDT
Resolving this bug.