Ticket 4269

Summary: Setting LD_BIND_NOW=1 results in undefined symbol in plugins
Product: Slurm Reporter: James Sharpe <james.sharpe>
Component: Build System and Packaging Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID QA Contact:
Severity: 6 - No support contract    
Priority: --- CC: sts
Version: - Unsupported Older Versions   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=6619
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description James Sharpe 2017-10-17 04:38:59 MDT
This follows on from an old mailing list post: https://groups.google.com/d/topic/slurm-devel/eCrFsV60zQo/discussion
A workaround for this issue was used there, but it doesn't resolve the underlying problem.

I've observed this on a Slurm install at version 15.08 (a third-party site that I have no control over). I don't currently have a newer Slurm install to hand to check whether this is still an issue on current versions, but I will check when I have time.

The issue is that the Slurm plugins have unresolved symbols, so forcing them to be resolved at load time by setting LD_BIND_NOW=1 in the environment causes Slurm jobs to fail. The jobs then hang until they hit their time limit, although the hang may also be due to a deadlock in the PMI implementation of Intel MPI.
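
For anyone wanting to reproduce this, a minimal sketch of a comparison harness follows. It runs the same command once with LD_BIND_NOW removed and once with LD_BIND_NOW=1, with a timeout so that a hang is reported rather than waited out. The helper names (binding_mode, env_with_binding, run_both) and the 60 second timeout are hypothetical, chosen only for illustration. The only behaviour relied on is the documented ld.so rule that LD_BIND_NOW takes effect when set to a non-empty string.

```python
import os
import subprocess


def binding_mode(env):
    """Return 'now' if LD_BIND_NOW is set to a non-empty string, else 'lazy'."""
    return "now" if env.get("LD_BIND_NOW") else "lazy"


def env_with_binding(base_env, bind_now):
    """Copy base_env, then force LD_BIND_NOW on (=1) or remove it entirely."""
    env = dict(base_env)
    if bind_now:
        env["LD_BIND_NOW"] = "1"
    else:
        env.pop("LD_BIND_NOW", None)
    return env


def run_both(cmd, timeout=60):
    """Run cmd under lazy and eager binding.

    Returns a dict mapping 'lazy' and 'now' to the exit code, or to the
    string 'timeout' if the command did not finish in time.
    The timeout value is an assumption, not taken from the ticket.
    """
    results = {}
    for bind_now in (False, True):
        env = env_with_binding(os.environ, bind_now)
        try:
            proc = subprocess.run(
                cmd, env=env, timeout=timeout, capture_output=True, text=True
            )
            results[binding_mode(env)] = proc.returncode
        except subprocess.TimeoutExpired:
            results[binding_mode(env)] = "timeout"
    return results
```

As a usage example, run_both(["srun", "hostname"]) on an affected install would be expected to show a difference between the 'lazy' and 'now' entries; the exact command to use depends on the site.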