Ticket 10099 - sbatch/srun/salloc all fail with "Plugin loading failed due to missing symbols. Plugin is corrupted."
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 20.11.x
Hardware: Linux Linux
: 3 - Medium Impact
Assignee: Danny Auble
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2020-10-29 15:00 MDT by Chris Samuel (NERSC)
Modified: 2020-10-29 15:36 MDT

See Also:
Site: NERSC
Version Fixed: 20.11.0-pre1


Description Chris Samuel (NERSC) 2020-10-29 15:00:34 MDT
Hi there,

I'm carrying on with testing 20.11 from git and have run into a puzzle:

csamuel@gert01:/global/gscratch1/sd/csamuel/slurm/es-20.02> ./bin/sbatch --help
sbatch: fatal: plugin_load_and_link: Plugin loading failed due to missing symbols. Plugin is corrupted.

I thought I might have missed something when modifying our config so that it references nothing outside of the install, and that it was perhaps picking up the wrong object file. However, strace doesn't seem to show that:

csamuel@gert01:/global/gscratch1/sd/csamuel/slurm/es-20.02> strace -f -e openat ./bin/sbatch --help    
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/tls/haswell/x86_64/libslurmfull.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/tls/haswell/libslurmfull.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/tls/x86_64/libslurmfull.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/tls/libslurmfull.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/haswell/x86_64/libslurmfull.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/haswell/libslurmfull.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/x86_64/libslurmfull.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/libslurmfull.so", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/libresolv.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib64/libresolv.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/etc/slurm.conf", O_RDONLY) = 3
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libnss_compat.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libnss_nis.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib64/libnsl.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libtirpc.so.3", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib64/libgssapi_krb5.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib64/libkrb5.so.3", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib64/libk5crypto.so.3", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libcom_err.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib64/libkrb5support.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib64/libkeyutils.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib64/libpcre.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/proc/filesystems", O_RDONLY) = 3
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/etc/plugstack.conf", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/global/gscratch1/sd/csamuel/slurm/es-20.02/lib/slurm/cli_filter_user_defaults.so", O_RDONLY|O_CLOEXEC) = 3
sbatch: fatal: plugin_load_and_link: Plugin loading failed due to missing symbols. Plugin is corrupted.
+++ exited with 1 +++
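
One way to read a trace like the one above is to look for the last shared object that was opened successfully before the fatal message, since that is the most likely candidate for the plugin that failed to load. The following is a minimal Python sketch of that idea, not part of Slurm; the function name and the shortened sample (a few lines taken from the trace above) are only for illustration.

```python
import re

# Matches strace lines such as:
#   openat(AT_FDCWD, "/path/file.so", O_RDONLY|O_CLOEXEC) = 3
OPENAT_RE = re.compile(r'openat\([^,]+,\s*"([^"]+)",[^)]*\)\s*=\s*(-?\d+)')


def last_opened_shared_object(strace_text):
    """Return the last .so opened successfully before the first 'fatal:' line."""
    last = None
    for line in strace_text.splitlines():
        if "fatal:" in line:
            break  # stop at the error message; later lines are not relevant
        match = OPENAT_RE.search(line)
        if not match:
            continue
        path, result = match.group(1), int(match.group(2))
        # A non-negative return value is a file descriptor, i.e. success.
        if result >= 0 and (path.endswith(".so") or ".so." in path):
            last = path
    return last


# Shortened sample built from lines of the trace in this ticket.
PREFIX = "/global/gscratch1/sd/csamuel/slurm/es-20.02"
SAMPLE = "\n".join([
    f'openat(AT_FDCWD, "{PREFIX}/lib/slurm/tls/libslurmfull.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)',
    f'openat(AT_FDCWD, "{PREFIX}/lib/slurm/libslurmfull.so", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3',
    f'openat(AT_FDCWD, "{PREFIX}/etc/plugstack.conf", O_RDONLY|O_CLOEXEC) = 3',
    f'openat(AT_FDCWD, "{PREFIX}/lib/slurm/cli_filter_user_defaults.so", O_RDONLY|O_CLOEXEC) = 3',
    "sbatch: fatal: plugin_load_and_link: Plugin loading failed due to missing symbols. Plugin is corrupted.",
])

if __name__ == "__main__":
    print(last_opened_shared_object(SAMPLE))
```

Applied to the full trace, this points at cli_filter_user_defaults.so under the install prefix, which shows the file is being loaded from the expected location rather than from elsewhere on the system.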

This doesn't seem to impact squeue, scrontab, sinfo, sdiag, scontrol.

This is a pristine git tree of master and configured with:

configure --prefix=/global/gscratch1/sd/csamuel/slurm/es-20.02


Any ideas?

All the best,
Chris

Comment 1 Danny Auble 2020-10-29 15:23:02 MDT
Sorry Chris, thanks for reporting.  Luckily it was only this one cli_filter plugin that we missed.  Good news is you found this before 20.11 was out the door ;).

This has already been fixed in commit 8b9d4311cf8b.
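
For anyone hitting a similar "missing symbols" error, one rough way to check a plugin is to list the symbols it exports and compare them with the names the loader is expected to look up. The sketch below is an assumption-laden illustration, not Slurm code: it shells out to GNU `nm -D --defined-only`, and the list of required names must be supplied by the caller, since this ticket does not say which symbols were missing.

```python
import subprocess


def parse_nm_defined(nm_output):
    """Return the set of symbol names found in `nm -D --defined-only` output."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Expected layout per line: <address> <type> <name>
        if len(fields) == 3:
            # Drop any symbol-version suffix such as name@@VERSION.
            symbols.add(fields[2].split("@")[0])
    return symbols


def missing_symbols(plugin_path, required):
    """List the names in `required` that the shared object does not export."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", plugin_path],
        capture_output=True, text=True, check=True,
    )
    exported = parse_nm_defined(result.stdout)
    return sorted(set(required) - exported)


if __name__ == "__main__":
    # Hypothetical nm output, for illustration only.
    sample = (
        "0000000000001000 T example_hook_a\n"
        "0000000000002000 D example_data_b\n"
    )
    print(sorted(parse_nm_defined(sample)))
```

A call such as `missing_symbols("lib/slurm/cli_filter_user_defaults.so", names)` would then report which of the caller-supplied `names` are absent from the plugin; an empty list means every requested name is exported.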

Please reopen if you need anything else.
Comment 2 Chris Samuel (NERSC) 2020-10-29 15:27:12 MDT
Thanks Danny, I'll pull the fixes now and rebuild.
Comment 3 Chris Samuel (NERSC) 2020-10-29 15:36:15 MDT
Confirming that's working now.

csamuel@gert01:/global/gscratch1/sd/csamuel/slurm/es-20.02> ./bin/srun -q xfer -A nstaff hostname
gert02

Thanks Danny!