Created attachment 9383: slurm.conf file

Hi,

We don't have a Cray XC type of system, and we have not explicitly configured Slurm for a Cray, yet we are getting these errors:

+ srun --ntasks=1152 ./fv3.exe
srun: error: plugin_load_from_file: dlopen(/apps/slurm/18.08.3/lib/slurm/select_cray.so): /apps/slurm/18.08.3/lib/slurm/select_cray.so: undefined symbol: post_job_step
srun: error: Couldn't load specified plugin name for select/cray: Dlopen of plugin file failed
srun: error: plugin_load_from_file: dlopen(/apps/slurm/18.08.3/lib/slurm/select_serial.so): /apps/slurm/18.08.3/lib/slurm/select_serial.so: undefined symbol: drain_nodes
srun: error: Couldn't load specified plugin name for select/serial: Dlopen of plugin file failed
srun: error: plugin_load_from_file: dlopen(/apps/slurm/18.08.3/lib/slurm/select_cons_res.so): /apps/slurm/18.08.3/lib/slurm/select_cons_res.so: undefined symbol: powercap_get_cluster_current_cap
srun: error: Couldn't load specified plugin name for select/cons_res: Dlopen of plugin file failed
srun: error: plugin_load_from_file: dlopen(/apps/slurm/18.08.3/lib/slurm/select_linear.so): /apps/slurm/18.08.3/lib/slurm/select_linear.so: undefined symbol: slurm_job_preempt_mode
srun: error: Couldn't load specified plugin name for select/linear: Dlopen of plugin file failed
srun: fatal: Can't find plugin for select/linear
++ date
+ echo 'Model ended: ' Fri Mar 1 21:52:12 GMT 2019
Model ended: Fri Mar 1 21:52:12 GMT 2019
+ exit

Is there some config flag missing? Attached is our slurm.conf.

It appears that this happens when users set:

export LD_BIND_NOW=1

When we removed this line in the past, the errors went away.

But there are instances where users *need* to set this for their application to work, so we would like to know if there is a way for this to work even when this environment variable is set.

Thanks!
(In reply to Raghu Reddy from comment #0)
> It appears that this happens when users set:
> export LD_BIND_NOW=1

Slurm does not support running with LD_BIND_NOW set. Slurm uses a plugin architecture that is not compatible with LD_BIND_NOW.

> We were able to remove this line and errors have gone away before.
>
> But there are instances where users *need* to set this for their application
> to work, so we would like to know if there is a way for this to work even when
> this environment variable is set.

Instead of setting LD_BIND_NOW in the job script, it can be set for the MPI job alone with something as simple as:
> srun env LD_BIND_NOW=1 $MPIJOB
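
To illustrate why the `env` form helps, here is a minimal sketch (not Slurm-specific, and not taken from the attached configuration). It assumes only a POSIX shell and the standard `env` utility. The idea is that `env LD_BIND_NOW=1 cmd` sets the variable for `cmd` alone, so the process that invokes it (srun, in the real job) never has the variable in its own environment. The variable names `child_value` and `parent_value` are made up for the demonstration.

```shell
#!/bin/sh
# Make sure the calling shell (standing in for the job script and srun)
# does not have LD_BIND_NOW set.
unset LD_BIND_NOW

# `env VAR=value command` sets VAR only in the environment of `command`.
# Here a child shell stands in for the MPI application.
child_value=$(env LD_BIND_NOW=1 sh -c 'printf %s "${LD_BIND_NOW:-unset}"')

# The calling shell's own environment is unchanged.
parent_value=${LD_BIND_NOW:-unset}

echo "child process sees LD_BIND_NOW=$child_value"
echo "calling shell sees LD_BIND_NOW=$parent_value"

# In the job script from comment #0 the equivalent would presumably be:
#   srun --ntasks=1152 env LD_BIND_NOW=1 ./fv3.exe
# with the `export LD_BIND_NOW=1` line removed from the script.
```

With this pattern srun loads its plugins without LD_BIND_NOW in effect, and only the application started through `env` on the compute nodes runs with it set.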
Raghu,

I'm going to close this bug. Please reply if you have any more questions.

--Nate