Ticket 16522 - Can not load mpi/pmix_v4
Summary: Can not load mpi/pmix_v4
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: PMIx
Version: 23.02.1
Hardware: Linux Linux
Severity: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-04-13 14:19 MDT by Jaime Freire
Modified: 2024-09-09 14:05 MDT

See Also:
Site: -Other-
Linux Distro: Ubuntu
Machine Name: cpu server. thinkmate brand
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---


Description Jaime Freire 2023-04-13 14:19:17 MDT
I am trying to install Slurm 22.05.8 or 23.02.1 on CentOS 7, but the PMIx plugin fails to load:

From slurmctld.log:

[2023-04-10T17:16:19.794] debug3: Trying to load plugin /usr/lib64/slurm/mpi_pmix_v4.so
[2023-04-10T17:16:19.795] debug3: plugin_load_from_file->_verify_syms: found Slurm plugin name:PMIx plugin type:mpi/pmix_v4 version:0x170201
[2023-04-10T17:16:19.795] error:  mpi/pmix_v4: init: (null) [0]: mpi_pmix.c:197: pmi/pmix: can not load PMIx library
[2023-04-10T17:16:19.795] error: Couldn't load specified plugin name for mpi/pmix_v4: Plugin init() callback failed
[2023-04-10T17:16:19.795] error: MPI: Cannot create context for mpi/pmix_v4

I have built both PMIx and Slurm:

wget https://github.com/openpmix/openpmix/releases/download/v4.1.2/pmix-4.1.2.tar.gz
tar -xzvf pmix-4.1.2.tar.gz
cd pmix-4.1.2
mkdir build
cd build
../configure --prefix /opt/pmix/pmix-4.1.2
make all
make install

wget https://download.schedmd.com/slurm/slurm-23.02.1.tar.bz2

rpmbuild -ta slurm-23.02.1.tar.bz2 --with mysql --with slurmrestd --with jwt --with lua --with hdf5 --with hwloc --with numa --define '_with_pmix --with-pmix=/opt/pmix/pmix-4.1.2'
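
One thing that may be worth ruling out on the host where the daemon runs: the "can not load PMIx library" message suggests the plugin could not open the PMIx shared library at startup. The sketch below assumes the plugin looks for an unversioned libpmix.so under the prefix passed to --with-pmix (that is an assumption, not something confirmed in this ticket) and simply checks whether that file exists. The helper name check_pmix_lib is hypothetical.

```shell
# Hypothetical helper: report whether an unversioned libpmix.so exists
# under a PMIx install prefix (checks both lib and lib64).
# Assumption: the Slurm plugin needs this file at runtime on every node
# where the RPMs are installed, not only on the build host.
check_pmix_lib() {
    prefix="$1"
    for dir in "$prefix/lib" "$prefix/lib64"; do
        if [ -e "$dir/libpmix.so" ]; then
            echo "found: $dir/libpmix.so"
            return 0
        fi
    done
    echo "missing: libpmix.so under $prefix"
    return 1
}
```

With the prefix from the configure line above, the call would be check_pmix_lib /opt/pmix/pmix-4.1.2. If the RPMs were built on one machine and installed on another, the check is only meaningful on the machine where slurmctld or slurmd actually runs.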
Comment 1 Jaime Freire 2023-04-13 16:34:12 MDT
Additional information about the location of the plugin files:

[ubuntu@juju-788468-0 slurm]$ pwd
/usr/lib64/slurm
[ubuntu@juju-788468-0 slurm]$ ls
accounting_storage_none.so         cli_filter_syslog.so         job_container_cncu.so            openapi_dbv0_0_38.so       select_cons_res.so
accounting_storage_slurmdbd.so     cli_filter_user_defaults.so  job_container_none.so            openapi_dbv0_0_39.so       select_cons_tres.so
acct_gather_energy_gpu.so          core_spec_cray_aries.so      job_container_tmpfs.so           openapi_v0_0_37.so         select_cray_aries.so
acct_gather_energy_ibmaem.so       core_spec_none.so            job_submit_all_partitions.so     openapi_v0_0_38.so         select_linear.so
acct_gather_energy_ipmi.so         cred_munge.so                job_submit_cray_aries.so         openapi_v0_0_39.so         serializer_json.so
acct_gather_energy_none.so         data_parser_v0_0_39.so       job_submit_lua.so                power_cray_aries.so        serializer_url_encoded.so
acct_gather_energy_pm_counters.so  ext_sensors_none.so          job_submit_require_timelimit.so  power_none.so              serializer_yaml.so
acct_gather_energy_rapl.so         ext_sensors_rrd.so           job_submit_throttle.so           preempt_none.so            site_factor_none.so
acct_gather_energy_xcc.so          gpu_generic.so               libslurmfull.so                  preempt_partition_prio.so  src
acct_gather_filesystem_lustre.so   gres_gpu.so                  libslurm_pmi.so                  preempt_qos.so             switch_cray_aries.so
acct_gather_filesystem_none.so     gres_mps.so                  mcs_account.so                   prep_script.so             switch_none.so
acct_gather_interconnect_none.so   gres_nic.so                  mcs_group.so                     priority_basic.so          task_affinity.so
acct_gather_interconnect_sysfs.so  gres_shard.so                mcs_none.so                      priority_multifactor.so    task_cgroup.so
acct_gather_profile_hdf5.so        hash_k12.so                  mcs_user.so                      proctrack_cgroup.so        task_cray_aries.so
acct_gather_profile_influxdb.so    jobacct_gather_cgroup.so     mpi_cray_shasta.so               proctrack_cray_aries.so    task_none.so
acct_gather_profile_none.so        jobacct_gather_linux.so      mpi_none.so                      proctrack_linuxproc.so     topology_3d_torus.so
auth_jwt.so                        jobacct_gather_none.so       mpi_pmi2.so                      proctrack_pgid.so          topology_hypercube.so
auth_munge.so                      jobcomp_elasticsearch.so     mpi_pmix.so                      rest_auth_jwt.so           topology_none.so
burst_buffer_datawarp.so           jobcomp_filetxt.so           mpi_pmix_v4.so                   rest_auth_local.so         topology_tree.so
burst_buffer_lua.so                jobcomp_lua.so               node_features_helpers.so         route_default.so
cgroup_v1.so                       jobcomp_mysql.so             node_features_knl_cray.so        route_topology.so
cli_filter_lua.so                  jobcomp_none.so              node_features_knl_generic.so     sched_backfill.so
cli_filter_none.so                 jobcomp_script.so            openapi_dbv0_0_37.so             sched_builtin.so
Comment 2 Slurm-Ninja 2024-01-26 15:37:47 MST
Hi @jaime,
I am having similar problems when integrating PMIx with Slurm 23.11.3.

I tried compiling two PMIx versions:

3.2.2
4.2.8

Both versions compile fine, but when I try to integrate them into Slurm via RPM macros, the PMIx library does not show up in the libpmi RPM:

This is the command I am using to build the RPMs:

rpmbuild --define '--with-pmix=/SOFTWARE/MPIx/openpmix-3.2.2_master/lib:/SOFTWARE/MPIx/openpmix-4.2.8_master/lib' -tb slurm-23.11.3.tar.bz2  --with mysql --with ofed 2>&1 | tee build.log

These are the PMI libraries I observed inside the generated RPM; PMIx is nowhere to be found:

rpm -qpl /root/rpmbuild/RPMS/x86_64/slurm-libpmi-23.11.3-1.el8.x86_64.rpm
/usr/lib/.build-id
/usr/lib/.build-id/e1/e445b724c722b88e13f39fcb27b7d30d32b8e1
/usr/lib/.build-id/ea/01a8978a2b93754a7e71cc9048b75b71084513
/usr/lib64/libpmi.so
/usr/lib64/libpmi.so.0
/usr/lib64/libpmi.so.0.0.0
/usr/lib64/libpmi2.so
/usr/lib64/libpmi2.so.0
/usr/lib64/libpmi2.so.0.0.0
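
Two details from earlier in this ticket may be relevant here. First, the listing in comment 1 shows mpi_pmix.so and mpi_pmix_v4.so in the plugin directory /usr/lib64/slurm, so the PMIx plugin may be packaged with the other plugins rather than in slurm-libpmi. Second, the description passes the option as --define '_with_pmix --with-pmix=...', whereas the command above uses --define '--with-pmix=...'. Whether either point explains the missing files is not confirmed. The sketch below filters any RPM file list for PMIx entries; the helper name list_pmix_files is hypothetical, and the package path in the example comment is an assumption modeled on the slurm-libpmi file name.

```shell
# Hypothetical helper: read an RPM file list on stdin and print only the
# entries that mention PMIx, or a short notice when there are none.
list_pmix_files() {
    grep -i 'pmix' || echo "no pmix files listed"
}

# Example use (package path is an assumption; adjust to the build output):
#   rpm -qpl /root/rpmbuild/RPMS/x86_64/slurm-23.11.3-1.el8.x86_64.rpm | list_pmix_files
```

Running the same filter over every RPM produced by the build shows which package, if any, contains the mpi_pmix plugins.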
Comment 3 Tolu 2024-09-09 11:55:10 MDT
I am also having issues with PMIx: Slurm could not load the specified plugin for mpi/pmix_v4. Please see the output of systemctl status slurmd:

slurmd: slurmd version 23.11.6 started
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error:  mpi/pmix_v4: init: (null) [0]: mpi_pmix.c:193: pmi/pmix: can not load PMIx library
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error: Couldn't load specified plugin name for mpi/pmix_v4: Plugin init() callback failed
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error: MPI: Cannot create context for mpi/pmix_v4
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error:  mpi/pmix_v4: init: (null) [0]: mpi_pmix.c:193: pmi/pmix: can not load PMIx library
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error: Couldn't load specified plugin name for mpi/pmix: Plugin init() callback failed
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error: MPI: Cannot create context for mpi/pmix
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: slurmd started on Mon, 09 Sep 2024 12:18:00 -0400
Sep 09 12:18:00 c-22 systemd[1]: Started Slurm node daemon.
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: CPUs=128 Boards=1 Sockets=2 Cores=32 Threads=2 Memory=515614 TmpDisk=374387 Uptime=780974 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)