Summary: | Can not load mpi/pmix_v4 | ||
---|---|---|---|
Product: | Slurm | Reporter: | Jaime Freire <jaime> |
Component: | PMIx | Assignee: | Jacob Jenson <jacob> |
Status: | OPEN --- | QA Contact: | |
Severity: | 6 - No support contract | ||
Priority: | --- | CC: | Lisandro.Grullon, odupetolujohnson |
Version: | 23.02.1 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | -Other- | Alineos Sites: | --- |
Atos/Eviden Sites: | --- | Confidential Site: | --- |
Coreweave sites: | --- | Cray Sites: | --- |
DS9 clusters: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA SIte: | --- | NoveTech Sites: | --- |
Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Tzag Elita Sites: | --- |
Linux Distro: | Ubuntu | Machine Name: | cpu server. thinkmate brand |
CLE Version: | | Version Fixed: | |
Target Release: | --- | DevPrio: | --- |
Emory-Cloud Sites: | --- |
Description
Jaime Freire
2023-04-13 14:19:17 MDT
Additional information about the files location:

[ubuntu@juju-788468-0 slurm]$ pwd
/usr/lib64/slurm
[ubuntu@juju-788468-0 slurm]$ ls
accounting_storage_none.so cli_filter_syslog.so job_container_cncu.so openapi_dbv0_0_38.so select_cons_res.so
accounting_storage_slurmdbd.so cli_filter_user_defaults.so job_container_none.so openapi_dbv0_0_39.so select_cons_tres.so
acct_gather_energy_gpu.so core_spec_cray_aries.so job_container_tmpfs.so openapi_v0_0_37.so select_cray_aries.so
acct_gather_energy_ibmaem.so core_spec_none.so job_submit_all_partitions.so openapi_v0_0_38.so select_linear.so
acct_gather_energy_ipmi.so cred_munge.so job_submit_cray_aries.so openapi_v0_0_39.so serializer_json.so
acct_gather_energy_none.so data_parser_v0_0_39.so job_submit_lua.so power_cray_aries.so serializer_url_encoded.so
acct_gather_energy_pm_counters.so ext_sensors_none.so job_submit_require_timelimit.so power_none.so serializer_yaml.so
acct_gather_energy_rapl.so ext_sensors_rrd.so job_submit_throttle.so preempt_none.so site_factor_none.so
acct_gather_energy_xcc.so gpu_generic.so libslurmfull.so preempt_partition_prio.so src
acct_gather_filesystem_lustre.so gres_gpu.so libslurm_pmi.so preempt_qos.so switch_cray_aries.so
acct_gather_filesystem_none.so gres_mps.so mcs_account.so prep_script.so switch_none.so
acct_gather_interconnect_none.so gres_nic.so mcs_group.so priority_basic.so task_affinity.so
acct_gather_interconnect_sysfs.so gres_shard.so mcs_none.so priority_multifactor.so task_cgroup.so
acct_gather_profile_hdf5.so hash_k12.so mcs_user.so proctrack_cgroup.so task_cray_aries.so
acct_gather_profile_influxdb.so jobacct_gather_cgroup.so mpi_cray_shasta.so proctrack_cray_aries.so task_none.so
acct_gather_profile_none.so jobacct_gather_linux.so mpi_none.so proctrack_linuxproc.so topology_3d_torus.so
auth_jwt.so jobacct_gather_none.so mpi_pmi2.so proctrack_pgid.so topology_hypercube.so
auth_munge.so jobcomp_elasticsearch.so mpi_pmix.so rest_auth_jwt.so topology_none.so
burst_buffer_datawarp.so jobcomp_filetxt.so mpi_pmix_v4.so rest_auth_local.so topology_tree.so
burst_buffer_lua.so jobcomp_lua.so node_features_helpers.so route_default.so
cgroup_v1.so jobcomp_mysql.so node_features_knl_cray.so route_topology.so
cli_filter_lua.so jobcomp_none.so node_features_knl_generic.so sched_backfill.so
cli_filter_none.so jobcomp_script.so openapi_dbv0_0_37.so sched_builtin.so

Hi @jaime,

I am having similar problems when integrating PMIx with slurm-23.11.3. I tried compiling two PMIx versions:

3.2.2
4.2.8

Both PMIx versions compile fine, but when I try to integrate them into Slurm via rpmmacros, the PMIx library code does not show up in the libpmi RPM.

The following is the command I am using for the RPM creation:

rpmbuild --define '--with-pmix=/SOFTWARE/MPIx/openpmix-3.2.2_master/lib:/SOFTWARE/MPIx/openpmix-4.2.8_master/lib' -tb slurm-23.11.3.tar.bz2 --with mysql --with ofed 2>&1 | tee build.log

The following are the PMI libraries I observed inside the generated RPM; PMIx is nowhere to be found:

rpm -qpl /root/rpmbuild/RPMS/x86_64/slurm-libpmi-23.11.3-1.el8.x86_64.rpm
/usr/lib/.build-id
/usr/lib/.build-id/e1/e445b724c722b88e13f39fcb27b7d30d32b8e1
/usr/lib/.build-id/ea/01a8978a2b93754a7e71cc9048b75b71084513
/usr/lib64/libpmi.so
/usr/lib64/libpmi.so.0
/usr/lib64/libpmi.so.0.0.0
/usr/lib64/libpmi2.so
/usr/lib64/libpmi2.so.0
/usr/lib64/libpmi2.so.0.0.0

I am also having issues with PMIx: slurmd couldn't load the specified plugin name for mpi/pmix_v4. Please see the output of systemctl status slurmd:

slurmd: slurmd version 23.11.6 started
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error: mpi/pmix_v4: init: (null) [0]: mpi_pmix.c:193: pmi/pmix: can not load PMIx library
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error: Couldn't load specified plugin name for mpi/pmix_v4: Plugin init() callback failed
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error: MPI: Cannot create context for mpi/pmix_v4
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error: mpi/pmix_v4: init: (null) [0]: mpi_pmix.c:193: pmi/pmix: can not load PMIx library
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error: Couldn't load specified plugin name for mpi/pmix: Plugin init() callback failed
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: error: MPI: Cannot create context for mpi/pmix
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: slurmd started on Mon, 09 Sep 2024 12:18:00 -0400
Sep 09 12:18:00 c-22 systemd[1]: Started Slurm node daemon.
Sep 09 12:18:00 c-22 slurmd[151268]: slurmd: CPUs=128 Boards=1 Sockets=2 Cores=32 Threads=2 Memory=515614 TmpDisk=374387 Uptime=780974 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
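The "can not load PMIx library" message suggests that the plugin file itself was found but the PMIx shared library could not be opened at runtime. One way to check that on the affected node might be a small probe like the sketch below. It only asks the dynamic loader whether a library name resolves. The names libpmix.so and libpmix.so.2 are assumptions about how the library is installed on a given system, and the helper names can_dlopen and probe are made up for this illustration; none of this is taken from the Slurm sources.

```python
import ctypes


def can_dlopen(name: str) -> bool:
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library is missing from the loader search path
        # or cannot be loaded (for example, unresolved dependencies).
        return False
    return True


def probe(candidates):
    """Map each candidate library name to whether it could be loaded."""
    return {name: can_dlopen(name) for name in candidates}


if __name__ == "__main__":
    # ASSUMPTION: typical PMIx library names; adjust to the local install,
    # e.g. pass a full path under the PMIx prefix used at build time.
    for name, ok in probe(["libpmix.so", "libpmix.so.2"]).items():
        print(f"{name}: {'loadable' if ok else 'NOT loadable'}")
```

Note that this runs with the environment of the invoking user. slurmd is started by systemd and may see a different library search path (LD_LIBRARY_PATH, ld.so.conf), so a "loadable" result in an interactive shell does not by itself prove that slurmd can load the library.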