Ticket 11973 - Building Slurm RPMs with PMIx fails
Summary: Building Slurm RPMs with PMIx fails
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Build System and Packaging
Version: 20.11.7
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Oriol Vilarrubi
QA Contact:
URL:
Duplicates: 9293
Depends on:
Blocks:
 
Reported: 2021-07-06 04:26 MDT by VUB HPC
Modified: 2021-09-02 05:57 MDT

See Also:
Site: VUB
Linux Distro: CentOS
Version Fixed: 21.08.1


Description VUB HPC 2021-07-06 04:26:44 MDT
We are having trouble generating Slurm RPMs supporting PMIx. Our goal is to build Slurm with PMIx from libpmix in the system and obtain RPMs for Slurm that automatically require libpmix.

Steps to reproduce:

1. rpmbuild -ba slurm.spec

Actual Results:

By default, building slurm.spec correctly enables PMIx: the configure script detects libpmix in the system, and mpi_pmix.so and mpi_pmix_v3.so are built. However, the resulting RPMs lack the "Requires" on libpmix. This happens because:

1. the mpi/pmix plugin uses dlopen on libpmix instead of directly linking to it, which is not detected by rpmbuild
2. the BuildRequires and Requires in slurm.spec are conditioned on

   %if %{with pmix} && "%{_with_pmix}" == "--with-pmix"

   and hence not enabled by default
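
One way to confirm the missing dependency is to inspect the requirements recorded in a built package. The following is only a sketch: `has_pmix_requires` is a hypothetical helper name, the package path is a placeholder, and it assumes that the dependency name contains the string "pmix".

```shell
# Hypothetical helper: succeeds if a requirements listing mentions pmix.
# It expects the output of `rpm -qp --requires PACKAGE.rpm` on stdin.
has_pmix_requires() {
    grep -qi 'pmix'
}

# Intended usage (the package path is a placeholder, not a real file name):
#   rpm -qp --requires /path/to/slurm-slurmd.rpm | has_pmix_requires \
#       && echo "pmix dependency recorded" \
#       || echo "pmix dependency missing"
```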

Additional Information: 

Alternatively, explicitly setting "--with pmix" in the rpmbuild command does not work either: the mpi/pmix plugin is not built. This option adds "--with-pmix" to the configure command, but "--with-pmix" requires a path to work. See this snippet from the configure output:

+ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-really-no-cray --with-pmix --with-ucx
[...]
checking for pmix installation...
configure: WARNING: unable to locate pmix installation

We are currently working around this issue by

1. defining the "%_with_pmix" macro as: --define "_with_pmix --with-pmix=/usr"
2. modifying slurm.spec to simplify the conditionals on PMIx "BuildRequires" and "Requires" to

   %if %{with pmix}

However, this solution is not ideal. We would like to avoid manually editing the spec file.
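
For reference, step 1 of the workaround can be sketched as a small dry-run helper. This is an illustration only: `pmix_rpmbuild_cmd` is a hypothetical name, it merely prints the command rather than running it, and the prefix defaults to /usr as in our setup. On 20.11 the spec file edit from step 2 is still needed.

```shell
# Hypothetical dry-run helper: prints the rpmbuild invocation from step 1
# instead of executing it. $1 is the PMIx installation prefix (default /usr).
pmix_rpmbuild_cmd() {
    prefix="${1:-/usr}"
    printf 'rpmbuild -ba slurm.spec --define "_with_pmix --with-pmix=%s"\n' "$prefix"
}

pmix_rpmbuild_cmd          # uses the default prefix /usr
pmix_rpmbuild_cmd /usr     # same command with an explicit prefix
```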
Comment 1 Oriol Vilarrubi 2021-08-17 07:53:18 MDT
Hello,

The lack of autodetection is intended behavior. We think it is better to autodetect only required dependencies such as munge. For something optional like pmix (which, for some reason, may be installed on the compilation node but not on the compute nodes), you have to request it explicitly with --with pmix.

As you said, setting --with pmix did not work either. However, the cause was not the configure script but the slurm.spec file.

As you can see here, the pmix installation is detected by the configure script:
[jvilarru@centos slurm-build]$ ../slurm-21.08.0-0rc2/configure --with-pmix 2>/dev/null | grep "pmix installation"
checking for pmix installation... /usr

The problem is an error while generating the "Requires" tag for the slurmd package, caused by an unescaped variable in the slurm.spec file.

I'm currently verifying that my fix passes all the tests. Once it does, I will tell you which Slurm version it will ship in.
Comment 3 VUB HPC 2021-08-20 08:15:20 MDT
Thanks for the feedback. I just checked again, and running `configure --with-pmix` on our side does not work as described:

$ ./configure --with-pmix
[...]
checking for pmix installation... 
configure: WARNING: unable to locate pmix installation
[...]

This is with version 20.11.8 (commit 15a9f49).

Looking at the configure script, however, this actually seems to be the intended behavior for this version. I see a list of default paths for PMIx in the configure script:
21686: _x_ac_pmix_dirs="/usr /usr/local"

However, those are only used for executions of `configure` without "--with-pmix". Using `configure --with-pmix` gets a "yes" passed down as argument of --with-pmix, which replaces the default paths in $_x_ac_pmix_dirs with the string "yes". Hence, the search for PMIx is carried out under the prefix "yes/" and fails.
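
The effect can be illustrated with a simplified model. This is not the actual code from the Slurm configure script: `pmix_search_dirs` is a hypothetical function that only mimics the substitution described above, relying on the fact that autoconf passes "yes" when a --with option is given without a value.

```shell
# Simplified model of the behavior described above; NOT the real configure code.
# $1 mimics the value autoconf stores for --with-pmix[=ARG]:
#   empty when the option is absent, "yes" when given without a value.
pmix_search_dirs() {
    with_pmix="$1"
    dirs="/usr /usr/local"        # default search prefixes
    if [ -n "$with_pmix" ]; then
        dirs="$with_pmix"         # the argument replaces the defaults
    fi
    printf '%s\n' "$dirs"
}

pmix_search_dirs ""        # option absent      -> prints: /usr /usr/local
pmix_search_dirs "yes"     # bare --with-pmix   -> prints: yes
pmix_search_dirs "/usr"    # --with-pmix=/usr   -> prints: /usr
```

In this model, only the first and third calls yield a prefix under which a PMIx installation could actually be found.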
Comment 4 Oriol Vilarrubi 2021-08-23 10:41:52 MDT
Hello,

You're completely right, I was checking with version 21.08rc2, not with 20.11.

The pmix handling in the configure script was modified in this commit: https://github.com/SchedMD/slurm/commit/e8036c5adb0585b7af208d1e2eb3bcd9afc687f5

This commit went into version 21.08.0. Hopefully, the fix for the Requires in the RPMs will also land in 21.08, but I cannot yet tell you in which minor version.

I'll keep you updated.
Comment 5 Oriol Vilarrubi 2021-08-24 03:38:09 MDT
*** Ticket 9293 has been marked as a duplicate of this ticket. ***
Comment 8 Oriol Vilarrubi 2021-09-02 05:57:01 MDT
Hello,

The fix for the "Requires" tag in slurmd for pmix and ucx is already present in the Slurm master branch and will ship with Slurm 21.08.1.

So in version 21.08.1 you will find both problems you reported resolved: the configure fix, which comes in 21.08.0 with commit e8036c5adb0585b7af208d1e2eb3bcd9afc687f5, and the fix for the Requires, which comes from commit 5f58273fc5c9e8b30655837fe0493099628a5b4c.
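
If it helps when scripting builds, a version check along these lines could decide whether the manual workaround is still needed. It is only a sketch: `slurm_has_pmix_rpm_fix` is a hypothetical helper, it assumes plain MAJOR.MINOR.PATCH version strings and a `sort` that supports -V, and it treats 21.08.1 as the first version carrying both fixes, as stated above.

```shell
# Hypothetical helper: succeeds if the given Slurm version is >= 21.08.1.
# Assumes plain MAJOR.MINOR.PATCH strings and a sort(1) that supports -V.
slurm_has_pmix_rpm_fix() {
    min="21.08.1"
    # With version sort, the smaller of the two versions comes first;
    # the check passes only if that smaller one is the minimum itself.
    [ "$(printf '%s\n' "$min" "$1" | sort -V | head -n 1)" = "$min" ]
}

for v in 20.11.8 21.08.0 21.08.1; do
    if slurm_has_pmix_rpm_fix "$v"; then
        echo "$v: fixes expected, workaround should not be needed"
    else
        echo "$v: apply the manual workaround"
    fi
done
```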

I'll close this ticket as fixed, but if you find that it does not work properly on your system, do not hesitate to re-open this bug by simply replying to this comment or creating a new one.