Ticket 8625

Summary: Compiling Slurm with PMIX support
Product: Slurm
Reporter: Alex Mamach <alex.mamach>
Component: PMIx
Assignee: Felip Moll <felip.moll>
Status: RESOLVED INFOGIVEN
QA Contact: Unassigned Reviewer <reviewers>
Severity: 4 - Minor Issue    
Priority: ---    
Version: 20.02.0   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=6598
https://bugs.schedmd.com/show_bug.cgi?id=5323
Site: Northwestern
Linux Distro: RHEL
Attachments: workaround_with_pmix_2002.patch

Description Alex Mamach 2020-03-04 15:42:01 MST
I've been struggling to build Slurm RPMs with support for multiple PMIx versions. I suspect I'm doing something wrong, but I was hoping for some guidance. I've compiled both pmix-2.2.3 and pmix-3.1.5 and installed them into /usr/local/pmix-2.2.3 and /usr/local/pmix-3.1.5 respectively.

At first I tried to build the Slurm RPMs via an .rpmmacros file like this:

%_with_ucx /usr/local/ucx-1.7.0
%_with_pmix /usr/local/pmix-3.1.5
%_with_pmix /usr/local/pmix-2.2.3

However, I received an error with this output:

+ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info /usr/local/pmix-3.1.5 /usr/local/ucx-1.7.0
configure: WARNING: you should use --build, --host, --target
configure: WARNING: invalid host type: /usr/local/pmix-3.1.5
configure: WARNING: you should use --build, --host, --target
configure: WARNING: invalid host type: /usr/local/ucx-1.7.0
checking build system type... x86_64-redhat-linux-gnu
checking host system type... x86_64-redhat-linux-gnu
checking target system type... /usr/local/pmix-3.1.5
configure: error: invalid value of canonical target
error: Bad exit status from /var/tmp/rpm-tmp.r6Buuw (%build)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.r6Buuw (%build)


The issue appears to be that the .rpmmacros values are appended to the configure line as bare arguments (right after --infodir) rather than as --with-pmix/--with-ucx options, for some reason.
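One way to read the log above: an autoconf-generated configure script treats any argument that is neither an option nor a VAR=value assignment as a host type, which matches the "invalid host type" warnings. The sketch below lists such bare arguments in a configure line. The function name bare_args is hypothetical and not part of Slurm, rpm, or autoconf.

```shell
#!/bin/sh
# Hypothetical helper: print configure arguments that are neither options
# (leading "-") nor VAR=value assignments. An autoconf-generated configure
# script treats such bare arguments as a host type.
bare_args() {
    for arg in "$@"; do
        case "$arg" in
            -*|*=*) ;;                      # option or assignment: nothing to report
            *) printf '%s\n' "$arg" ;;      # bare argument: report it
        esac
    done
}

# Tail of the failing configure line from the build log above.
bare_args --mandir=/usr/share/man --infodir=/usr/share/info \
    /usr/local/pmix-3.1.5 /usr/local/ucx-1.7.0
```

Run against the failing line, this reports the two /usr/local paths, which are the same values configure complained about.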

I also tried this method:

rpmbuild --define '_with_ucx --with-ucx=/usr/local/ucx-1.7.0' --define '_with_pmix --with-pmix=/usr/local/pmix-2.2.3:/usr/local/pmix-3.1.5' -tb slurm-20.02.0.tar.bz2

Which builds the RPMs successfully, but trying to install them causes a conflict with the pmix-2 RPM I already built and installed:

Transaction check error:
  file /usr/local/pmix-2.2.3/lib64/libpmi.so from install of slurm-libpmi-20.02.0-1.el7.x86_64 conflicts with file from package legacy-pmix-2.2.3-1.el7.x86_64
  file /usr/local/pmix-2.2.3/lib64/libpmi2.so from install of slurm-libpmi-20.02.0-1.el7.x86_64 conflicts with file from package legacy-pmix-2.2.3-1.el7.x86_64

Maybe I'm misunderstanding, but I thought pointing to the pmix2 location in the compilation would have prevented this error. Even stranger, if I then remove the pmix2 package and install Slurm, srun/sacct/etc are all installed in /usr/local/pmix-2.2.3/ instead of /usr/bin/.

Can you help me understand how I should be building these RPMs with PMIx support? I've read through the documentation and I don't see where I'm going wrong. Thanks!
Comment 1 Felip Moll 2020-03-05 10:30:48 MST
From what I see right now, rpmbuild support for PMIx only takes the system-installed version into account, and it seems you cannot use another version.

The code in the spec file was introduced in bug 6598, commit 35bb9afb.

I will do some tests and come back to you with the conclusions.
Comment 2 Felip Moll 2020-03-05 11:04:51 MST
Created attachment 13284 [details]
workaround_with_pmix_2002.patch

This is a quick workaround that seems to generate the correct config line.

I haven't tried to install the RPMs yet, though; I need more time to do so.

I am lowering the severity of this bug to sev-4 since it doesn't have that much impact. Please see https://www.schedmd.com/support.php for a description of our sev levels.
Comment 3 Felip Moll 2020-03-05 11:06:14 MST
I forgot to add: compile with

rpmbuild --define "slurm_with_pmix /path/to/your/pmix_v1:/path/pmix_v2" ...

and note what the configure line looks like:

+ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pmix=/path/to/your/pmix_v1:/path/pmix_v2
Comment 4 Felip Moll 2020-03-06 08:12:08 MST
I analyzed the issues further.

I think you are doing it correctly: there's obviously some problem when using the rpmmacros file, but not when using a define in the command line.

This works for me:

> rpmbuild --define '_with_ucx --with-ucx=/usr/local/ucx-1.7.0' --define
> '_with_pmix --with-pmix=/usr/local/pmix-2.2.3:/usr/local/pmix-3.1.5' -tb
> slurm-20.02.0.tar.bz2
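The quoted command can also be assembled from a list of PMIx prefixes. The sketch below only prints the rpmbuild command instead of running it. join_prefixes is a hypothetical helper name; the paths and tarball name are the ones from this ticket.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with ":" to form the value
# passed to --with-pmix. The function body runs in a subshell, so the
# IFS change does not leak into the caller.
join_prefixes() (
    IFS=:
    printf '%s\n' "$*"
)

pmix_paths=$(join_prefixes /usr/local/pmix-2.2.3 /usr/local/pmix-3.1.5)

# Dry run: print the command rather than executing it.
printf '%s\n' "rpmbuild --define '_with_ucx --with-ucx=/usr/local/ucx-1.7.0' --define '_with_pmix --with-pmix=${pmix_paths}' -tb slurm-20.02.0.tar.bz2"
```

Printing first makes it easy to compare the generated line against the configure line that rpmbuild shows before starting a full build.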

Regarding this:

> the compilation would have prevented this error. Even stranger, if I then
> remove the pmix2 package and install Slurm, srun/sacct/etc are all installed
> in /usr/local/pmix-2.2.3/ instead of /usr/bin/.

I am wondering whether you're using an RPM that was built with something set in rpmmacros. Can you remove your rpmmacros and show me the configure line that rpmbuild prints? I think you must have a prefix defined somewhere, maybe from the time you built pmix-2.2.3.
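A quick way to look for such a leftover definition is to scan the macros file for prefix or _with_ entries. This is only a sketch: the helper name suspect_macros is hypothetical, and the pattern covers just the two macro families discussed in this ticket.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, the definitions in an rpm
# macros file that set %_prefix or any %_with_* macro.
# $1 is the path of the macros file to scan.
suspect_macros() {
    grep -n -E '^[[:space:]]*%_(prefix|with_[A-Za-z0-9_]+)[[:space:]]' "$1"
}

# Usage (assuming the per-user macros file exists):
#   suspect_macros "$HOME/.rpmmacros"
```

Any %_prefix line found this way would be a candidate explanation for binaries landing outside /usr/bin.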

This is the file list of my generated package:

]$ rpm -qpl rpmbuild/RPMS/x86_64/slurm-libpmi-20.02.0-1.fc30.x86_64.rpm
/usr/lib/.build-id
/usr/lib/.build-id/07/a2a7ae193db2cc26e94df57bddb22d9826bc6b
/usr/lib/.build-id/24/1a9880a469ad681191c6cc7c9d981a90fa66fd
/usr/lib64/libpmi.so
/usr/lib64/libpmi.so.0
/usr/lib64/libpmi.so.0.0.0
/usr/lib64/libpmi2.so
/usr/lib64/libpmi2.so.0
/usr/lib64/libpmi2.so.0.0.0
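For comparison with the conflict reported earlier, a package listing can be filtered for files that sit under the PMIx prefix. A minimal sketch, assuming the listing comes from rpm -qpl as above; under_prefix is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a file list on stdin and print only the entries
# located under the directory given as $1 (a trailing "/" is tolerated).
under_prefix() {
    want=${1%/}
    while IFS= read -r entry; do
        case "$entry" in
            "$want"/*) printf '%s\n' "$entry" ;;   # inside the given prefix
        esac
    done
}

# Sample input: one path from the listing above, one from the reported conflict.
printf '%s\n' /usr/lib64/libpmi.so /usr/local/pmix-2.2.3/lib64/libpmi.so |
    under_prefix /usr/local/pmix-2.2.3
```

With a real package, the input would come from something like rpm -qpl slurm-libpmi-20.02.0-1.el7.x86_64.rpm piped into under_prefix /usr/local/pmix-2.2.3; empty output would mean the package owns nothing under that prefix.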

This is my configure line shown by rpmbuild:

+ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pmix=/home/lipi/bin/pmix_v1:/home/lipi/bin/pmix

I am obsoleting the attached patch since it is not really needed.
Comment 8 Alex Mamach 2020-03-08 22:18:00 MDT
Hi, thanks for taking a look at this. After some more testing I can confirm you are correct, sorry for the dumb question and false alarm ;)