Ticket 8625 - Compiling Slurm with PMIX support
Summary: Compiling Slurm with PMIX support
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: PMIx
Version: 20.02.0
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Felip Moll
QA Contact: Unassigned Reviewer
URL:
Depends on:
Blocks:
 
Reported: 2020-03-04 15:42 MST by Alex Mamach
Modified: 2020-03-08 22:18 MDT

See Also:
Site: Northwestern
Linux Distro: RHEL


Attachments
workaround_with_pmix_2002.patch (1003 bytes, patch)
2020-03-05 11:04 MST, Felip Moll

Description Alex Mamach 2020-03-04 15:42:01 MST
I've been struggling to build Slurm RPMs with support for multiple PMIx versions. I suspect I'm doing something wrong, but was hoping for some guidance. I've compiled both pmix-2.2.3 and pmix-3.1.5 and installed them into /usr/local/pmix-2.2.3 and /usr/local/pmix-3.1.5 respectively.

At first I tried to build the Slurm RPMs via an .rpmmacros file like this:

%_with_ucx /usr/local/ucx-1.7.0
%_with_pmix /usr/local/pmix-3.1.5
%_with_pmix /usr/local/pmix-2.2.3

However I received an error with this output:

+ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info /usr/local/pmix-3.1.5 /usr/local/ucx-1.7.0
configure: WARNING: you should use --build, --host, --target
configure: WARNING: invalid host type: /usr/local/pmix-3.1.5
configure: WARNING: you should use --build, --host, --target
configure: WARNING: invalid host type: /usr/local/ucx-1.7.0
checking build system type... x86_64-redhat-linux-gnu
checking host system type... x86_64-redhat-linux-gnu
checking target system type... /usr/local/pmix-3.1.5
configure: error: invalid value of canonical target
error: Bad exit status from /var/tmp/rpm-tmp.r6Buuw (%build)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.r6Buuw (%build)


The issue appears to be that the values from .rpmmacros are appended to the configure line as bare arguments, which configure then interprets as host and target types.
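
Note on the log above: the two paths show up at the end of the configure line without any `--with-...=` prefix, which suggests the macro value is pasted onto the command line verbatim. The sketch below is a hypothetical pre-flight check for a macro value; the function name `is_configure_option` is made up for illustration and is not part of rpm or Slurm. It only tests whether the value starts with `--`, on the assumption that anything else would reach configure as a positional argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of rpm or Slurm): succeed only if the value
# would reach configure as an option rather than as a positional argument.
is_configure_option() {
    case "$1" in
        --*) return 0 ;;   # e.g. --with-pmix=/path1:/path2
        *)   return 1 ;;   # a bare path, as seen in the failing log
    esac
}

# Compare the bare-path form with the full-option form used later in this ticket.
for value in '/usr/local/pmix-3.1.5' \
             '--with-pmix=/usr/local/pmix-2.2.3:/usr/local/pmix-3.1.5'; do
    if is_configure_option "$value"; then
        echo "ok:  $value"
    else
        echo "bad: $value"
    fi
done
```

Presumably the same full-option form would also be needed for a `%_with_pmix` line in .rpmmacros, with both prefixes joined by a colon in a single value. That is an inference from the log, not something confirmed in this ticket.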

I also tried this method:

rpmbuild --define '_with_ucx --with-ucx=/usr/local/ucx-1.7.0' --define '_with_pmix --with-pmix=/usr/local/pmix-2.2.3:/usr/local/pmix-3.1.5' -tb slurm-20.02.0.tar.bz2

Which builds the RPMs successfully, but trying to install them causes a conflict with the pmix-2 RPM I already built and installed:

Transaction check error:
  file /usr/local/pmix-2.2.3/lib64/libpmi.so from install of slurm-libpmi-20.02.0-1.el7.x86_64 conflicts with file from package legacy-pmix-2.2.3-1.el7.x86_64
  file /usr/local/pmix-2.2.3/lib64/libpmi2.so from install of slurm-libpmi-20.02.0-1.el7.x86_64 conflicts with file from package legacy-pmix-2.2.3-1.el7.x86_64

Maybe I'm misunderstanding, but I thought pointing to the pmix2 location in the compilation would have prevented this error. Even stranger, if I then remove the pmix2 package and install Slurm, srun/sacct/etc are all installed in /usr/local/pmix-2.2.3/ instead of /usr/bin/.

Can you help me understand how I should be building these RPMs with PMIx support? I've read through the documentation and I don't see where I'm going wrong. Thanks!
Comment 1 Felip Moll 2020-03-05 10:30:48 MST
From what I see right now, rpmbuild support for PMIx only takes the system-installed version into account, and it seems you cannot use another version.

The code in the spec file was introduced in bug 6598, commit 35bb9afb.

I will do some tests and come back to you with the conclusions.
Comment 2 Felip Moll 2020-03-05 11:04:51 MST
Created attachment 13284
workaround_with_pmix_2002.patch

This is a quick workaround that seems to generate the correct config line.

I haven't tried to install the RPMs yet, though; I need more time to do so.

I am lowering the severity of this bug to sev-4 since it doesn't have that much impact. Please see https://www.schedmd.com/support.php for a description of our severity levels.
Comment 3 Felip Moll 2020-03-05 11:06:14 MST
Forgot to add, compile with:

rpmbuild --define "slurm_with_pmix /path/to/your/pmix_v1:/path/pmix_v2" ...

and note what the configure line looks like:

+ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pmix=/path/to/your/pmix_v1:/path/pmix_v2
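
The value given to `--with-pmix` is a colon-separated list with one install prefix per PMIx version. As a rough sanity check before starting a long build, the list can be split and each prefix inspected. The sketch below assumes that every prefix contains `include/pmix.h`, which is where a default PMIx install places its main header; the function name `missing_pmix_prefixes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every prefix in a colon-separated list that
# does not contain include/pmix.h (assumed location of the PMIx header).
missing_pmix_prefixes() {
    old_ifs=$IFS
    IFS=:
    for prefix in $1; do
        [ -f "$prefix/include/pmix.h" ] || printf '%s\n' "$prefix"
    done
    IFS=$old_ifs
}

# Example with the prefixes from this ticket; prints nothing if both look fine.
missing_pmix_prefixes "/usr/local/pmix-2.2.3:/usr/local/pmix-3.1.5"
```

An empty result only means the headers were found; it says nothing about whether Slurm's configure will accept that particular PMIx version.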
Comment 4 Felip Moll 2020-03-06 08:12:08 MST
I analyzed the issues further.

I think you are doing it correctly: there's obviously some problem when using the rpmmacros file, but not when using a define in the command line.

This works for me:

> rpmbuild --define '_with_ucx --with-ucx=/usr/local/ucx-1.7.0' --define
> '_with_pmix --with-pmix=/usr/local/pmix-2.2.3:/usr/local/pmix-3.1.5' -tb
> slurm-20.02.0.tar.bz2

Regarding this:

> the compilation would have prevented this error. Even stranger, if I then
> remove the pmix2 package and install Slurm, srun/sacct/etc are all installed
> in /usr/local/pmix-2.2.3/ instead of /usr/bin/.

I am wondering if you're using an RPM that was built with something set in rpmmacros. Can you remove your rpmmacros and show me the configure line that rpmbuild prints? I think you must have a prefix defined somewhere, maybe from the time you built pmix-2.2.3.

This is the file list of my generated package:

$ rpm -qpl rpmbuild/RPMS/x86_64/slurm-libpmi-20.02.0-1.fc30.x86_64.rpm
/usr/lib/.build-id
/usr/lib/.build-id/07/a2a7ae193db2cc26e94df57bddb22d9826bc6b
/usr/lib/.build-id/24/1a9880a469ad681191c6cc7c9d981a90fa66fd
/usr/lib64/libpmi.so
/usr/lib64/libpmi.so.0
/usr/lib64/libpmi.so.0.0.0
/usr/lib64/libpmi2.so
/usr/lib64/libpmi2.so.0
/usr/lib64/libpmi2.so.0.0.0
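
The same kind of listing can be used to check a freshly built package before installing it. The sketch below filters a file list for paths under a given directory, so that anything packaged under a PMIx prefix stands out. The function name `files_under` is hypothetical, and the package path in the usage comment is a guess at the reporter's build output based on the package name in the conflict message above.

```shell
#!/bin/sh
# Hypothetical helper: read one path per line on stdin and print only
# those located under the directory given as $1.
files_under() {
    awk -v p="$1/" 'index($0, p) == 1'
}

# Usage (package path is an assumption):
#   rpm -qpl rpmbuild/RPMS/x86_64/slurm-libpmi-20.02.0-1.el7.x86_64.rpm \
#       | files_under /usr/local/pmix-2.2.3
# Demonstration with two paths taken from this ticket:
printf '%s\n' /usr/lib64/libpmi.so /usr/local/pmix-2.2.3/lib64/libpmi.so \
    | files_under /usr/local/pmix-2.2.3
```

Any output for a Slurm package would point to a stray prefix in the build environment, which is the situation described in the conflict above.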

This is my configure line shown by rpmbuild:

+ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/slurm --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pmix=/home/lipi/bin/pmix_v1:/home/lipi/bin/pmix

I am obsoleting the attached patch since it is not really needed.
Comment 8 Alex Mamach 2020-03-08 22:18:00 MDT
Hi, thanks for taking a look at this. After some more testing I can confirm you are correct. Sorry for the dumb question and false alarm ;)