| Summary: | Missing libpmi.so.0 | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Levi Morrison <levi_morrison> |
| Component: | Other | Assignee: | Alejandro Sanchez <alex> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | | |
| Version: | 19.05.x | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | BYU - Brigham Young University | | |
Description
Levi Morrison 2019-06-12 13:33:13 MDT

Comment 1
Alejandro Sanchez

Hi Levi,
First of all, I want to note that it is possible to execute MPI jobs with just PMIx (and without PMI/PMI2). I just tested this combination, which works fine:
PMIx 3.1.2
$ /home/alex/repos/pmix/source/3.1.2/configure \
--prefix=/home/alex/repos/pmix/install/3.1.2
Slurm 19.05
$ /home/alex/slurm/source/configure \
--prefix=/home/alex/slurm/19.05/install \
--with-pmix=/home/alex/repos/pmix/install/3.1.2
OpenMPI v4.0.1
$ /home/alex/repos/ompi/source/v4.0.1/configure \
--prefix=/home/alex/repos/ompi/install/v4.0.1 \
--with-pmix=/home/alex/repos/pmix/install/3.1.2
alex@polaris:~/t$ srun --mpi=pmix -N2 -n2 --ntasks-per-node=2 mpi/mpi_hello
Hello world from process 1 of 2
Hello world from process 0 of 2
alex@polaris:~/t$
With the previous installation, if I wanted to use --mpi=pmi2 instead of pmix, I'd get errors. Both the Slurm and PMIx projects ship their own libpmi and libpmi2. Starting with 18.08, Slurm no longer installs libpmi or libpmi2 by default. From the previous installation:
alex@polaris:~/slurm/19.05/install/lib$ ls -l
total 70992
-rw-r--r-- 1 alex alex 62102336 Jun 13 16:36 libslurm.a
-rwxr-xr-x 1 alex alex 987 Jun 13 16:36 libslurm.la
lrwxrwxrwx 1 alex alex 18 Jun 13 16:36 libslurm.so -> libslurm.so.34.0.0
lrwxrwxrwx 1 alex alex 18 Jun 13 16:36 libslurm.so.34 -> libslurm.so.34.0.0
-rwxr-xr-x 1 alex alex 10562200 Jun 13 16:36 libslurm.so.34.0.0
drwxr-xr-x 3 alex alex 20480 Jun 13 16:38 slurm
alex@polaris:~/slurm/19.05/install/lib$
(there's no libpmi nor libpmi2)
You can manually install the libpmi or libpmi2 shipped with Slurm by building contribs/pmi or contribs/pmi2, respectively. Here's an example of installing libpmi2:
alex@polaris:~/slurm/19.05/build/contribs/pmi2$ make -j install
alex@polaris:~/slurm/19.05/install/lib$ ls -l
total 71688
-rw-r--r-- 1 alex alex 490536 Jun 13 18:17 libpmi2.a
-rwxr-xr-x 1 alex alex 961 Jun 13 18:17 libpmi2.la
lrwxrwxrwx 1 alex alex 16 Jun 13 18:17 libpmi2.so -> libpmi2.so.0.0.0
lrwxrwxrwx 1 alex alex 16 Jun 13 18:17 libpmi2.so.0 -> libpmi2.so.0.0.0
-rwxr-xr-x 1 alex alex 214400 Jun 13 18:17 libpmi2.so.0.0.0
-rw-r--r-- 1 alex alex 62102336 Jun 13 16:36 libslurm.a
-rwxr-xr-x 1 alex alex 987 Jun 13 16:36 libslurm.la
lrwxrwxrwx 1 alex alex 18 Jun 13 16:36 libslurm.so -> libslurm.so.34.0.0
lrwxrwxrwx 1 alex alex 18 Jun 13 16:36 libslurm.so.34 -> libslurm.so.34.0.0
-rwxr-xr-x 1 alex alex 10562200 Jun 13 16:36 libslurm.so.34.0.0
drwxr-xr-x 3 alex alex 20480 Jun 13 16:38 slurm
alex@polaris:~/slurm/19.05/install/lib$
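Instead of eyeballing the ls output, a small shell helper can report which PMI client libraries a given install prefix actually provides. This is just a sketch for this example: have_pmi_libs is a made-up helper name, and the prefix in the usage line is the example path from this installation.

```shell
#!/bin/sh
# Report which PMI client libraries a Slurm install prefix provides.
# have_pmi_libs is a hypothetical helper written for this example,
# not part of Slurm.
have_pmi_libs() {
    prefix="$1"
    found=""
    for lib in libpmi.so libpmi2.so; do
        if [ -e "$prefix/lib/$lib" ]; then
            found="${found:+$found }$lib"
        fi
    done
    echo "${found:-none}"
}

# A stock 19.05 install (contribs not built) reports "none";
# after "make install" in contribs/pmi2 it reports "libpmi2.so".
have_pmi_libs "$HOME/slurm/19.05/install"
```

Running it before and after the contribs/pmi2 install above would show the library appearing in the prefix.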
Now you need to reconfigure OpenMPI and instruct the configure script to use the external (from OpenMPI's perspective) version of PMI shipped with Slurm:
alex@polaris:~/repos/ompi/build/v4.0.1$ /home/alex/repos/ompi/source/v4.0.1/configure \
--prefix=/home/alex/repos/ompi/install/v4.0.1 \
--with-pmix=/home/alex/repos/pmix/install/3.1.2 \
--with-pmi=/home/alex/slurm/19.05/install <------------
If you inspect config.log, you can now find entries like this:
opal_pmi2_CPPFLAGS='-I/home/alex/slurm/19.05/install/include/slurm'
opal_pmi2_LDFLAGS='-L/home/alex/slurm/19.05/install/lib'
opal_pmi2_LIBS='-lpmi2'
...
opal_pmix_ext3x_CPPFLAGS='-I/home/alex/repos/pmix/install/3.1.2/include'
opal_pmix_ext3x_LDFLAGS='-L/home/alex/repos/pmix/install/3.1.2/lib'
opal_pmix_ext3x_LIBS='-lpmix'
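Those cache variables can be pulled out of config.log in one step rather than scrolling through it. A minimal sketch, with the caveat that pmi_config_vars is a name invented for this example and the pattern simply matches the opal_pmi* variables shown above:

```shell
#!/bin/sh
# Extract the PMI/PMIx-related cache variables from an Open MPI
# config.log. pmi_config_vars is a helper made up for this example.
pmi_config_vars() {
    grep -E "^opal_pmi(1|2|x_ext3x)?_(CPPFLAGS|LDFLAGS|LIBS)=" "$1"
}

# Usage: point it at the build tree's config.log, e.g.
#   pmi_config_vars ~/repos/ompi/build/v4.0.1/config.log
```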
Note also how ldd on OpenMPI's lib/openmpi/mca_pmix_s2.so now resolves to the external libpmi2 installed from Slurm:
alex@polaris:~/repos/ompi/install/v4.0.1/lib/openmpi$ ldd mca_pmix_s2.so | grep pmi
libpmi2.so.0 => /home/alex/slurm/19.05/install/lib/libpmi2.so.0 (0x00007fd98dddf000)
alex@polaris:~/repos/ompi/install/v4.0.1/lib/openmpi$ ldd mca_pmix_ext3x.so | grep pmix
libpmix.so.2 => /home/alex/repos/pmix/install/3.1.2/lib/libpmix.so.2 (0x00007f8ec3941000)
alex@polaris:~/repos/ompi/install/v4.0.1/lib/openmpi$
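The same ldd check can be wrapped in a tiny helper for auditing any plugin against the intended library. resolved_dep is a name made up for this sketch, and the paths in the usage comment are the example installation's paths:

```shell
#!/bin/sh
# Print the path a shared-library dependency resolves to, so you can
# confirm a plugin links against the intended libpmi2/libpmix build.
# resolved_dep is a helper name invented for this example.
resolved_dep() {
    # $1 = binary or plugin, $2 = dependency name prefix (e.g. libpmi2)
    ldd "$1" | awk -v name="$2" '$1 ~ ("^" name) { print $3 }'
}

# e.g. resolved_dep lib/openmpi/mca_pmix_s2.so libpmi2
# should print the libpmi2.so.0 under the Slurm install prefix
```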
If I now retry the same srun test with --mpi=pmi2, it works:
alex@polaris:~/t$ srun --mpi=pmi2 -N2 -n2 --ntasks-per-node=2 mpi/mpi_hello
Hello world from process 1 of 2
Hello world from process 0 of 2
alex@polaris:~/t$
Please let me know if this answers your questions.
Thanks.
Comment 2
Levi Morrison

We intend to switch over to PMIx as we think multi-node MPI programs will work better this way, particularly from inside of containers. Our existing installation was not built with PMIx, so we need PMI2 to work for the migration. I will try building contribs/pmi2 and let you know how it goes. Thanks for the information.

Comment 3
Levi Morrison

After building contribs/pmi and contribs/pmi2 we are able to run Open MPI and Intel MPI jobs using both mpirun and srun with PMI2. This issue looks resolved, thanks!

Comment 4
Alejandro Sanchez

(In reply to Levi Morrison from comment #2)
> We intend to switch over to PMIx as we think multi-node MPI programs will
> work better this way, particularly from inside of containers. Our existing
> installation was not built with PMIx, so we need PMI2 to work for the
> migration. I will try building contribs/pmi2 and let you know how it goes.
> Thanks for the information.

You can benefit from different features and performance improvements by switching to PMIx. You can find more information on our publications web resource: https://slurm.schedmd.com/publications.html

Comment 5
Alejandro Sanchez

(In reply to Levi Morrison from comment #3)
> After building contribs/pmi and contribs/pmi2 we are able to run Open MPI
> and Intel MPI jobs using both mpirun and srun with PMI2. This issue looks
> resolved, thanks!

Great. Thanks to you for reporting.