| Summary: | IntelMPI 2021 update 6 issues with slurm | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Erin Boland <erin.k.boland> |
| Component: | Other | Assignee: | Felip Moll <felip.moll> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 21.08.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=13495 https://bugs.schedmd.com/show_bug.cgi?id=16173 | ||
| Site: | Raytheon Missile, Space and Airborne | Slinky Site: | --- |
| Linux Distro: | RHEL | Machine Name: | |
| CLE Version: | Version Fixed: | N/A | |
| Target Release: | --- | DevPrio: | --- |

Description
Erin Boland
2022-07-22 16:44:07 MDT

(In reply to Erin Boland from comment #0)
> Using Intel MPI 2021 update 6 with slurm and seeing:
>
> In: PMI_Abort(2664079, Fatal error in PMPI_Init_thread: Other MPI error,
> error stack:
> 7 MPIR_Init_thread(138).........:
> 6 MPID_Init(1117)...............:
> 5 MPIDI_SHMI_mpi_init_hook(29)..:
> 4 MPIDI_POSIX_mpi_init_hook(141):
> 3 MPIDI_POSIX_eager_init(2268)..:
> 2 MPIDU_shm_seg_commit(296).....: unable to allocate shared memory)

Can you please try to set this environment variable before running the job?

export I_MPI_PMI_LIBRARY=/path_to_slurm/lib/libpmi2.so

and try again?

Hi Felip,

I had already had that environment variable set for this run.

Erin W

We got a fix - going to close the bug.

(In reply to Erin Boland from comment #4)
> We got a fix - going to close the bug.

Hi Erin,

Can you explain which was the fix exactly? That could be useful for the future.

I am marking the bug as infogiven. If possible, I would appreciate some info about how you fixed the issue, that would be of great help for us and future responses/diagnostics.

Thanks!!
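
For anyone reproducing the diagnostic step suggested above, the following is a minimal sketch of a pre-launch check that `I_MPI_PMI_LIBRARY` is set and points to a readable file. The function name `check_pmi_library`, the application name `./my_mpi_app`, and the commented launch line are illustrative assumptions, not taken from this ticket. The library path is the same placeholder used in the thread and has to be replaced with the local Slurm install path. This only verifies the variable; it is not the fix the reporter found, which the thread does not describe.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm or Intel MPI): confirm that
# I_MPI_PMI_LIBRARY is set and points to a readable file before launching.
# Returns 0 on success, 1 otherwise.
check_pmi_library() {
    if [ -z "${I_MPI_PMI_LIBRARY:-}" ]; then
        echo "I_MPI_PMI_LIBRARY is not set" >&2
        return 1
    fi
    if [ ! -r "$I_MPI_PMI_LIBRARY" ]; then
        echo "I_MPI_PMI_LIBRARY is not readable: $I_MPI_PMI_LIBRARY" >&2
        return 1
    fi
    # Echo the effective setting so it shows up in the job output.
    echo "I_MPI_PMI_LIBRARY=$I_MPI_PMI_LIBRARY"
    return 0
}

# Example use inside a batch script (assumed layout, placeholder path):
#   export I_MPI_PMI_LIBRARY=/path_to_slurm/lib/libpmi2.so
#   check_pmi_library && srun --mpi=pmi2 ./my_mpi_app
```

Printing the value into the job log makes it easy to confirm later which PMI library a failing run actually used.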