Ticket 14600 - IntelMPI 2021 update 6 issues with slurm
Summary: IntelMPI 2021 update 6 issues with slurm
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 21.08.7
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Felip Moll
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-07-22 16:44 MDT by Erin Boland
Modified: 2023-03-03 10:43 MST
CC List: 0 users

See Also:
Site: Raytheon Missile, Space and Airborne
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: RHEL
Machine Name:
CLE Version:
Version Fixed: N/A
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Erin Boland 2022-07-22 16:44:07 MDT
Using Intel MPI 2021 update 6 with slurm and seeing: 

 In: PMI_Abort(2664079, Fatal error in PMPI_Init_thread: Other MPI error, error stack:
  7 MPIR_Init_thread(138).........:
  6 MPID_Init(1117)...............:
  5 MPIDI_SHMI_mpi_init_hook(29)..:
  4 MPIDI_POSIX_mpi_init_hook(141):
  3 MPIDI_POSIX_eager_init(2268)..:
  2 MPIDU_shm_seg_commit(296).....: unable to allocate shared memory)
Comment 1 Felip Moll 2022-07-24 01:27:07 MDT
(In reply to Erin Boland from comment #0)
> Using Intel MPI 2021 update 6 with slurm and seeing: 
> 
>  In: PMI_Abort(2664079, Fatal error in PMPI_Init_thread: Other MPI error,
> error stack:
>   7 MPIR_Init_thread(138).........:
>   6 MPID_Init(1117)...............:
>   5 MPIDI_SHMI_mpi_init_hook(29)..:
>   4 MPIDI_POSIX_mpi_init_hook(141):
>   3 MPIDI_POSIX_eager_init(2268)..:
>   2 MPIDU_shm_seg_commit(296).....: unable to allocate shared memory)

Can you please set this environment variable before running the job and try again?

export I_MPI_PMI_LIBRARY=/path_to_slurm/lib/libpmi2.so
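
(For illustration only, a minimal job-script sketch showing where such an export would typically go; the Slurm install path, node/task counts, and program name below are hypothetical and must be adapted to the site:)

    #!/bin/bash
    #SBATCH --job-name=impi_test
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=4

    # Point Intel MPI at Slurm's PMI2 library (site-specific path).
    export I_MPI_PMI_LIBRARY=/usr/local/slurm/lib/libpmi2.so

    # Launch through srun so the ranks bootstrap via PMI2 rather than Hydra.
    srun --mpi=pmi2 ./my_mpi_program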
Comment 2 Erin Boland 2022-07-25 11:17:39 MDT
Hi Felip,

I already had that environment variable set for this run.

Erin
Comment 3 Erin Boland 2022-07-26 09:47:27 MDT
W
Comment 4 Erin Boland 2022-07-26 09:47:39 MDT
We got a fix - going to close the bug.
Comment 5 Felip Moll 2022-07-26 10:34:44 MDT
(In reply to Erin Boland from comment #4)
> We got a fix - going to close the bug.

Hi Erin,

Can you explain what the fix was, exactly? That could be useful for the future.
Comment 6 Felip Moll 2022-07-28 07:54:24 MDT
I am marking the bug as infogiven.

If possible, I would appreciate some info about how you fixed the issue; that would be of great help for us and for future responses/diagnostics.

Thanks!!