Ticket 14773

Summary: Documentation for mpi.conf shows incorrect PMIxTlsUCX options
Product: Slurm    Reporter: mike coyne <mcoyne>
Component: PMIx    Assignee: Felip Moll <felip.moll>
Status: RESOLVED FIXED    QA Contact: Ben Roberts <ben>
Severity: 4 - Minor Issue    
Priority: --- CC: bsantos, sts
Version: 22.05.3   
Hardware: Linux   
OS: Linux   
Site: LANL
Linux Distro: RHEL    Machine Name: kit test cluster
Version Fixed: 22.05.4

Description mike coyne 2022-08-17 11:21:09 MDT
Issue with a setting in mpi.conf:

https://slurm.schedmd.com/mpi.conf.html

PMIxTlsUCX={true|false}
    Use TLS for UCX communication. Defaults to not being set. 

    Using either true or false causes job submission failure on PMIx jobs with UCX enabled.
    It would appear that the true/false value that worked in 22.05.2 must now be set to a legal option for the UCX variable UCX_TLS=xxxx, such as all, which allows 22.05.3 to work.
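For reference, a minimal mpi.conf sketch of the change described above; `all` is one legal UCX_TLS value (others are listed in the UCX documentation), and the boolean form is the one that stopped working:

```
# fails on 22.05.3 (boolean no longer accepted):
#PMIxTlsUCX=true

# works (any legal UCX_TLS value):
PMIxTlsUCX=all
```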
Comment 1 Felip Moll 2022-08-18 04:40:07 MDT
(In reply to mike coyne from comment #0)
> issue with setting for mpi.conf 
> 
> https://slurm.schedmd.com/mpi.conf.html
> 
> PMIxTlsUCX={true|false}
>     Use TLS for UCX communication. Defaults to not being set. 
> 
>     Using either true or false will cause job submission failure on pmix
> jobs with ucx enabled
>     It would appear that the true/false value that worked in 22.05.2 now
> should be set to a legal option for the ucx variable UCX_TLS=xxxx such as
> all which allows 22.05.3 to work .

Hi Mike,

Can you show me the output of 'scontrol show config' ?

The PMIxTlsUCX documentation is wrong: it is not a boolean of true/false but a string which, according to the UCX documentation, can be one of the following:

all - use all the available transports.
sm / shm - all shared memory transports.
mm - shared memory transports - only memory mappers.
ugni - ugni_rdma and ugni_udt.
rc - rc and ud.
rc_x - rc with accelerated verbs and ud.
ud_x - ud with accelerated verbs.
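As an illustration (not from the ticket, and not Slurm's actual parser), a tiny bash check that flags the old boolean-style values, since PMIxTlsUCX must carry a UCX_TLS transport value rather than true/false:

```shell
#!/bin/bash
# Sketch: reject boolean values for PMIxTlsUCX; it must be a UCX_TLS
# transport value (e.g. all, sm, shm, mm, rc, rc_x, ud_x).
check_tls() {
  case "$1" in
    true|false) echo "invalid: PMIxTlsUCX takes UCX_TLS values, not booleans" ;;
    *)          echo "ok: PMIxTlsUCX=$1" ;;
  esac
}

check_tls true
check_tls all
```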

In fact, when I have no PMIxTlsUCX set, my 'scontrol show config' shows the following, which is not set to true or to false:

MPI Plugins Configuration:
PMIxCliTmpDirBase       = (null)
PMIxCollFence           = (null)
PMIxDebug               = 0
PMIxDirectConn          = yes
PMIxDirectConnEarly     = no
PMIxDirectConnUCX       = no
PMIxDirectSameArch      = no
PMIxEnv                 = (null)
PMIxFenceBarrier        = no
PMIxNetDevicesUCX       = (null)
PMIxTimeout             = 300
PMIxTlsUCX              = (null)

What is your actual value? What combination of values makes your jobs fail? And what is the error seen?
I will correct our documentation about this parameter.
Comment 4 mike coyne 2022-08-18 07:41:32 MDT
(In reply to Felip Moll from comment #1)
> What is your actual value? What is the combination of values that makes your
> jobs to fail? And what is the error seen?
> I will correct our documentation about this parameter.
...
One additional question on mpi.conf: our Cray XCs will need some env vars pushed:
PMIxEnv=UCX_MEM_MALLOC_HOOKS=no,UCX_MEM_MALLOC_RELOC=no,UCX_MEM_EVENTS=no,UCX_UNIFIED_MODE=1
PMIxNetDevicesUCX=ipogif0
Is this the correct syntax?
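The comma-separated PMIxEnv list above amounts to setting each variable individually in the step environment; a minimal bash sketch of that expansion (an illustration only, not Slurm's actual parsing code; variable names are the ones from this comment):

```shell
#!/bin/bash
# Sketch: expand a comma-separated PMIxEnv-style list into exported
# environment variables (names taken from the comment above).
pmix_env="UCX_MEM_MALLOC_HOOKS=no,UCX_MEM_MALLOC_RELOC=no,UCX_MEM_EVENTS=no,UCX_UNIFIED_MODE=1"

IFS=',' read -ra pairs <<< "$pmix_env"
for kv in "${pairs[@]}"; do
  export "$kv"
done

echo "UCX_UNIFIED_MODE=$UCX_UNIFIED_MODE"
```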
...
Current configuration on toss3 (RHEL 7) x86_64 that works. Note that 'scontrol show config' does not seem to show the settings; I tried this on both a compute node and a front end. As a note, this cluster kit has an OPA
fabric and is built using UCX 1.12.1.

scontrol show config ..

MPI Plugins Configuration:
PMIxCliTmpDirBase       = (null)
PMIxCollFence           = (null)
PMIxDebug               = 0
PMIxDirectConn          = yes
PMIxDirectConnEarly     = no
PMIxDirectConnUCX       = no
PMIxDirectSameArch      = no
PMIxEnv                 = (null)
PMIxFenceBarrier        = no
PMIxNetDevicesUCX       = (null)
PMIxTimeout             = 300
PMIxTlsUCX              = (null)


-bash-4.2$ cat mpi.conf 
PMIxDebug=1
PMIxDirectConn=true
PMIxDirectConnEarly=true
PMIxDirectConnUCX=true
#PMIxEnv= 
PMIxNetDevicesUCX=hfi1_0:1
#PMIxTimeout=10
PMIxTlsUCX=all

With the setting as true:
[mcoyne@kit005 lib64]$ cat /etc/slurm/mpi.conf 
PMIxDebug=1
PMIxDirectConn=true
PMIxDirectConnEarly=true
PMIxDirectConnUCX=true
#PMIxEnv= 
PMIxNetDevicesUCX=hfi1_0:1
#PMIxTimeout=10
PMIxTlsUCX=true

[mcoyne@kit005 lib64]$ module load gcc openmpi
[mcoyne@kit005 lib64]$ srun -N2 /users/mcoyne/Wip/supermagic/buildomp4/supermagic
srun: launch/slurm: launch_p_step_launch: StepId=3102145.0 aborted before step completely launched.
srun: error: task 1 launch failed: Unspecified error
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: task 0 launch failed: Unspecified error


in the slurmd.log file
[2022-08-18T07:30:43.806] [3102145.0] cred/munge: init: Munge credential signature plugin loaded
[2022-08-18T07:30:43.808] [3102145.0] spank/lua: Loaded 2 plugins in this context
[2022-08-18T07:30:43.830] [3102145.0] error:  mpi/pmix_v3: pmixp_dconn_ucx_prepare: kit005 [0]: pmixp_dconn_ucx.c:248: Fail to init UCX: No such device
[2022-08-18T07:30:43.830] [3102145.0] error:  mpi/pmix_v3: pmixp_dconn_init: kit005 [0]: pmixp_dconn.c:74: Cannot get polling fd
[2022-08-18T07:30:43.830] [3102145.0] error:  mpi/pmix_v3: pmixp_stepd_init: kit005 [0]: pmixp_server.c:402: pmixp_dconn_init() failed
[2022-08-18T07:30:43.830] [3102145.0] error:  mpi/pmix_v3: mpi_p_slurmstepd_prefork: (null) [0]: mpi_pmix.c:224: pmixp_stepd_init() failed
[2022-08-18T07:30:43.833] [3102145.0] error: Failed mpi_g_slurmstepd_prefork
[2022-08-18T07:30:43.833] [3102145.0] Sent signal 9 to StepId=3102145.0
[2022-08-18T07:30:43.834] [3102145.0] Sent signal 9 to StepId=3102145.0


'scontrol show config' still does not show the changed configuration (on a compute node):

MPI Plugins Configuration:
PMIxCliTmpDirBase       = (null)
PMIxCollFence           = (null)
PMIxDebug               = 0
PMIxDirectConn          = yes
PMIxDirectConnEarly     = no
PMIxDirectConnUCX       = no
PMIxDirectSameArch      = no
PMIxEnv                 = (null)
PMIxFenceBarrier        = no
PMIxNetDevicesUCX       = (null)
PMIxTimeout             = 300
PMIxTlsUCX              = (null)
Comment 5 mike coyne 2022-08-18 07:46:45 MDT
I should note I do not have an mpi.conf or an oci.conf on the master in the /etc/slurm configuration directory. Is this needed for 'scontrol show config' to work?
Comment 6 mike coyne 2022-08-18 07:49:50 MDT
(In reply to mike coyne from comment #5)
> should note i do not have a mpi.conf or a oci.conf on the master in the
> /etc/slurm configuration directory  Is this needed ? so scontrol show config
> works?


MPI Plugins Configuration:
PMIxCliTmpDirBase       = (null)
PMIxCollFence           = (null)
PMIxDebug               = 1
PMIxDirectConn          = yes
PMIxDirectConnEarly     = yes
PMIxDirectConnUCX       = yes
PMIxDirectSameArch      = no
PMIxEnv                 = (null)
PMIxFenceBarrier        = no
PMIxNetDevicesUCX       = hfi1_0:1
PMIxTimeout             = 300
PMIxTlsUCX              = all

Slurmctld(primary) at kit-master is UP

I put the mpi.conf file in the slurmctld's slurm conf directory and it does show up.
Comment 7 Felip Moll 2022-08-18 09:39:52 MDT
Mike:

>PMIxEnv=UCX_MEM_MALLOC_HOOKS=no,UCX_MEM_MALLOC_RELOC=no,UCX_MEM_EVENTS=no,UCX_UNIFIED_MODE=1

Your syntax is correct.

Thanks for the other information; it confirms what you already explained. Setting true or false is not correct for UCX_TLS (PMIxTlsUCX); the acceptable values are described in the UCX documentation.

I uploaded a patch for review to change the documentation. Please set one of the correct values for your system: https://openucx.readthedocs.io/en/master/faq.html

About 'scontrol show config': it only shows the config that is on the controller, so if you change a node's config locally it won't be reflected there, and you must check each node's mpi.conf manually.
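Since 'scontrol show config' reflects only the controller's copy, per-node files have to be compared by hand; a sketch of such a check, using hypothetical /tmp paths to stand in for the controller's and a node's /etc/slurm directories:

```shell
#!/bin/bash
# Sketch with hypothetical paths: compare the controller's mpi.conf
# against a compute node's copy, since 'scontrol show config' only
# reflects the controller's file.
mkdir -p /tmp/mpiconf-ctl /tmp/mpiconf-node
printf 'PMIxTlsUCX=all\n'  > /tmp/mpiconf-ctl/mpi.conf
printf 'PMIxTlsUCX=true\n' > /tmp/mpiconf-node/mpi.conf

if diff -q /tmp/mpiconf-ctl/mpi.conf /tmp/mpiconf-node/mpi.conf >/dev/null; then
  echo "in sync"
else
  echo "mismatch: node mpi.conf differs from controller copy"
fi
```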
Comment 11 Felip Moll 2022-08-22 09:29:01 MDT
Thanks for your comments Mike,

The docs are fixed in 22.05.4 commit a5e5b88ea and will be available on the webpage after the next release is published.

Regards