Ticket 10509

Summary: Avoid "InputOutput Error" in slurmd.log
Product: Slurm
Reporter: menglong <meng_long_21>
Component: PMIx
Assignee: Tim Wickberg <tim>
Status: RESOLVED DUPLICATE
QA Contact:
Severity: C - Contributions
Priority: ---
CC: artpol84, felip.moll
Version: - Unsupported Older Versions
Hardware: Linux
OS: Linux
Site: -Other-

Description menglong 2020-12-25 02:55:01 MST
Hi,
In our cluster, we have to set the environment variable UCX_IB_ADDR_TYPE=ib_global for jobs to run normally and to avoid the "InputOutput Error" message in slurmd.log.
So we added the following code to pmixp_info.c, right after the existing UCX_TLS handling:

	/* existing: forward UCX_TLS from the job environment */
	p = getenvp(*env, "UCX_TLS");
	if (p) {
		setenv("UCX_TLS", p, 1);
	}

	/* added: forward UCX_IB_ADDR_TYPE the same way */
	p = getenvp(*env, "UCX_IB_ADDR_TYPE");
	if (p) {
		setenv("UCX_IB_ADDR_TYPE", p, 1);
	}
Comment 2 Artem Polyakov 2020-12-26 11:14:13 MST
I think this is related to the discussion in 
https://bugs.schedmd.com/show_bug.cgi?id=9395

In my opinion, we need a way to configure PMIx plugin from the slurm.conf instead of relying on environment variables.
In particular, I was thinking of having a Slurm config option like:
pmix_ucx_env="UCX_ENV1=val1,UCX_ENV2=val2,...."
Comment 3 Tim Wickberg 2020-12-27 15:47:45 MST
Hi -

While this may be an appropriate variable for your system, it's not necessarily valid on all installs. As Artem has noted, bug 9395 has a discussion of adding a mechanism to set arbitrary variables such as this, and I'm thus closing this as a duplicate of that ticket.

- Tim

*** This ticket has been marked as a duplicate of ticket 9395 ***