Ticket 10509 - Avoid "InputOutput Error" in slurmd.log
Summary: Avoid "InputOutput Error" in slurmd.log
Status: RESOLVED DUPLICATE of ticket 9395
Alias: None
Product: Slurm
Classification: Unclassified
Component: PMIx
Version: - Unsupported Older Versions
Hardware: Linux Linux
Severity: C - Contributions
Assignee: Tim Wickberg
Reported: 2020-12-25 02:55 MST by menglong
Modified: 2020-12-27 15:47 MST
Site: -Other-

Description menglong 2020-12-25 02:55:01 MST
Hi,
In our cluster, we have to set the environment variable UCX_IB_ADDR_TYPE=ib_global for jobs to run normally and to avoid the "InputOutput Error" in slurmd.log.
So we added the following code to pmixp_info.c:

p = getenvp(*env, "UCX_TLS");
if (p) {
	setenv("UCX_TLS", p, 1);
}
/* -- added: also forward UCX_IB_ADDR_TYPE from the job environment -- */
p = getenvp(*env, "UCX_IB_ADDR_TYPE");
if (p) {
	setenv("UCX_IB_ADDR_TYPE", p, 1);
}
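
The snippet above repeats the same lookup-and-setenv pattern once per variable. As an illustration only, the pattern could be generalized to a list of names. The sketch below is standalone and does not use Slurm's headers: `lookup_env` is a hypothetical stand-in for Slurm's getenvp(), and `forward_ucx_vars` is a hypothetical helper, neither of which exists in pmixp_info.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for Slurm's getenvp(): look up NAME in a
 * NULL-terminated array of "NAME=value" strings. Returns a pointer to
 * the value, or NULL if the variable is not present.
 */
const char *lookup_env(char **env, const char *name)
{
	size_t len;

	assert(name != NULL);
	if (!env)
		return NULL;
	len = strlen(name);
	for (; *env; env++) {
		if (!strncmp(*env, name, len) && (*env)[len] == '=')
			return *env + len + 1;
	}
	return NULL;
}

/*
 * Copy each variable listed in the NULL-terminated `names` array from
 * the job environment into the current process environment, overwriting
 * any existing value. Returns the number of variables forwarded.
 */
int forward_ucx_vars(char **job_env, const char *const *names)
{
	int forwarded = 0;

	for (; *names; names++) {
		const char *val = lookup_env(job_env, *names);

		if (val && setenv(*names, val, 1) == 0)
			forwarded++;
	}
	return forwarded;
}
```

With a names array such as { "UCX_TLS", "UCX_IB_ADDR_TYPE", NULL }, supporting one more variable becomes a one-line change to the array. It is still a hard-coded list, though.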
Comment 2 Artem Polyakov 2020-12-26 11:14:13 MST
I think this is related to the discussion in 
https://bugs.schedmd.com/show_bug.cgi?id=9395

In my opinion, we need a way to configure the PMIx plugin from slurm.conf instead of relying on environment variables.
In particular, I was thinking of having a slurm config option like
pmix_ucx_env="UCX_ENV1=val1,UCX_ENV2=val2,...."
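
To make the idea concrete, here is a rough sketch of how such a comma-separated value might be parsed and applied. It is an assumption throughout: pmix_ucx_env is only a proposed option name, `apply_env_list` is a hypothetical helper and not part of Slurm, and restricting names to a prefix such as "UCX_" is a design choice added here for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: apply a comma-separated "NAME=value" list to the
 * process environment. Only names starting with `prefix` are accepted.
 * Returns the number of variables set, or -1 on a malformed or rejected
 * entry (entries applied before the failure stay set).
 * Limitation: a value containing a comma cannot be expressed this way.
 */
int apply_env_list(const char *list, const char *prefix)
{
	char *copy, *tok, *save = NULL;
	int count = 0;

	assert(list != NULL && prefix != NULL);
	copy = strdup(list);	/* strtok_r() modifies its input */
	if (!copy)
		return -1;

	for (tok = strtok_r(copy, ",", &save); tok;
	     tok = strtok_r(NULL, ",", &save)) {
		char *eq = strchr(tok, '=');

		/* Reject entries with no '=', an empty name, or a name
		 * outside the allowed prefix. */
		if (!eq || eq == tok ||
		    strncmp(tok, prefix, strlen(prefix)) != 0) {
			count = -1;
			break;
		}
		*eq = '\0';	/* split into name and value */
		if (setenv(tok, eq + 1, 1) != 0) {
			count = -1;
			break;
		}
		count++;
	}
	free(copy);
	return count;
}
```

Because the comma is the separator, a setting like UCX_TLS=rc,ud would not survive this format. A real option would need a different delimiter or quoting rules.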
Comment 3 Tim Wickberg 2020-12-27 15:47:45 MST
Hi -

While this may be an appropriate variable for your system, it's not necessarily valid on all installs. As Artem has noted, bug 9395 has a discussion of adding a mechanism to set arbitrary variables such as this, and I'm thus closing this as a duplicate of that ticket.

- Tim

*** This ticket has been marked as a duplicate of ticket 9395 ***