| Summary: | Avoid "InputOutput Error" in slurmd.log | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | menglong <meng_long_21> |
| Component: | PMIx | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | C - Contributions | ||
| Priority: | --- | CC: | artpol84, felip.moll |
| Version: | - Unsupported Older Versions | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Alineos Sites: | --- |

I think this is related to the discussion in https://bugs.schedmd.com/show_bug.cgi?id=9395

In my opinion, we need a way to configure the PMIx plugin from slurm.conf instead of relying on environment variables. In particular, I was thinking of having a Slurm config option like pmix_ucx_env="UCX_ENV1=val1,UCX_ENV2=val2,...."

Hi - While this may be an appropriate variable for your system, it's not necessarily valid on all installs. As Artem has noted, bug 9395 has a discussion of adding a mechanism to set arbitrary variables such as this, and I'm thus closing this as a duplicate of that ticket.

- Tim

*** This ticket has been marked as a duplicate of ticket 9395 ***
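
To make the proposed option format concrete, here is a minimal sketch of how a value such as "UCX_ENV1=val1,UCX_ENV2=val2" could be split and exported. Note that `pmix_ucx_env` is only a suggestion in this ticket, not an existing slurm.conf option, and `apply_env_list` is a hypothetical helper name, not part of the Slurm source. The sketch also exposes a weakness of a comma-separated format: a value that itself contains commas (for example a UCX_TLS transport list) would be split incorrectly, so a real implementation would need a different separator or an escaping rule.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse "NAME1=val1,NAME2=val2" and export each pair
 * into the current process environment.
 * Returns the number of variables set, or -1 on a malformed entry or error.
 */
static int apply_env_list(const char *list)
{
    char *copy, *tok, *save = NULL;
    int count = 0;

    if (!list)
        return 0;

    /* strtok_r() modifies its input, so work on a private copy. */
    copy = strdup(list);
    if (!copy)
        return -1;

    for (tok = strtok_r(copy, ",", &save); tok;
         tok = strtok_r(NULL, ",", &save)) {
        char *eq = strchr(tok, '=');

        /* Reject entries with no '=' or with an empty name. */
        if (!eq || eq == tok) {
            free(copy);
            return -1;
        }
        *eq = '\0';

        /* setenv() copies both strings, so freeing 'copy' later is safe. */
        if (setenv(tok, eq + 1, 1) != 0) {
            free(copy);
            return -1;
        }
        count++;
    }

    free(copy);
    return count;
}
```

Calling `apply_env_list("UCX_IB_ADDR_TYPE=ib_global")` once at plugin start-up would have the same effect as the hand-written `setenv()` call in the patch from this ticket, without recompiling for each new variable.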

Hi,

In our cluster, we have to set the environment variable UCX_IB_ADDR_TYPE=ib_global to make jobs run normally and to avoid the error "InputOutput Error" in slurmd.log. So we added code to pmixp_info.c like this:

    p = getenvp(*env, "UCX_TLS");
    if (p) {
        setenv("UCX_TLS", p, 1);
    }
    -- add --
    p = getenvp(*env, "UCX_IB_ADDR_TYPE");
    if (p) {
        setenv("UCX_IB_ADDR_TYPE", p, 1);
    }
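
The patch above repeats the same lookup-and-set pattern for each variable. As a rough sketch of how that pattern could be driven by a list of names instead, the code below forwards every listed variable found in the job environment. It is standalone for illustration: `lookup_env` is a hypothetical stand-in for the `getenvp()` call used in pmixp_info.c, and `forward_env` is likewise an invented name, so neither reflects the actual plugin code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for getenvp(): search a NULL-terminated array of
 * "NAME=value" strings and return a pointer to the value, or NULL.
 */
static const char *lookup_env(char **env, const char *name)
{
    size_t len = strlen(name);

    if (!env)
        return NULL;
    for (; *env; env++) {
        /* Match the full name followed immediately by '='. */
        if (!strncmp(*env, name, len) && (*env)[len] == '=')
            return *env + len + 1;
    }
    return NULL;
}

/*
 * Copy each variable in the NULL-terminated 'names' list from the job
 * environment into the current process environment, if it is present.
 * Returns the number of variables forwarded, or -1 if setenv() fails.
 */
static int forward_env(char **job_env, const char *const *names)
{
    int count = 0;

    for (; *names; names++) {
        const char *val = lookup_env(job_env, *names);

        if (!val)
            continue; /* Not set in the job environment: skip it. */
        if (setenv(*names, val, 1) != 0)
            return -1;
        count++;
    }
    return count;
}
```

With a list such as `{"UCX_TLS", "UCX_IB_ADDR_TYPE", NULL}`, adding another site-specific variable becomes a one-line change to the list rather than another copy of the `if (p)` block.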