| Summary: | Set CUDA_DEVICE_ORDER when AutoDetect=nvml is used | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Michael Hinton <hinton> |
| Component: | GPU | Assignee: | Director of Support <support> |
| Status: | RESOLVED WONTFIX | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | kilian |
| Version: | 21.08.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=10827 https://bugs.schedmd.com/show_bug.cgi?id=10933 | ||
| Site: | SchedMD | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
|
Description
Michael Hinton
2021-05-04 15:02:13 MDT
Hey Kilian,

We are going to go ahead and leave CUDA_DEVICE_ORDER alone. How this is set probably won't matter in most cases, and in the cases where it could matter, we have this documented: "For this numbering to match the numbering reported by CUDA, the CUDA_DEVICE_ORDER environmental variable must be set to CUDA_DEVICE_ORDER=PCI_BUS_ID."

The CUDA documentation also states that there are two possible values for CUDA_DEVICE_ORDER - FASTEST_FIRST and PCI_BUS_ID - and that the default is FASTEST_FIRST. See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars.

So we are going to err on the side of flexibility and backwards compatibility and leave it up to the CUDA application developer to change CUDA_DEVICE_ORDER. Of course, if you have a compelling counterpoint, feel free to elaborate.

Thanks!
-Michael |
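Since the decision leaves CUDA_DEVICE_ORDER to the application, here is a minimal sketch of how an application might opt in to PCI bus ordering itself. The helper name `use_pci_bus_order` is hypothetical and not part of Slurm or CUDA. The sketch assumes it runs before anything in the process initializes CUDA, because the variable is read when CUDA starts up.

```python
import os


def use_pci_bus_order(env=None):
    """Hypothetical helper: request PCI_BUS_ID ordering unless a value is already set.

    Uses setdefault so that an explicit choice by the user or site
    (for example FASTEST_FIRST) is left untouched.
    """
    env = os.environ if env is None else env
    env.setdefault("CUDA_DEVICE_ORDER", "PCI_BUS_ID")
    return env["CUDA_DEVICE_ORDER"]


# Call this before importing or initializing any CUDA-using library.
order = use_pci_bus_order()
print("CUDA_DEVICE_ORDER =", order)
```

Using `setdefault` rather than unconditional assignment mirrors the reasoning in the comment above: the existing value, if any, is preserved, and only jobs that have not chosen an ordering get PCI_BUS_ID.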