Ticket 7560 - Enhance support for AMD GPUs and APIs
Summary: Enhance support for AMD GPUs and APIs
Status: RESOLVED DUPLICATE of ticket 7714
Alias: None
Product: Slurm
Classification: Unclassified
Component: GPU
Version: 20.02.x
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2019-08-12 22:17 MDT by Tim Wickberg
Modified: 2022-01-24 10:35 MST
CC List: 3 users

See Also:
Site: CRAY
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: Nazare
Coreweave sites: ---
Cray Sites: Cray Internal
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: 20.11
DevPrio: 1 - Paid
Emory-Cloud Sites: ---


Attachments

Description Tim Wickberg 2019-08-12 22:17:45 MDT
Add support for ROCR_VISIBLE_DEVICES environment variable manipulation, similar to that of CUDA_VISIBLE_DEVICES.

Add support equivalent to that of the Nvidia NVML / MPS libraries assuming sufficient API availability.
Comment 1 Tim Wickberg 2021-05-24 14:47:26 MDT
Just tidying up. I'm marking this as complete: the gpu/rsmi plugin has been available since the 20.02 release last year and is working as intended.

*** This ticket has been marked as a duplicate of ticket 7714 ***
Comment 2 Tim Wickberg 2022-01-24 10:35:40 MST
Opening this ticket up publicly, and adding a couple of documentation links:

AMD's ROCm SMI library is what the Slurm gpu/rsmi plugin depends on for device info:

https://github.com/RadeonOpenCompute/rocm_smi_lib

The rocm_smi.h header itself is the best description of the API they've defined:

https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/include/rocm_smi/rocm_smi.h