Ticket 21741

Summary: Sanitise/canonicalise autodetected gpu names more thoroughly
Product: Slurm
Reporter: Benjamin Smith <bsmith5>
Component: GPU
Assignee: Tim Wickberg <tim>
Status: OPEN ---
QA Contact:
Severity: C - Contributions
Priority: ---
CC: yann.sagon
Version: 23.11.10
Hardware: Linux
OS: Linux
Site: -Other-
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---
Attachments: patch to scrub non-alnums

Description Benjamin Smith 2025-01-03 10:00:22 MST
Created attachment 40273
patch to scrub non-alnums

Hi,

I've been looking at the AutoDetect=nvml functionality.  In our case we have a graphics card with parentheses in the name, so this results in the automatic gres name "nvidia_titan_x_(pascal)" with the parentheses included.  We also have Titan X and Titan Xp cards so I don't want to rely on a substring match of just titan_x.

Example graphics card names from nvidia-smi -L output:
"NVIDIA TITAN X (Pascal)", "NVIDIA GeForce GTX TITAN X", "NVIDIA TITAN Xp"


I've applied a local patch (attached) that enhances gpu_common_underscorify_tolower to remove every character that is neither a space nor alphanumeric.
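
For anyone who wants to see the effect without reading the patch, here is a rough Python sketch of the two behaviours. It is not the attached C patch and not Slurm code: both function names are made up for illustration, and the "current" behaviour (lowercase, spaces to underscores) is only inferred from the gres name quoted above.

```python
def current_gres_name(name: str) -> str:
    # Assumed current behaviour, inferred from the name quoted in this
    # ticket: lowercase everything and turn spaces into underscores.
    return name.lower().replace(" ", "_")


def scrubbed_gres_name(name: str) -> str:
    # Sketch of the proposed behaviour: keep only ASCII letters, digits
    # and spaces, then lowercase and turn spaces into underscores.
    kept = [c for c in name if c.isascii() and (c.isalnum() or c == " ")]
    return "".join(kept).lower().replace(" ", "_")


if __name__ == "__main__":
    for card in ("NVIDIA TITAN X (Pascal)",
                 "NVIDIA GeForce GTX TITAN X",
                 "NVIDIA TITAN Xp"):
        print(current_gres_name(card), "->", scrubbed_gres_name(card))
```

With this rule the three cards above still map to three distinct names, and only the first one changes (the parentheses are dropped). Note that a scrub like this would also drop hyphens, so a name such as "tesla_p100-pcie-12gb" would lose its separators.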

I don't think this is a massive priority to change in Slurm, but it's possible this bug report will be helpful to another admin.

Thanks,
Ben.

Comment 1 Yann 2026-02-06 07:55:08 MST
Hello, I would like to join this request as we have the same issue. These are the names we currently obtain using nvml:

- nvidia_titan_x
- tesla_p100-pcie-12gb
- nvidia_titan_rtx
- nvidia_geforce_rtx_2080_ti
- tesla_v100-pcie-32gb
- nvidia_geforce_rtx_3080
- nvidia_geforce_rtx_3090
- nvidia_rtx_a5000
- nvidia_rtx_a5500
- nvidia_rtx_a6000
- nvidia_a100-pcie-40gb
- nvidia_a100_80gb_pcie
- nvidia_geforce_rtx_4090
- nvidia_rtx_5000
- nvidia_h100_nvl
- nvidia_h200_nvl
- nvidia_rtx_pro_6000_blackwell
- nvidia_geforce_rtx_5090

We would expect something consistent, such as vendor_architecture_model_interface_vram.
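
To make the suggested scheme concrete, here is a small Python sketch of how such a name could be assembled once the five fields are known. Everything in it is hypothetical: GpuInfo and canonical_name are illustrative names, the field values are filled in by hand, and nothing here claims that NVML reports these fields in this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuInfo:
    # Hypothetical record; the fields mirror the proposed naming scheme.
    vendor: str
    architecture: str
    model: str
    interface: str
    vram_gb: int


def canonical_name(gpu: GpuInfo) -> str:
    # Join the fields in a fixed order, lowercased, with underscores.
    parts = [gpu.vendor, gpu.architecture, gpu.model,
             gpu.interface, f"{gpu.vram_gb}gb"]
    return "_".join(p.lower() for p in parts)


if __name__ == "__main__":
    # Hand-filled values for the two A100 variants in the list above.
    print(canonical_name(GpuInfo("NVIDIA", "Ampere", "A100", "PCIe", 40)))
    print(canonical_name(GpuInfo("NVIDIA", "Ampere", "A100", "PCIe", 80)))
```

Under this scheme the two A100 entries above, currently "nvidia_a100-pcie-40gb" and "nvidia_a100_80gb_pcie", would come out as "nvidia_ampere_a100_pcie_40gb" and "nvidia_ampere_a100_pcie_80gb", with the fields in the same order and the same separator.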