Ticket 21741 - Sanitise/canonicalise autodetected GPU names more thoroughly
Summary: Sanitise/canonicalise autodetected GPU names more thoroughly
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: GPU
Version: 23.11.10
Hardware: Linux Linux
: C - Contributions
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2025-01-03 10:00 MST by Benjamin Smith
Modified: 2026-02-06 07:55 MST

See Also:
Site: -Other-
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---


Attachments
patch to scrub non-alnums (1.15 KB, patch)
2025-01-03 10:00 MST, Benjamin Smith

Description Benjamin Smith 2025-01-03 10:00:22 MST
Created attachment 40273
patch to scrub non-alnums

Hi,

I've been looking at the AutoDetect=nvml functionality. One of our graphics cards has parentheses in its name, which results in the automatic GRES name "nvidia_titan_x_(pascal)", parentheses included. We also have Titan X and Titan Xp cards, so I don't want to rely on a substring match on just titan_x.

Example graphics card names from nvidia-smi -L output:
"NVIDIA TITAN X (Pascal)", "NVIDIA GeForce GTX TITAN X", "NVIDIA TITAN Xp"


I've applied a local patch (attached) that extends gpu_common_underscorify_tolower to remove every character that is neither a space nor alphanumeric.
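
For readers without access to the attachment, here is a minimal sketch of the idea. It is not the attached patch and not Slurm's actual code: the function name sanitise_gpu_name is hypothetical, and the sketch assumes the existing behaviour is to lower-case the name and turn spaces into underscores.

```c
#include <ctype.h>

/*
 * Hypothetical stand-in for the behaviour described above; this is NOT
 * Slurm's gpu_common_underscorify_tolower() and NOT the attached patch.
 * Rewrites `name` in place:
 *   - ASCII letters and digits are kept, letters lower-cased
 *   - spaces become underscores
 *   - every other character (parentheses, hyphens, ...) is dropped
 */
void sanitise_gpu_name(char *name)
{
    char *out = name;

    if (!name)
        return;

    for (const char *in = name; *in; in++) {
        unsigned char c = (unsigned char) *in;

        if (isalnum(c))
            *out++ = (char) tolower(c);
        else if (c == ' ')
            *out++ = '_';
        /* anything else is skipped */
    }
    *out = '\0';
}
```

With this sketch, "NVIDIA TITAN X (Pascal)" becomes nvidia_titan_x_pascal, which stays distinct from nvidia_titan_xp and nvidia_geforce_gtx_titan_x. Note that hyphens are dropped as well, so sites that already reference GRES names containing hyphens would see those names change.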

I don't think this is a massive priority to change in Slurm, but it's possible this bug report will be helpful to another admin.

Thanks,
Ben.
Comment 1 Yann 2026-02-06 07:55:08 MST
Hello, I would like to join this request, as we have the same issue. These are the names we currently obtain using nvml:

- nvidia_titan_x
- tesla_p100-pcie-12gb
- nvidia_titan_rtx
- nvidia_geforce_rtx_2080_ti
- tesla_v100-pcie-32gb
- nvidia_geforce_rtx_3080
- nvidia_geforce_rtx_3090
- nvidia_rtx_a5000
- nvidia_rtx_a5500
- nvidia_rtx_a6000
- nvidia_a100-pcie-40gb
- nvidia_a100_80gb_pcie
- nvidia_geforce_rtx_4090
- nvidia_rtx_5000
- nvidia_h100_nvl
- nvidia_h200_nvl
- nvidia_rtx_pro_6000_blackwell
- nvidia_geforce_rtx_5090

We would expect something coherent, such as vendor_architecture_model_interface_vram.