21741 – Sanitise/canonicalise autodetected gpu names more thoroughly

Status:	OPEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	GPU
Version:	23.11.10
Hardware:	Linux

Severity:	C - Contributions
Assignee:	Tim Wickberg

Reported:	2025-01-03 10:00 MST by Benjamin Smith
Modified:	2026-02-06 07:55 MST
CC List:	1 user

Site:	-Other-

Attachments
patch to scrub non-alnums (1.15 KB, patch) 2025-01-03 10:00 MST, Benjamin Smith

Description Benjamin Smith 2025-01-03 10:00:22 MST

Created attachment 40273
patch to scrub non-alnums

Hi,

I've been looking at the AutoDetect=nvml functionality.  One of our graphics cards has parentheses in its name, so the automatically generated gres name is "nvidia_titan_x_(pascal)", parentheses included.  We also have Titan X and Titan Xp cards, so I don't want to rely on a substring match on just titan_x.

Example graphics card names from nvidia-smi -L output:
"NVIDIA TITAN X (Pascal)", "NVIDIA GeForce GTX TITAN X", "NVIDIA TITAN Xp"


I've applied a local patch (attached) that extends gpu_common_underscorify_tolower to remove every character that is neither a space nor alphanumeric.
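
For other admins reading this, here is a minimal, standalone sketch of the kind of scrubbing described above. It is not the attached patch and not Slurm's actual gpu_common_underscorify_tolower; the function name sanitize_gpu_name is made up for illustration, and the assumed rule is simply: lowercase alphanumerics, turn whitespace into underscores, drop everything else.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (assumption: not Slurm code, not the attached patch).
 * Rewrites `name` in place:
 *   - alphanumeric characters are kept and lowercased
 *   - whitespace becomes '_'
 *   - every other character (parentheses, hyphens, ...) is dropped
 */
void sanitize_gpu_name(char *name)
{
	size_t out = 0;

	if (!name)
		return;

	for (size_t in = 0; name[in] != '\0'; in++) {
		/* Cast to unsigned char: the <ctype.h> functions require it. */
		unsigned char c = (unsigned char) name[in];

		if (isalnum(c))
			name[out++] = (char) tolower(c);
		else if (isspace(c))
			name[out++] = '_';
		/* Anything else is skipped, so the output never grows. */
	}
	name[out] = '\0';
}
```

With this rule, "NVIDIA TITAN X (Pascal)" becomes nvidia_titan_x_pascal and "NVIDIA TITAN Xp" becomes nvidia_titan_xp, so the two cards stay distinguishable. One side effect to be aware of: hyphens are dropped too, so a name like "Tesla P100-PCIE-12GB" would collapse to tesla_p100pcie12gb under this sketch.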

I don't think this is a massive priority to change in Slurm, but it's possible this bug report will be helpful to another admin.

Thanks,
Ben.

Comment 1 Yann 2026-02-06 07:55:08 MST

Hello, I would like to join this request, as we have the same issue. These are the names we currently obtain using nvml:

- nvidia_titan_x
- tesla_p100-pcie-12gb
- nvidia_titan_rtx
- nvidia_geforce_rtx_2080_ti
- tesla_v100-pcie-32gb
- nvidia_geforce_rtx_3080
- nvidia_geforce_rtx_3090
- nvidia_rtx_a5000
- nvidia_rtx_a5500
- nvidia_rtx_a6000
- nvidia_a100-pcie-40gb
- nvidia_a100_80gb_pcie
- nvidia_geforce_rtx_4090
- nvidia_rtx_5000
- nvidia_h100_nvl
- nvidia_h200_nvl
- nvidia_rtx_pro_6000_blackwell
- nvidia_geforce_rtx_5090

We would expect a consistent scheme such as vendor_architecture_model_interface_vram.