| Summary: | Sanitise/canonicalise autodetected GPU names more thoroughly | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Benjamin Smith <bsmith5> |
| Component: | GPU | Assignee: | Tim Wickberg <tim> |
| Status: | OPEN --- | QA Contact: | |
| Severity: | C - Contributions | ||
| Priority: | --- | CC: | yann.sagon |
| Version: | 23.11.10 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Slinky Site: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | --- |
| Attachments: | patch to scrub non-alnums | ||
Hello, I would like to join this request, as we have the same issue. These are the names we currently obtain using NVML autodetection:

- nvidia_titan_x
- tesla_p100-pcie-12gb
- nvidia_titan_rtx
- nvidia_geforce_rtx_2080_ti
- tesla_v100-pcie-32gb
- nvidia_geforce_rtx_3080
- nvidia_geforce_rtx_3090
- nvidia_rtx_a5000
- nvidia_rtx_a5500
- nvidia_rtx_a6000
- nvidia_a100-pcie-40gb
- nvidia_a100_80gb_pcie
- nvidia_geforce_rtx_4090
- nvidia_rtx_5000
- nvidia_h100_nvl
- nvidia_h200_nvl
- nvidia_rtx_pro_6000_blackwell
- nvidia_geforce_rtx_5090

We would expect a consistent naming scheme, such as vendor_architecture_model_interface_vram.
Created attachment 40273: patch to scrub non-alnums

Hi,

I've been looking at the AutoDetect=nvml functionality. One of our graphics cards has parentheses in its name, which results in the automatic GRES name "nvidia_titan_x_(pascal)", parentheses included. We also have Titan X and Titan Xp cards, so I don't want to rely on a substring match on just titan_x.

Example graphics card names from nvidia-smi -L output:

- "NVIDIA TITAN X (Pascal)"
- "NVIDIA GeForce GTX TITAN X"
- "NVIDIA TITAN Xp"

I've applied a local patch (attached) that changes gpu_common_underscorify_tolower to remove every character that is neither a space nor alphanumeric. I don't think this change is a high priority for Slurm, but this bug report may be helpful to another admin.

Thanks,
Ben.