Ticket 19066 - slurm loaded the nvidia_uvm kernel module, which made it impossible to upgrade nvidia gpu drivers online
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Limits
Version: 23.11.3
Hardware: Linux Linux
Severity: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2024-02-21 23:33 MST by andi cao
Modified: 2024-06-24 08:33 MDT

See Also:
Site: -Other-


Description andi cao 2024-02-21 23:33:14 MST
After upgrading to Slurm 23.11.3, I tried to upgrade the GPU driver on a compute node online, and the upgrade failed with a message that the nvidia_uvm kernel module was in use. After some analysis, I found that the slurmd service had loaded the module. Could Slurm be improved so that it does not keep the nvidia_uvm module loaded for as long as slurmd runs, and instead loads it only when necessary?
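
For anyone trying to reproduce this, here is a minimal sketch of how the "module in use" state could be checked on a compute node. It is not part of Slurm or the NVIDIA driver; it only parses the standard Linux `/proc/modules` table. The helper names `parse_proc_modules` and `module_in_use` are hypothetical, chosen for illustration. A nonzero reference count with no dependent modules listed would suggest that a process, rather than another kernel module, is holding the module, though the table itself does not say which process.

```python
from typing import Dict, List, NamedTuple, Optional


class ModuleInfo(NamedTuple):
    size: int
    refcount: Optional[int]  # None when the kernel reports "-"
    used_by: List[str]       # dependent kernel modules, not processes


def parse_proc_modules(text: str) -> Dict[str, ModuleInfo]:
    """Parse the contents of /proc/modules into a dict keyed by module name."""
    modules: Dict[str, ModuleInfo] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, size, refcount, deps = fields[:4]
        # The dependents field is "-" or a comma-terminated list such as "a,b,".
        used_by = [] if deps == "-" else [d for d in deps.split(",") if d]
        count = None if refcount == "-" else int(refcount)
        modules[name] = ModuleInfo(int(size), count, used_by)
    return modules


def module_in_use(text: str, name: str = "nvidia_uvm") -> bool:
    """Return True if the module is loaded with a nonzero reference count."""
    info = parse_proc_modules(text).get(name)
    return info is not None and (info.refcount or 0) > 0


if __name__ == "__main__":
    try:
        with open("/proc/modules") as fh:
            table = fh.read()
    except FileNotFoundError:
        table = ""  # not a Linux host; nothing to report
    info = parse_proc_modules(table).get("nvidia_uvm")
    if info is None:
        print("nvidia_uvm is not loaded")
    else:
        print(f"nvidia_uvm refcount={info.refcount} used_by={info.used_by}")
```

Running this before and after stopping slurmd on the node would show whether the reference count on nvidia_uvm drops when the service is stopped.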

Thank you.