Ticket 16996

Summary: Reserving a node to use for a single user
Product: Slurm Reporter: Brent G <brent.gawryluik>
Component: Configuration Assignee: Megan Dahl <megan>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: megan
Version: 22.05.8   
Hardware: Linux   
OS: Linux   
Site: Recursion Pharma

Description Brent G 2023-06-16 15:22:21 MDT
This is a general inquiry. We have a node that is down for maintenance; however, while it's down, we'd like to still somehow enable/reserve it for a specific user to schedule file transfer jobs. Is it possible to do this with SLURM? I couldn't find anything specific in the docs, and ChatGPT made up a fictitious answer about specifying an `AllowUsers=<user.name>` attribute with scontrol.
Comment 1 Megan Dahl 2023-06-19 12:39:16 MDT
Hi Brent,

While a node is DOWN, no jobs can run on it, but jobs can be pending to run on it. In your case, to allow only a specific user to run jobs on the node, you can set up a reservation using scontrol. If you would like the file transfer jobs to be the first to run when the node comes back up, set the reservation's start time while the node is still DOWN and its end time after the node is back up.

For example, the command below creates a reservation named file_transfer that reserves node n1 for the_user. It starts immediately and ends in 10 minutes.

> scontrol create reservation reservationname=file_transfer nodes=n1 users=the_user starttime=now duration=10

Then the user that needs to submit the job can request the reservation with:
> sbatch --reservation=file_transfer --wrap="srun hostname"
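
To confirm that the reservation exists and that the job is waiting on it, something along these lines may help. This is only a sketch: the `run` wrapper and the `DRY_RUN` variable are hypothetical helpers added here so the commands are printed by default rather than executed, and file_transfer and the_user are the example names from above.

```shell
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Show the reservation, then list the user's jobs that requested it.
# $1 = reservation name, $2 = user name
check_reservation() {
    run scontrol show reservation "$1"
    run squeue --user="$2" --reservation="$1"
}

check_reservation file_transfer the_user
```

Running it with DRY_RUN=0 on a host with the Slurm client tools would execute the two commands instead of only printing them.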

The job submitted with sbatch will remain pending until the node comes back up. This means the reservation's duration has to be long enough to last until the node is back up and the submitted jobs have completed. Any jobs that have not completed when the end time is reached will be canceled unless ResvOverRun is set in slurm.conf.
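
As a rough illustration of sizing the duration, the sketch below adds up the expected remaining downtime, the expected job runtime, and a safety margin, then prints the matching scontrol command. The function names and the numbers (120, 45, and 15 minutes) are assumptions made up for this example, not values from your site; the script only prints the command and does not run it.

```shell
#!/bin/sh
# Total reservation length in minutes.
# $1 = minutes until the node is expected back up
# $2 = expected runtime of the transfer jobs in minutes
# $3 = safety margin in minutes
resv_duration() {
    echo $(( $1 + $2 + $3 ))
}

# Print (do not run) the reservation command.
# $1 = reservation name, $2 = node, $3 = user, $4 = duration in minutes
build_resv_cmd() {
    echo "scontrol create reservation reservationname=$1 nodes=$2 users=$3 starttime=now duration=$4"
}

# Assumed example values: 120 min of downtime left, 45 min of jobs, 15 min margin.
minutes=$(resv_duration 120 45 15)
build_resv_cmd file_transfer n1 the_user "$minutes"
```

With these assumed values the printed command uses duration=180.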

You can find more information about reservations here:
> https://slurm.schedmd.com/scontrol.html#SECTION_RESERVATIONS---SPECIFICATIONS-FOR-CREATE,-UPDATE,-AND-DELETE-COMMANDS
> https://slurm.schedmd.com/reservations.html

~ Megan Dahl
Comment 2 Brent G 2023-06-20 09:35:44 MDT
Sorry for the delay. Thanks so much for your response. This helps a great deal!