Ticket 16996 - Reserving a node to use for a single user
Summary: Reserving a node to use for a single user
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 22.05.8
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Megan Dahl
Reported: 2023-06-16 15:22 MDT by Brent G
Modified: 2023-06-20 09:35 MDT
Site: Recursion Pharma
Description Brent G 2023-06-16 15:22:21 MDT
This is a general inquiry. We have a node that is down for maintenance; however, while it's down, we'd still like to reserve it somehow so that a specific user can schedule file transfer jobs on it. Is it possible to do this with Slurm? I couldn't find anything specific in the docs, and ChatGPT made up a fictitious answer about specifying an `AllowUsers=<user.name>` attribute with scontrol.
Comment 1 Megan Dahl 2023-06-19 12:39:16 MDT
Hi Brent,

While a node is DOWN, no jobs can run on it, but jobs can be queued and left pending for it. To allow only a specific user to run jobs on the node, you can set up a reservation using scontrol. If you would like the file transfer jobs to be the first to run when the node comes back up, set the reservation's start time while the node is still DOWN and its end time after the node is expected to be back up.

For example, the command below creates a reservation named file_transfer that reserves node n1 for the_user. It starts immediately and ends after 10 minutes (the duration value is in minutes).

> scontrol create reservation reservationname=file_transfer nodes=n1 users=the_user starttime=now duration=10

Then the user that needs to submit the job can request the reservation with:
> sbatch --reservation=file_transfer --wrap="srun hostname"
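
To confirm that the reservation exists and that the job is queued against it, something like the following sketch may help. It reuses the file_transfer name from the example above. The run wrapper and the DRY_RUN variable are hypothetical helpers added only for illustration: by default the script just prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0, so nothing touches
# the cluster by default. DRY_RUN and run() are illustrative helpers.
DRY_RUN="${DRY_RUN:-1}"

resv_name="file_transfer"   # reservation name from the example above

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_reservation() {
    # Show the reservation's nodes, users, start time and end time.
    run scontrol show reservation="$resv_name"
    # List the jobs submitted against the reservation; they stay
    # pending while the node is still DOWN.
    run squeue --reservation="$resv_name"
}

check_reservation
```

Running it with DRY_RUN=0 executes the two commands instead of printing them.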

The job submitted with sbatch will stay pending until the node comes back up. This means the reservation's duration has to be long enough to cover both the remaining downtime and the runtime of the submitted jobs. Any jobs still running when the reservation's end time is reached will be canceled unless ResvOverRun in slurm.conf gives them extra time (its default of 0 cancels them immediately).
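
As a rough sketch of sizing the reservation, the script below adds an estimated downtime to an estimated job runtime and uses the sum as the duration. The 120 and 60 minute figures, the helper function names, and the DRY_RUN switch are assumptions for illustration only, not values from your site. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0.
# DRY_RUN, run() and the helper functions are illustrative, not part of Slurm.
DRY_RUN="${DRY_RUN:-1}"

resv_name="file_transfer"   # names taken from the example above
node="n1"
user="the_user"
maint_minutes=120           # hypothetical: expected remaining downtime
job_minutes=60              # hypothetical: time the transfer jobs need

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The reservation has to cover the downtime plus the jobs' runtime.
reservation_minutes() {
    echo $(( $1 + $2 ))
}

create_reservation() {
    run scontrol create reservation reservationname="$resv_name" \
        nodes="$node" users="$user" starttime=now \
        duration="$(reservation_minutes "$maint_minutes" "$job_minutes")"
}

# If maintenance runs long, set a new, larger duration (in minutes)
# on the existing reservation.
extend_reservation() {
    run scontrol update reservationname="$resv_name" duration="$1"
}

create_reservation
extend_reservation 300
```

Once the jobs have finished, the reservation can be removed early with scontrol delete reservationname=file_transfer. The current ResvOverRun value should appear in the output of scontrol show config.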

You can find more information about reservations here:
> https://slurm.schedmd.com/scontrol.html#SECTION_RESERVATIONS---SPECIFICATIONS-FOR-CREATE,-UPDATE,-AND-DELETE-COMMANDS
> https://slurm.schedmd.com/reservations.html

~ Megan Dahl
Comment 2 Brent G 2023-06-20 09:35:44 MDT
Sorry for the delay. Thanks so much for your response. This helps a great deal!