Ticket 13632 - Can scheduler and database be run on a VM?
Summary: Can scheduler and database be run on a VM?
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 22.05.x
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Nate Rini
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-03-16 09:15 MDT by Elijah Gagne
Modified: 2022-03-16 14:06 MDT

See Also:
Site: Dartmouth


Description Elijah Gagne 2022-03-16 09:15:47 MDT
Hi SchedMD, 

We need to replace the hardware our scheduler and DB run on. Is there any inherent reason it could not be installed on VMs? We can scale VMs with lots of CPU, memory, and fast disks these days and a VM is just a lot nicer to work with. 

Thanks,
EWG
Comment 1 Nate Rini 2022-03-16 09:22:05 MDT
(In reply to Elijah Gagne from comment #0)
> We need to replace the hardware our scheduler and DB run on. Is there any
> inherent reason it could not be installed on VMs? We can scale VMs with lots
> of CPU, memory, and fast disks these days and a VM is just a lot nicer to
> work with. 

slurmd, slurmctld, slurmdbd, and MySQL/MariaDB can all run on VMs. It is up to the site to ensure that the VMs have sufficient CPUs/Memory/Network for their cluster's load and/or jobs. The most common issue we see with VMs is a clock source that is not precise enough; it comes up often enough that `sdiag` now warns when this is detected. Most VM vendors now provide kernel drivers for the guest that expose a high-precision clock source, but these usually have to be enabled manually. Please also note that slurmctld, slurmdbd, and MySQL/MariaDB can be run inside containers if a site doesn't want to pay the performance penalty of virtualization.
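
As a rough illustration of the clock source point, the following sketch reads the clock source a Linux guest is currently using from the standard sysfs location, along with the alternatives the kernel offers. The helper name `read_clocksources` is hypothetical and is not part of Slurm. Which source counts as high precision depends on the hypervisor, so treat the output as a starting point for checking the vendor's documentation rather than as a verdict.

```python
from pathlib import Path
import time

# Standard Linux sysfs location for clock source information.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clock sources, or (None, []) if unreadable.

    Hypothetical helper for illustration only; it is not a Slurm tool.
    """
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or sysfs is not mounted where we expect it.
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current clock source:   ", current)
    print("available clock sources:", " ".join(available))
    # Resolution the kernel reports for the monotonic clock, in seconds.
    print("CLOCK_MONOTONIC resolution:", time.clock_getres(time.CLOCK_MONOTONIC))
```

Running this inside the VM that will host slurmctld, and comparing the result with what `sdiag` reports, should show whether a different clock source needs to be enabled.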

Do you have any more questions?
Comment 4 Nate Rini 2022-03-16 10:22:03 MDT
I wanted to emphasize this part:
> It is up to the site to ensure that the VMs have sufficient CPUs/Memory/Network for their cluster's load and/or jobs.

Many sites have ended up pulling Slurm off their VM systems because the VMs were too slow or had other performance issues. While these issues are not specific to Slurm, they have an unfortunate tendency to make Slurm look slow. Most VMs carry a performance penalty of about 10%, so the hardware needs to be that much faster to meet our suggestions here (slides 18-20):
> https://slurm.schedmd.com/SLUG21/Field_Notes_5.pdf
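
To make the sizing arithmetic concrete, here is a small sketch under a deliberately simple assumption: a VM delivers `(1 - penalty)` of the host's bare-metal performance. The function name `required_speedup` is hypothetical, and the 10% figure is the rule of thumb quoted above, not a measurement.

```python
def required_speedup(penalty):
    """Factor by which host hardware must exceed a bare-metal target so that
    a VM losing `penalty` (a fraction, e.g. 0.10) still meets that target.

    Assumes the VM delivers (1 - penalty) of host performance.
    """
    if not 0 <= penalty < 1:
        raise ValueError("penalty must be in [0, 1)")
    return 1.0 / (1.0 - penalty)


print(f"{required_speedup(0.10):.3f}")  # prints 1.111
```

Under that model, a 10% penalty means the host needs to be about 11% faster than the bare-metal recommendation, which is slightly more than the penalty itself.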

If VMs are used, the CPUs and memory must be pinned for Slurm's VMs.
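
How pinning is configured depends entirely on the hypervisor, which this ticket does not specify. As one possible illustration, assuming a libvirt/KVM host, the sketch below generates the `<cputune>` and `<numatune>` fragments of a libvirt domain definition that tie each vCPU to one host CPU and restrict guest memory to one NUMA node. The function name `pinning_xml` and the CPU and node numbers are hypothetical, and NUMA binding is only one reading of "pinned memory"; other hypervisors use different mechanisms.

```python
import xml.etree.ElementTree as ET


def pinning_xml(vcpu_to_host_cpu, numa_node):
    """Build libvirt <cputune> and <numatune> fragments as strings.

    vcpu_to_host_cpu maps guest vCPU index -> host CPU index (assumed layout).
    numa_node is the host NUMA node the guest memory is restricted to.
    """
    cputune = ET.Element("cputune")
    for vcpu, host_cpu in sorted(vcpu_to_host_cpu.items()):
        # One <vcpupin> per vCPU ties it to a single host CPU.
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=str(host_cpu))

    numatune = ET.Element("numatune")
    # mode="strict" makes allocation fail rather than spill to another node.
    ET.SubElement(numatune, "memory", mode="strict", nodeset=str(numa_node))

    return (
        ET.tostring(cputune, encoding="unicode"),
        ET.tostring(numatune, encoding="unicode"),
    )


if __name__ == "__main__":
    # Example values only: pin vCPUs 0-3 to host CPUs 4-7 on NUMA node 0.
    cpu_fragment, mem_fragment = pinning_xml({0: 4, 1: 5, 2: 6, 3: 7}, 0)
    print(cpu_fragment)
    print(mem_fragment)
```

The generated fragments would go inside the `<domain>` element of the VM's definition; the host CPUs chosen should belong to the same NUMA node as the memory binding.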
Comment 6 Elijah Gagne 2022-03-16 14:06:10 MDT
Thanks. I think we're good to close this out.
-EWG