Ticket 9834

Summary: questions about 20.02 on RHEV
Product: Slurm
Reporter: ruth.a.braun
Component: Configuration
Assignee: Tim McMullan <mcmullan>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue
Priority: ---
Version: 20.02.3
Hardware: Linux
OS: Linux
Site: EM
Linux Distro: RHEL
Version Fixed: 20.02.3
Target Release: ---

Description ruth.a.braun 2020-09-16 08:47:26 MDT
On a new cluster that is still being set up, I've installed 20.02.3.
1) I would like to know whether slurmctld and slurmdbd can reside on a virtual machine (we use one host for both). We are using Red Hat Virtualization. Or do you not recommend that? Someone at SLUG20 replied in chat that it was not recommended, but I'd like to hear from you.

Also, we use mariadb-5.5.64-1.el7.x86_64.
2) On the new cluster we will be using Red Hat Identity Management (IdM), but we will not be using the Active Directory integration (maybe we will in the future).
Are there any thoughts or configuration action items if users' entries are not present in /etc/passwd (and related files) on the compute nodes?
Comment 1 Tim McMullan 2020-09-16 11:49:56 MDT
Hi!
(In reply to ruth.a.braun from comment #0)
> On a new cluster that is still being set up, I've installed 20.02.3.
> 1) I would like to know whether slurmctld and slurmdbd can reside on a
> virtual machine (we use one host for both). We are using Red Hat
> Virtualization. Or do you not recommend that? Someone at SLUG20 replied in
> chat that it was not recommended, but I'd like to hear from you.

There are some added considerations when running slurmctld/slurmdbd in a VM, but there are also good reasons to do so. Virtualization overhead can eventually cap your job throughput, though it should work just fine in many cases. It's important to consider network latency and disk performance when running in a VM.

For networking, I would suggest making sure the interfaces use one of the higher-performance types (VirtIO or PCI passthrough) to keep latency down.
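If you want a rough before/after comparison when changing the VM's interface type, a small TCP connect-time probe run from a compute node against the controller VM can help. This is a minimal sketch, not a Slurm tool: it only times the TCP handshake, which is a crude proxy for RPC latency. It assumes the default ports (SlurmctldPort 6817, SlurmdbdPort 6819); the function name and any host you pass it are hypothetical.

```python
import socket
import time
from typing import List


def tcp_connect_latency(host: str, port: int, samples: int = 5,
                        timeout: float = 2.0) -> List[float]:
    """Hypothetical helper: time `samples` TCP handshakes to host:port, in ms.

    Only the connect is measured; no Slurm protocol traffic is sent.
    """
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        results.append((time.perf_counter() - start) * 1000.0)
    return results
```

For example, `sorted(tcp_connect_latency("ctld-vm", 6817, samples=20))` (where "ctld-vm" stands in for your controller's hostname) gives a distribution you can compare across VirtIO, emulated, and passthrough NICs. Look at the high end of the distribution, not just the median, since occasional slow round trips are what tend to hurt a busy controller.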

The StateSaveLocation for slurmctld should be backed by SSDs, possibly NVMe storage for high-throughput systems. Similarly, the database should live on fast disks, and I would suggest passing them through to the VM directly if possible.
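To sanity-check the storage behind a candidate StateSaveLocation (or the database directory) from inside the VM, one option is to time small write-plus-fsync cycles on that filesystem. This is a minimal sketch under the assumption that latency of small synchronous writes is a reasonable proxy for the disk concerns above; it is not how slurmctld itself performs I/O, and the helper name is hypothetical.

```python
import os
import tempfile
import time
from typing import List


def fsync_latency_ms(directory: str, writes: int = 50, size: int = 4096) -> List[float]:
    """Hypothetical helper: time `writes` small write+fsync cycles in `directory` (ms)."""
    payload = os.urandom(size)
    latencies = []
    # Create the temp file inside the target directory so we measure that filesystem;
    # it is deleted automatically when the `with` block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(writes):
            start = time.perf_counter()
            f.write(payload)
            f.flush()             # move Python's buffer into the OS page cache
            os.fsync(f.fileno())  # ask the OS to commit the data to stable storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies
```

Point it at the directory configured as StateSaveLocation in your slurm.conf and compare the sorted results between virtual disks and passed-through SSD/NVMe devices. Numbers from a single run are noisy, so repeat it a few times before drawing conclusions.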

> Also, we use mariadb-5.5.64-1.el7.x86_64.

I would recommend upgrading to a more recent version of MariaDB. While the 5.5 series ships with RHEL7, there have been some problems with the older releases. MariaDB provides packages on its site for installing the latest version on RHEL7.

> 2) On the new cluster we will be using Red Hat Identity Management (IdM),
> but we will not be using the Active Directory integration (maybe we will in
> the future).
> Are there any thoughts or configuration action items if users' entries are
> not present in /etc/passwd (and related files) on the compute nodes?

There is a Slurm module called "nss_slurm" that could help you with this. The documentation for that module is here:

https://slurm.schedmd.com/nss_slurm.html

In short, with some limitations, it allows user and group name lookups to succeed inside jobs even if the user doesn't exist in /etc/passwd on the compute node. It's not designed to be a full replacement for directory services, so those lookups don't work in things like prolog/epilog scripts, but it may provide what you need.
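Per the nss_slurm page, enabling it involves setting LaunchParameters=enable_nss_slurm in slurm.conf and listing "slurm" as a source for the passwd and group databases in /etc/nsswitch.conf (for example "passwd: slurm files sss"). Please check that page for the exact requirements of your version. As a quick sanity check across compute nodes, a sketch like the one below parses nsswitch.conf text and reports which databases lack a "slurm" source. The helper names are hypothetical, and it only inspects the file. It does not verify that libnss_slurm is installed or that slurm.conf is set up.

```python
from typing import Dict, List, Sequence


def parse_nsswitch(text: str) -> Dict[str, List[str]]:
    """Map each database name (passwd, group, ...) to its list of source names."""
    databases: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if ":" not in line:
            continue
        name, _, sources = line.partition(":")
        # Drop bracketed actions such as [NOTFOUND=return]; keep only source names.
        databases[name.strip()] = [s for s in sources.split() if not s.startswith("[")]
    return databases


def missing_nss_slurm(text: str, required: Sequence[str] = ("passwd", "group")) -> List[str]:
    """Return the required databases that do not list 'slurm' as a source."""
    db = parse_nsswitch(text)
    return [name for name in required if "slurm" not in db.get(name, [])]
```

For example, `missing_nss_slurm(open("/etc/nsswitch.conf").read())` returns an empty list on a node where both passwd and group list the slurm source, and `["group"]` if only passwd was updated.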

Let me know if this helps or if there is any other information/help I can provide!
Thanks!
--Tim
Comment 2 Tim McMullan 2020-09-23 11:27:19 MDT
Hi!

Just wanted to check in and make sure this answered your question!

Thanks!
--Tim
Comment 3 Tim McMullan 2020-09-25 10:11:19 MDT
Hi,

I'm going to resolve these for now, but please feel free to re-open if you need any more help related to this!

Thanks!
--Tim