Ticket 6652

Summary: Cannot create persistent connection with different slurm user
Product: Slurm
Reporter: Matt Ezell <ezellma>
Component: Federation
Assignee: Unassigned Developer <dev-unassigned>
Status: OPEN ---
QA Contact:
Severity: 5 - Enhancement
Priority: ---
CC: brian
Version: 18.08.5
Hardware: Linux
OS: Linux
Site: NOAA
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ORNL
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---

Description Matt Ezell 2019-03-07 07:11:26 MST
We are testing federation and came across a problem.  Our 'es' cluster has SlurmUser=slurm, but our Crays have to run with SlurmUser=root because of the interaction with aeld.  validate_slurm_user() fails, so we see messages like:
[2019-03-07T14:04:31.001] error: Security violation, REQUEST_PERSIST_INIT RPC from uid=2035

Thoughts on how to move forward?  Will we need to set SlurmUser=root across the board?
Comment 4 Broderick Gardner 2019-03-07 13:36:48 MST
I've looked into this. The problem is that each controller currently validates a privileged RPC by checking whether the caller's uid is root or the SlurmUser from its own configuration. So Cray clusters with SlurmUser=root reject REQUEST_PERSIST_INIT from a cluster whose controller runs as SlurmUser=slurm.
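A simplified model of the check described above may help show why the failure is one-directional. This is a sketch, not Slurm's source: `local_conf_t` and `is_privileged_caller` are hypothetical names, and the uid 2035 used below is taken from the log line and assumed to be the uid of `slurm` on 'es'.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for this controller's own SlurmUser setting
 * (uid 0 on a Cray with SlurmUser=root, the uid of "slurm" on 'es'). */
typedef struct {
    uid_t slurm_user_id;
} local_conf_t;

/* Simplified model of the current validation: a privileged RPC such as
 * REQUEST_PERSIST_INIT is accepted only if the caller is root or matches
 * the LOCAL SlurmUser. The remote cluster's SlurmUser is never consulted. */
bool is_privileged_caller(const local_conf_t *conf, uid_t uid)
{
    return uid == 0 || uid == conf->slurm_user_id;
}
```

In this model, a Cray controller (SlurmUser=root) rejects a connection from uid 2035, matching the "Security violation" error in the description, while 'es' accepts a connection from the Cray because root always passes.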

Fixing this with the current versions of Slurm would require setting all federated clusters to SlurmUser=root, yes. 

As an enhancement, we have discussed fixing this by validating users via the slurmdbd, either using the existing Slurm Administrator functionality or including each cluster's SlurmUser in the database cluster record. I'm investigating those options; I'll update you once I know more. 
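The second option (storing each cluster's SlurmUser in its database cluster record) could look roughly like the sketch below. Everything here is hypothetical: `fed_cluster_t`, `peer_user_allowed`, and the cluster name "cray1" are illustrative, and the sketch assumes the controller has a cached copy of the federation's cluster records and knows which cluster name the connecting peer claims. It does not model the Slurm Administrator alternative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical cluster record as cached from slurmdbd, extended with the
 * uid that cluster's controller runs as (its SlurmUser). */
typedef struct {
    const char *name;
    uid_t slurm_user_id;
} fed_cluster_t;

/* Sketch of the proposed check: accept root or the local SlurmUser as
 * today; otherwise accept the caller only if its uid matches the SlurmUser
 * recorded for the cluster it claims to be. Unknown clusters are rejected. */
bool peer_user_allowed(uid_t local_slurm_user, const fed_cluster_t *clusters,
                       size_t n, const char *peer_name, uid_t uid)
{
    if (uid == 0 || uid == local_slurm_user)
        return true;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(clusters[i].name, peer_name) == 0)
            return clusters[i].slurm_user_id == uid;
    }
    return false;
}
```

Matching the uid against the named peer's record, rather than against any cluster's SlurmUser, keeps a uid that is trusted for one cluster from being accepted when it claims to be a different one.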

Thanks
Comment 5 Broderick Gardner 2019-03-15 11:40:04 MDT
I'm marking this as an enhancement.