Ticket 19315

Summary: AllowAccounts not being obeyed
Product: Slurm
Reporter: mmarkoc
Component: Accounting
Assignee: Jacob Jenson <jacob>
Status: OPEN
Severity: 6 - No support contract
Priority: ---
CC: mmarkoc
Version: 23.02.7
Hardware: Linux
OS: Linux
Site: -Other-

Description mmarkoc 2024-03-14 10:40:06 MDT
Hi,

We created a few new partitions back in version 22, and we limit access to them using the AllowAccounts setting in slurm.conf. We do not create Slurm accounts for all of our users; we only create accounts for users who request access to these partitions. That configuration worked fine for us in version 22. If a user without an account tried to submit a job to one of these partitions, the job would go into the pending state with the following reason:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 1      test     test     test PD       0:00      1 (Job's account not known, so it can't use this partition (test allows lap_rc))

If I add EnforcePartLimits=ALL, the job, as expected, is not submitted at all and instead fails with the following error:

sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
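
For reference, a minimal sketch of the relevant slurm.conf lines under this setup. The partition name (test) and the allowed account (lap_rc) are taken from the squeue output above; the node list is a placeholder, not our actual configuration.

```
# slurm.conf (sketch; node names are placeholders)
# With this line enabled, disallowed jobs are rejected at submission time
# instead of being left pending.
#EnforcePartLimits=ALL
PartitionName=test Nodes=node[01-02] AllowAccounts=lap_rc State=UP
```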

In the meantime, we upgraded to version 23.02.7. Just a few days ago, we noticed that a couple of users who are not associated with any of the allowed accounts were able to submit jobs to these restricted partitions.

I've rebuilt our environment on a VM and confirmed that the AllowAccounts restriction works in 22.05.9 but stops working after upgrading to 23.02.7.
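
The check on the VM can be sketched roughly as below. This is an illustration, not the exact commands used: it assumes a partition named test restricted with AllowAccounts=lap_rc, and that it is run as a user with no association to that account. The function name and the PARTITION variable are made up for the example.

```shell
#!/usr/bin/env bash
# Reproduction sketch. Assumptions: partition "test" is restricted with
# AllowAccounts=lap_rc, and the invoking user has no association with lap_rc.

PARTITION="${PARTITION:-test}"

repro_allowaccounts() {
    # Show the AllowAccounts value the controller has loaded for the partition.
    scontrol show partition "$PARTITION" | grep -o 'AllowAccounts=[^ ]*'

    # List the current user's associations (no output if the user has none).
    sacctmgr -n -P show associations user="$USER" format=Account,Partition

    # Submit a trivial job; --parsable prints only the job ID on success.
    local jobid
    if jobid=$(sbatch --parsable -p "$PARTITION" --wrap 'hostname'); then
        # On 22.05.9 the job stayed pending with the "account not known" reason.
        squeue -j "${jobid%%;*}" -o '%i %P %u %t %r'
    else
        echo "submission rejected (the behavior seen with EnforcePartLimits=ALL)"
    fi
}

# Only run the check when the Slurm client commands are available.
if command -v sbatch >/dev/null 2>&1; then
    repro_allowaccounts
else
    echo "Slurm client commands not found; nothing to run."
fi
```

On 23.02.7, the symptom is that the job is accepted and runs instead of pending or being rejected.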

I've reviewed the release notes but can't find any change that would explain this. I'm aware that AccountingStorageEnforce=associations could make this work, but we do not want to create Slurm user accounts for all of our users.

Thanks,
Marko
Comment 1 mmarkoc 2024-03-22 15:39:22 MDT
I haven't had a chance to fully test it yet, but I wonder whether commit 776edd83952082e3ec46b3726858755c96af8a60 might have changed this behavior.
It seems to be related to how account membership is verified against the AllowAccounts list.

Thanks!