| Summary: | Can slurm accounts be used without slurm users | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Michael Schoenfelder <michael.schoenfelder> |
| Component: | Accounting | Assignee: | Ben Roberts <ben> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | marshall.adrian, vito.burggraf |
| Version: | 20.02.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | SiFive | ||
Description
Michael Schoenfelder
2021-03-01 17:51:03 MST
Hi Michael,

If you're trying to get accurate reporting on usage in different accounts then you would need to create associations for each user who is going to be submitting to a given account. I would also recommend using AccountingStorageEnforce to make sure users only submit to accounts they have access to, so that usage doesn't get lost.

As you've experienced, you can have users and accounts created without limiting the users to the accounts they have access to. However, if the user has an association in an account and they request an account they don't have an association in, then the job will be recorded in the database as going to their default account. The log entry will look like this when that happens:

[2021-03-02T11:35:52.156] _job_create: account 'sub4' has no association for user 1002 using default account 'sub1'

If the user doesn't have an association in any account then the job goes to the root account and doesn't get recorded in the database, which is the behavior you observed.

If you want to report on the account usage accurately then you do need to have user associations created for the accounts they are going to be using, though you don't necessarily have to use AccountingStorageEnforce to prevent them from requesting something else. If you primarily have scripts submitting workflows in certain ways then you should be pretty safe to just configure the associations to accommodate the scripted workflows. If you have a lot of users submitting jobs manually then the chances of someone requesting an account they don't have access to go up, and you might want to consider limiting users to accounts they have access to.

Let me know if you have any questions about this.

Thanks,
Ben

Re "be pretty safe to just configure the associations to accommodate the scripted workflows."

But the scripted workflows run as the individual user.
The idea is that the scripted workflow calls srun with "-A some_account" rather than depending on an association to match the unix user to a slurm user to an account. We essentially will have two accounts per team, so the flow will decide which account is most appropriate. Associations won't help us there.

I understand that having neither AccountingStorageEnforce nor associations allows some users to "sneak" in. That will just be another reporting category, so we can understand which users/flows are missing the account specification.

We could set up user associations, but we are trying to avoid having yet another account tracking system and dealing with hiring, people leaving, and people switching orgs. Yes, I have seen the user contributions for syncing unix groups and slurm associations. We can't use them as-is, but they would be a starting point.

My concern about going associationless is the loss of data issue. I SWAGged that not having associations was causing slurm* to miss the association cache each time, and that this was keeping slurm* busy refreshing from the DB, which caused the loss of data.

If you guys think that using accounts w/o associations is technically bad, we'll bite the bullet. We understand that it is philosophically unusual.

Hi Michael,
To have the account information accurately/reliably recorded you would need to have user associations created for each user in both accounts. I'm trying to think of alternatives and the best alternative I can think of would be to use WCKeys, which you stated you're already using for something else.
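To give a feel for what "associations for each user in both accounts" means in practice, here is a minimal Python sketch that prints one `sacctmgr add user` command per user/account pair. The helper name `association_commands` is hypothetical, the user and account names are simply the ones used elsewhere in this ticket, and it assumes the accounts already exist and that the output is reviewed before anything is run.

```python
import shlex


def association_commands(users, accounts):
    """Build one 'sacctmgr add user' command per (user, account) pair.

    Hypothetical helper for illustration only: it returns command strings
    and never runs sacctmgr itself.
    """
    commands = []
    for user in users:
        for account in accounts:
            # -i (--immediate) commits without the interactive confirmation.
            commands.append(
                "sacctmgr -i add user name={} account={}".format(
                    shlex.quote(user), shlex.quote(account)
                )
            )
    return commands


if __name__ == "__main__":
    # Example names taken from this ticket; replace with your own.
    for command in association_commands(["user5"], ["sub1", "sub4"]):
        print(command)
```

The user list could come from whatever source of truth you already maintain (for example a unix group), which keeps this from becoming a separate account tracking system.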
Another possibility that might work for you is to use the JobCompType plugin to record information about the job in another way when it finishes. You can configure it to record the data to a flat file or a database (among other options). This shifts things around so that even though the jobs may not be reported by sacct there will be a record stored with the account information in another location. This would require you to use your own tools to process the data, but it may be a better option for you than managing users/accounts.
Here's a quick example where I submitted a job as a user who didn't have an association currently on the system.
$ sbatch -N1 -Asub4 --wrap='srun sleep 30'
Submitted batch job 25787
When the job completed there wasn't a record of it in the normal job database.
$ sacct -j 25787
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
But I configured my system to write completed job data to a flat file, so I can see the information about it there.
$ grep 25787 jobcomp_data.txt
JobId=25787 UserId=user5(1005) GroupId=user5(1005) Name=wrap JobState=COMPLETED Partition=debug TimeLimit=3600 StartTime=2021-03-04T10:44:51 EndTime=2021-03-04T10:45:22 NodeList=node01 NodeCnt=1 ProcCnt=1 WorkDir=/home/user5 ReservationName= Tres=cpu=1,node=1,billing=1 Account=sub4 QOS=normal WcKey= Cluster=unknown SubmitTime=2021-03-04T10:44:51 EligibleTime=2021-03-04T10:44:51 DerivedExitCode=0:0 ExitCode=0:0
These are the options I used to configure this.
$ scontrol show config | grep JobComp
JobCompHost = localhost
JobCompLoc = /home/ben/slurm/src/jobcomp_data.txt
JobCompPort = 0
JobCompType = jobcomp/filetxt
JobCompUser = root
You can read more about this here:
https://slurm.schedmd.com/slurm.conf.html#OPT_JobCompType
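As a rough idea of the kind of tool you would need for processing the flat file, the Python sketch below parses jobcomp/filetxt records like the one shown above and totals usage per account. The function names are hypothetical, the sample line is a trimmed copy of the record from this ticket, and "CPU-seconds" here is just ProcCnt multiplied by the elapsed time between StartTime and EndTime, which is an assumption for illustration and not a figure Slurm reports. It also assumes field values contain no spaces; job names or paths with spaces would need more careful parsing.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"

# Trimmed copy of the jobcomp/filetxt record shown in this ticket.
SAMPLE = (
    "JobId=25787 UserId=user5(1005) GroupId=user5(1005) Name=wrap "
    "JobState=COMPLETED Partition=debug TimeLimit=3600 "
    "StartTime=2021-03-04T10:44:51 EndTime=2021-03-04T10:45:22 "
    "NodeList=node01 NodeCnt=1 ProcCnt=1 WorkDir=/home/user5 "
    "ReservationName= Tres=cpu=1,node=1,billing=1 Account=sub4 QOS=normal"
)


def parse_record(line):
    """Split one record into a dict, cutting each token at its first '='."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not Key=Value pairs
            fields[key] = value
    return fields


def cpu_seconds(fields):
    """ProcCnt multiplied by wall-clock seconds between start and end."""
    start = datetime.strptime(fields["StartTime"], TIME_FORMAT)
    end = datetime.strptime(fields["EndTime"], TIME_FORMAT)
    return int(fields["ProcCnt"]) * int((end - start).total_seconds())


def usage_by_account(lines):
    """Sum CPU-seconds per Account value across all records."""
    totals = defaultdict(int)
    for line in lines:
        fields = parse_record(line)
        if "Account" not in fields:
            continue  # not a job record
        totals[fields["Account"] or "(none)"] += cpu_seconds(fields)
    return dict(totals)


if __name__ == "__main__":
    print(usage_by_account([SAMPLE]))
```

Jobs submitted without an account end up under "(none)", which would give you the separate reporting category for users and flows that are missing the account specification.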
Let me know if this sounds like a workable option.
Thanks,
Ben
Ben,

The JobCompType plugin is an interesting idea. We've been considering an ELK stack for monitoring various signals in our infrastructure, so that would fit right in.

I realize that I left out an important point. One reason we want to track account usage is so that we can assign fairshare weights whenever we implement our fairshare scheme. We want to balance the fairshare based on accounts. However, I can see that we may confuse the fairshare algorithms if we don't have slurm users. Plus, missing job records would not be good for fairshare.

When I tried your "$ sbatch -N1 -Asub4 --wrap='srun sleep 30'" example, I did get a job record. It does seem like this (ab)use of accounts is unreliable.

Thank you for your feedback. You may close the ticket.

You're right, if you're using Fairshare then the unreliability of the jobs being recorded properly will have an effect. I'll go ahead and close it, but let us know if there's anything else we can do to help.

Thanks,
Ben