| Summary: | sacctmgr with -s option not showing all associations | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Brian Andrus <brian.andrus> |
| Component: | Accounting | Assignee: | Jason Booth <jbooth> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 23.02.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Lam | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
To work around this, I deleted and re-created the user and account records. My guess is that the database got corrupted somehow.

Would you please verify the LFT and RGT values by running the following command? These are ranges that accounts and users should fall within. If the output is not too large, you can also send it back for us to analyze.
> $ sacctmgr show assoc format=cluster,account,user,qos,lft,rgt
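
As a rough illustration of what verifying these values could involve, here is a minimal Python sketch. It assumes the command above was run with sacctmgr's -n (no header) and -P (parsable2, pipe-delimited) options and the output saved as text. It also assumes that LFT and RGT follow nested-set rules: each lft is below its rgt, no bound is reused within a cluster, and any two intervals are either nested or disjoint. The function names parse_assoc and find_anomalies are hypothetical and not part of Slurm.

```python
from collections import defaultdict


def parse_assoc(text):
    """Parse rows shaped like cluster|account|user|qos|lft|rgt (assumed -n -P output)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        cluster, account, user, _qos, lft, rgt = line.split("|")
        rows.append((cluster, account, user, int(lft), int(rgt)))
    return rows


def find_anomalies(rows):
    """Return human-readable descriptions of rows that break the assumed nested-set rules."""
    problems = []
    by_cluster = defaultdict(list)
    for cluster, account, user, lft, rgt in rows:
        name = f"{cluster}/{account}/{user or '-'}"
        if lft >= rgt:
            problems.append(f"{name}: lft {lft} is not below rgt {rgt}")
        else:
            by_cluster[cluster].append((lft, rgt, name))

    for intervals in by_cluster.values():
        # Every lft/rgt value should belong to exactly one association.
        owner = {}
        for lft, rgt, name in intervals:
            for bound in (lft, rgt):
                if bound in owner:
                    problems.append(f"{name}: bound {bound} also used by {owner[bound]}")
                else:
                    owner[bound] = name

        # Walk intervals in lft order; the stack holds intervals that are still open.
        open_stack = []
        for lft, rgt, name in sorted(intervals):
            while open_stack and open_stack[-1][0] < lft:
                open_stack.pop()
            if open_stack and rgt > open_stack[-1][0]:
                problems.append(
                    f"{name}: [{lft}, {rgt}] partially overlaps {open_stack[-1][1]}"
                )
            open_stack.append((rgt, name))
    return problems
```

Calling find_anomalies(parse_assoc(text)) on the saved output returns an empty list when nothing violates these assumed rules, which keeps a 27k-row dump manageable without reading it by eye.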
The output is a bit large: 27k rows. I suspect the issue has been remedied, since I deleted and recreated the 8 accounts it was happening for and they now show up as expected.

Brian Andrus - HPC Systems, replying by email to Comment #2 (https://bugs.schedmd.com/show_bug.cgi?id=17490#c2) from Jason Booth; notification dated Tuesday, August 22, 2023 10:29 AM.

slurmdbd does contain a repair option for these values, though it is best to confirm any anomalies first. I will mention it below for completeness.

[1] https://slurm.schedmd.com/slurmdbd.html#OPT_-R[comma-separated-cluster-name-list]

We are also actively looking at better ways to avoid these types of issues and have made some changes for 23.11 that should help. |
For some of our users, running 'sacctmgr -s show user xxxx' does not show all associations, yet 'sacctmgr show associations user=xxxx' does.

Example:

[root@monitor shm]# sacctmgr show -s user patilsu

| User | Def Acct | Admin | Cluster | Account | Partition | Share | Priority | MaxJobs | MaxNodes | MaxCPUs | MaxSubmit | MaxWall | MaxCPUMins | QOS | Def QOS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| patilsu | default | None | lam | default | default | 1 | | | | | | | | normal | |

[root@monitor shm]# sacctmgr show associations user=patilsu

| Cluster | Account | User | Partition | Share | Priority | GrpJobs | GrpTRES | GrpSubmit | GrpWall | GrpTRESMins | MaxJobs | MaxTRES | MaxTRESPerNode | MaxSubmit | MaxWall | MaxTRESMins | QOS | Def QOS | GrpTRESRunMin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lam | dpg-cm | patilsu | | 1 | | | | | | | | | | | | | normal | | |
| lam | default | patilsu | default | 1 | | | | | | | | | | | | | normal | | |

Yet for another user from the same group:

[root@monitor shm]# sacctmgr show -s user basarmi

| User | Def Acct | Admin | Cluster | Account | Partition | Share | Priority | MaxJobs | MaxNodes | MaxCPUs | MaxSubmit | MaxWall | MaxCPUMins | QOS | Def QOS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| basarmi | default | None | lam | default | default | 1 | | | | | | | | normal | |
| basarmi | default | None | lam | dpg-cm | | 1 | | | | | | | | normal | |

[root@monitor shm]# sacctmgr show associations user=basarmi

| Cluster | Account | User | Partition | Share | Priority | GrpJobs | GrpTRES | GrpSubmit | GrpWall | GrpTRESMins | MaxJobs | MaxTRES | MaxTRESPerNode | MaxSubmit | MaxWall | MaxTRESMins | QOS | Def QOS | GrpTRESRunMin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lam | default | basarmi | default | 1 | | | | | | | | | | | | | normal | | |
| lam | dpg-cm | basarmi | | 1 | | | | | | | | | | | | | normal | | |
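
To find every affected user rather than checking one at a time, the two views can be compared mechanically. The following is a minimal Python sketch, not a Slurm tool. It assumes both listings were captured as pipe-delimited text with sacctmgr's -n and -P options, that the user view was requested with columns in the order user, cluster, account, and that the association view was requested in the order cluster, account, user. Those column orders and the function names are assumptions made for illustration. Partitions are ignored for simplicity.

```python
def assoc_set(text, user_col, cluster_col, account_col):
    """Collect (user, cluster, account) triples from pipe-delimited rows."""
    found = set()
    needed = max(user_col, cluster_col, account_col)
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) <= needed:
            continue  # blank or malformed line
        user = fields[user_col]
        if user:  # account-level associations have an empty user field
            found.add((user, fields[cluster_col], fields[account_col]))
    return found


def missing_from_user_view(user_view_text, assoc_view_text):
    """Return associations present in the association view but absent from the user view."""
    user_view = assoc_set(user_view_text, 0, 1, 2)    # user|cluster|account
    assoc_view = assoc_set(assoc_view_text, 2, 0, 1)  # cluster|account|user
    return sorted(assoc_view - user_view)
```

Fed the rows from the example above, the only triple reported would be patilsu on cluster lam under account dpg-cm, which matches the discrepancy described in this report.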