Ticket 17490 - sacctmgr with -s option not showing all associations
Summary: sacctmgr with -s option not showing all associations
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 23.02.3
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Jason Booth
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-08-22 09:17 MDT by Brian Andrus
Modified: 2023-09-04 10:40 MDT

See Also:
Site: Lam


Description Brian Andrus 2023-08-22 09:17:25 MDT
For some of our users, running 'sacctmgr -s show user xxxx' does not show all associations, yet 'sacctmgr show associations user=xxxx' does.

example:
[root@monitor shm]# sacctmgr show -s user patilsu
      User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- -------------------- ---------
  patilsu     default      None        lam    default    default         1                                                                                      normal
[root@monitor shm]# sacctmgr show associations user=patilsu
   Cluster    Account       User  Partition     Share   Priority GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ---------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
       lam     dpg-cm    patilsu                    1                                                                                                                                                             normal                
       lam    default   patilsu     default         1                                                                                                                                                             normal                

Yet for another user from the same group:
[root@monitor shm]# sacctmgr show -s user basarmi
      User   Def Acct     Admin    Cluster    Account  Partition     Share   Priority MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS
---------- ---------- --------- ---------- ---------- ---------- --------- ---------- ------- -------- -------- --------- ----------- ----------- -------------------- ---------
   basarmi    default      None        lam    default    default         1                                                                                      normal
   basarmi    default      None        lam     dpg-cm                    1                                                                                      normal
[root@monitor shm]# sacctmgr show associations user=basarmi
   Cluster    Account       User  Partition     Share   Priority GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ---------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
       lam    default    basarmi    default         1                                                                                                                                                             normal                
       lam     dpg-cm    basarmi                    1                                                                                                                                                             normal
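
The comparison above can be automated to find every affected user. What follows is a minimal sketch, not a tested tool: it assumes sacctmgr accepts -n (no header) and -P (pipe-delimited output) together with format=cluster,account on both queries, and the helper names (sacctmgr, parse_accounts, missing_from_user_view, check_user) are made up for illustration.

```python
import subprocess


def sacctmgr(*args):
    """Run sacctmgr with no header (-n) and pipe-delimited output (-P)."""
    result = subprocess.run(
        ["sacctmgr", "-n", "-P", *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_accounts(text):
    """Return the set of (cluster, account) pairs from 'cluster|account' lines."""
    pairs = set()
    for line in text.splitlines():
        if line.strip():
            cluster, account = line.split("|")[:2]
            pairs.add((cluster, account))
    return pairs


def missing_from_user_view(user_view, assoc_view):
    """Associations listed by 'show associations' but absent from 'show -s user'."""
    return sorted(parse_accounts(assoc_view) - parse_accounts(user_view))


def check_user(user):
    """Compare both views for one user (hypothetical helper; needs a live slurmdbd)."""
    user_view = sacctmgr("-s", "show", "user", user, "format=cluster,account")
    assoc_view = sacctmgr("show", "associations", f"user={user}",
                          "format=cluster,account")
    return missing_from_user_view(user_view, assoc_view)
```

Calling check_user("patilsu") on the state shown above would be expected to report the lam/dpg-cm association, while check_user("basarmi") would report nothing.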
Comment 1 Brian Andrus 2023-08-22 09:52:00 MDT
To work around this, I deleted and re-created the user and account records.
My guess is that the database got corrupted somehow.
Comment 2 Jason Booth 2023-08-22 11:28:30 MDT
Would you please verify the LFT and RGT values by running the following command? These are ranges that accounts and users should fall within. If the output is not too large, you can also send it back for us to analyze.

> $ sacctmgr show assoc format=cluster,account,user,qos,lft,rgt
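
When the output is too large to read by eye, the values can be checked with a short script. This is only a sketch under stated assumptions: it treats lft/rgt as nested-set boundaries (each association has lft below rgt, no boundary value is reused within a cluster, and ranges are either nested or disjoint), and it expects the command above to be run with -n -P and format=cluster,account,user,lft,rgt. The function names are hypothetical and this is not an official SchedMD check.

```python
import subprocess
from collections import defaultdict


def run_sacctmgr():
    """Fetch associations as 'cluster|account|user|lft|rgt' lines (needs slurmdbd)."""
    cmd = ["sacctmgr", "-n", "-P", "show", "assoc",
           "format=cluster,account,user,lft,rgt"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_rows(text):
    """Turn pipe-delimited lines into (cluster, account, user, lft, rgt) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cluster, account, user, lft, rgt = line.split("|")
        rows.append((cluster, account, user, int(lft), int(rgt)))
    return rows


def find_anomalies(rows):
    """Return human-readable descriptions of lft/rgt inconsistencies."""
    problems = []
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row[0]].append(row)

    for cluster, assocs in by_cluster.items():
        seen = {}    # boundary value -> label of the association that owns it
        valid = []   # associations whose own range is well-formed
        for _, account, user, lft, rgt in assocs:
            label = f"{cluster}/{account}/{user or '-'}"
            if lft >= rgt:
                problems.append(f"{label}: lft {lft} is not below rgt {rgt}")
                continue
            for value in (lft, rgt):
                if value in seen:
                    problems.append(
                        f"{label}: boundary {value} also used by {seen[value]}")
                else:
                    seen[value] = label
            valid.append((lft, rgt, label))

        # Sweep in lft order, keeping a stack of ranges that are still open.
        stack = []
        for lft, rgt, label in sorted(valid):
            while stack and stack[-1][0] <= lft:
                stack.pop()  # that range ended before this one starts
            if stack and rgt > stack[-1][0]:
                problems.append(
                    f"{label}: range {lft}-{rgt} partially overlaps {stack[-1][1]}")
            stack.append((rgt, label))
    return problems
```

On a host with access to slurmdbd, find_anomalies(parse_rows(run_sacctmgr())) returns an empty list when none of these checks fail.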
Comment 3 Brian Andrus 2023-08-22 11:40:41 MDT
The output is a bit large. 27k rows.
I suspect the issue has been remedied, since I deleted and recreated the 8 accounts it was happening for and they now show up as expected.


Comment 4 Jason Booth 2023-08-22 11:44:38 MDT
slurmdbd does contain a repair option for these values [1], though it is best to confirm any anomalies first. I will mention it below for completeness.

[1] https://slurm.schedmd.com/slurmdbd.html#OPT_-R[comma-separated-cluster-name-list]

We are also actively looking at better ways to avoid these types of issues and have made some changes for 23.11 that should help.
Comment 5 Jason Booth 2023-09-04 10:40:11 MDT
Resolving, since your actions in comment#3 worked around this and comment#4 provides a solution to repair.