Ticket 2537 - SLURM: sacctmgr bug
Status: RESOLVED WONTFIX
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 15.08.4
Hardware: Linux Linux
Severity: 6 - No support contract
Assignee: Jacob Jenson
Reported: 2016-03-10 03:02 MST by taylor@rc.ufl.edu
Modified: 2017-11-03 10:44 MDT

Description taylor@rc.ufl.edu 2016-03-10 03:02:38 MST
This looks like a bug to me. It happens for only a handful of users, but it is repeatable.

# sacctmgr -i create User Name='ly07336' Cluster='hipergator' Account='mckenna' DefaultAccount='mckenna' QOS='mckenna,mckenna-b' DefaultQOS='mckenna' Fairshare=parent
 Adding User(s)
  ly07336
 Settings =
  Default Account = mckenna
 Associations =
  U = ly07336   A = mckenna    C = hipergator
 Non Default Settings
  Fairshare     = 2147483647
  QOS           = mckenna,mckenna-b
--------------------------------------------------------------------------------

The output looks good to me, but when I then display the association, I get the following, which is wrong.


[root@slurm1 ufrc]# sacctmgr show user ly07336 withassoc
      User   Def Acct     Admin    Cluster    Account  Partition     Share MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS 
---------- ---------- --------- ---------- ---------- ---------- --------- ------- -------- -------- --------- ----------- ----------- -------------------- --------- 
  ly07336                  None                                          0       0        0        0         0    00:00:00           0                                


Now I do the same for a random user name, say "xxxxxx":

[root@slurm1 ufrc]# sacctmgr -i create User Name='xxxxxx' Cluster='hipergator' Account='mckenna' DefaultAccount='mckenna' QOS='mckenna,mckenna-b' DefaultQOS='mckenna' Fairshare=parent
 Adding User(s)
  xxxxxx
 Settings =
  Default Account = mckenna
 Associations =
  U = xxxxxx    A = mckenna    C = hipergator
 Non Default Settings
  Fairshare     = 2147483647
  QOS           = mckenna,mckenna-b


When I then show the association, it is fine:

[root@slurm1 ufrc]# sacctmgr show user xxxxxx withassoc
      User   Def Acct     Admin    Cluster    Account  Partition     Share MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS 
---------- ---------- --------- ---------- ---------- ---------- --------- ------- -------- -------- --------- ----------- ----------- -------------------- --------- 
    xxxxxx    mckenna      None hipergator    mckenna               parent                                                                mckenna,mckenna-b   mckenna 


I feel like the database may be corrupted or may otherwise hold values that sacctmgr is not showing. Any ideas? Is anyone else reporting this?
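
The symptom above (a user row whose Def Acct, Cluster and Account columns are all blank) can be spotted across many users with a small filter. This is a minimal sketch, not part of the original report: the function name find_blank_assocs is hypothetical, and it assumes the input is pipe-separated output with exactly the four fields User, DefaultAccount, Cluster, Account, such as sacctmgr produces with its noheader and parsable2 options.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): read pipe-separated lines of the
# form User|DefaultAccount|Cluster|Account on stdin and print every user whose
# three association fields are all empty, as in the broken ly07336 row.
find_blank_assocs() {
    awk -F'|' '$1 != "" && $2 == "" && $3 == "" && $4 == "" { print $1 }'
}

# Possible usage on a host with Slurm installed (assumed field list):
#   sacctmgr -nP show user withassoc \
#       format=User,DefaultAccount,Cluster,Account | find_blank_assocs
```

A healthy user such as xxxxxx has all four fields populated and is not printed.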
Comment 1 taylor@rc.ufl.edu 2016-03-10 03:22:42 MST
Looking at the Slurm MySQL database, I found that the user names causing problems were still present as active rows (deleted=0) even though they had been deleted with sacctmgr.  This was not the case with other users or with the random "xxxxxx" user in the example.

Not sure why that would be.
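
The check described in this comment can be sketched as a pair of read-only queries. The table names user_table and hipergator_assoc_table and the columns name, user and deleted come from this ticket; the helper names valid_ident and stale_user_sql, and the assumption that both tables carry a deleted column, are illustrative only. The function prints SQL text and executes nothing.

```shell
#!/bin/sh
# Hypothetical helpers (not from the ticket).

# Accept only plain identifiers, because the values are pasted into SQL text.
# This is deliberately conservative: names with dots or dashes are refused.
valid_ident() {
    case "$1" in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
    esac
}

# Print read-only queries showing whether a user still has rows in the
# user table and in the per-cluster association table.
stale_user_sql() {
    user=$1
    cluster=$2
    if ! valid_ident "$user" || ! valid_ident "$cluster"; then
        echo "invalid user or cluster name" >&2
        return 1
    fi
    printf "SELECT name, deleted FROM user_table WHERE name = '%s';\n" "$user"
    printf "SELECT user, deleted FROM %s_assoc_table WHERE user = '%s';\n" \
        "$cluster" "$user"
}

# Possible usage (assumes the accounting database is named slurm, as in
# the ticket):
#   stale_user_sql ly07336 hipergator | mysql slurm
```

Rows returned with deleted=0 for a user that sacctmgr no longer lists would match the situation reported here.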
Comment 2 taylor@rc.ufl.edu 2016-03-10 03:53:36 MST
The three user associations that had this problem were:

ly07336
wperry
rmckenna

The problem seems to have been fixed by going into mysql and deleting the associated entries from the relevant tables.

mysql> use slurm;

mysql> delete from user_table where name = 'rmckenna';
Query OK, 1 row affected (0.00 sec)

mysql> delete from hipergator_assoc_table where user = 'rmckenna';
Query OK, 2 rows affected (0.00 sec)
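
The two deletes above were run directly. A more cautious variant, sketched here under stated assumptions, wraps the same statements in a transaction that rolls back by default, so the affected row counts can be reviewed first. The helper names valid_ident and cleanup_user_sql are hypothetical, and the rollback only has an effect if the tables use a transactional storage engine such as InnoDB, which is assumed rather than confirmed by this ticket.

```shell
#!/bin/sh
# Hypothetical helpers (not from the ticket).

# Accept only plain identifiers, because the values are pasted into SQL text.
valid_ident() {
    case "$1" in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
    esac
}

# Print the ticket's two DELETE statements inside a transaction.
# The third argument is ROLLBACK (default, a dry run) or COMMIT.
cleanup_user_sql() {
    user=$1
    cluster=$2
    mode=${3:-ROLLBACK}
    if ! valid_ident "$user" || ! valid_ident "$cluster"; then
        echo "invalid user or cluster name" >&2
        return 1
    fi
    case "$mode" in
        ROLLBACK|COMMIT) ;;
        *) echo "mode must be ROLLBACK or COMMIT" >&2; return 1 ;;
    esac
    printf 'START TRANSACTION;\n'
    printf "DELETE FROM user_table WHERE name = '%s';\n" "$user"
    # ROW_COUNT() reports how many rows the previous statement changed.
    printf 'SELECT ROW_COUNT() AS user_rows;\n'
    printf "DELETE FROM %s_assoc_table WHERE user = '%s';\n" "$cluster" "$user"
    printf 'SELECT ROW_COUNT() AS assoc_rows;\n'
    printf '%s;\n' "$mode"
}

# Possible usage (database name slurm taken from the ticket):
#   cleanup_user_sql rmckenna hipergator        | mysql slurm   # dry run
#   cleanup_user_sql rmckenna hipergator COMMIT | mysql slurm   # apply
```

For rmckenna, the dry run would be expected to report 1 and 2 rows, matching the counts in the transcript above.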
Comment 4 Jacob Jenson 2017-11-03 10:44:12 MDT
This version of Slurm is no longer supported.