Ticket 9903 - Failed to add association by coordinator
Summary: Failed to add association by coordinator
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Database
Version: 20.02.4
Hardware: Linux Linux
Severity: 3 - Medium Impact
Assignee: Albert Gil
Duplicates: 8795
Blocks: 8452
Reported: 2020-09-25 18:10 MDT by Jonathon Anderson
Modified: 2021-08-19 14:41 MDT
Site: University of Colorado
Version Fixed: 21.08.0


Attachments
slurmdbd.conf (344 bytes, application/octet-stream)
2020-10-07 13:32 MDT, Jonathon Anderson
slurm.conf (10.46 KB, application/octet-stream)
2020-10-07 13:32 MDT, Jonathon Anderson

Description Jonathon Anderson 2020-09-25 18:10:21 MDT
We are experiencing the same issue as previously reported in #5705.

An account coordinator can create a sub-account of their account, and they are listed as coordinator for the sub-account, but they cannot manage its membership. If an admin then redundantly sets the coordinator for the sub-account, it behaves as expected.

We expect a coordinator to be able to manage the membership for sub-accounts they create, just like the accounts they are originally made coordinator of.
Comment 1 Jonathon Anderson 2020-09-30 17:01:13 MDT
Any idea on if/when this might be fixed?
Comment 4 Albert Gil 2020-10-01 04:37:55 MDT
Hi Jonathon,

While trying to reproduce the issue also reported in bug 5705, I've detected other malfunctions when performing some actions as Coordinator.

I'm investigating this, and I'll keep you posted.

Albert
Comment 7 Albert Gil 2020-10-06 09:28:11 MDT
Hi Jonathon,

I've been able to reproduce a very similar error, but I'm not sure if this is exactly the same error as in bug 5705, or the one that you are facing now. The cause is probably the same, but I would like to double-check that you and I are seeing the same error.

In bug 5705, when the Coordinator creates the subaccount and runs "sacctmgr show account withcoord", the subaccount is listed and he appears as its Coordinator.
That was on 17.02.

I don't see exactly this in my tests on 20.02.

Create the account as Admin:
admin$ sacctmgr create account acct9903
admin$ sacctmgr create user bob account=acct9903
admin$ sacctmgr create coordinator account=acct9903 names=bob

admin$ sacctmgr show accounts withcoord
   Account                Descr                  Org       Coord Accounts 
---------- -------------------- -------------------- -------------------- 
  acct9903             acct9903             acct9903                  bob 
      root default root account                 root                      


admin$ sacctmgr show association tree format=account,user
             Account       User 
-------------------- ---------- 
root                            
 root                      root 
 acct9903                       
  acct9903                  bob 

admin$ sacctmgr show user bob withcoord
      User   Def Acct     Admin       Coord Accounts 
---------- ---------- --------- -------------------- 
       bob   acct9903      None             acct9903 


As Coordinator, the subaccount can be created correctly:

coord$ sacctmgr create account sub9903 parent=acct9903

But it is not even listed when listing as Coordinator (unlike bug 5705):

coord$ sacctmgr show accounts withcoord
   Account                Descr                  Org       Coord Accounts 
---------- -------------------- -------------------- -------------------- 
  acct9903             acct9903             acct9903                  bob 


It is listed as Admin, though:

admin$ sacctmgr show accounts withcoord 
   Account                Descr                  Org       Coord Accounts 
---------- -------------------- -------------------- -------------------- 
  acct9903             acct9903             acct9903                  bob 
      root default root account                 root                      
   sub9903              sub9903             acct9903                  bob 


When trying to create a user into the subaccount as Coordinator I get this error:

coord$ sacctmgr add user sue account=sub9903
 This account 'sub9903' doesn't exist.
        Contact your admin to add this account.
 Nothing new added.


Is this the error that you are getting?
On bug 5705 the error was slightly different:

> [syrchin@master ~]$ sacctmgr add user name=user account=test.0
>  Associations =
>   U = user     A = test.0     C = caba2     
>  Non Default Settings
>  Problem adding users: Access/permission denied


As you mentioned, adding Bob as Coordinator of the subaccount is a workaround for the issue:

admin$ sacctmgr create coordinator account=sub9903 names=bob

coord$ sacctmgr add user sue account=sub9903
coord$ sacctmgr show association tree format=account,user
             Account       User 
-------------------- ---------- 
root                            
 acct9903                       
  acct9903                  bob 
  sub9903                       
   sub9903                  sue 
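
The workaround above has to be repeated for every sub-account a coordinator creates, so it may be convenient to generate the admin-side commands. This is only a sketch: the function name and the `immediate` parameter are hypothetical, the `account=`/`names=` form simply mirrors the commands used earlier in this comment, and nothing is executed, the commands are only built as strings.

```python
import shlex


def workaround_commands(coordinator, subaccounts, immediate=False):
    """Build (but do not run) one sacctmgr command per sub-account.

    Mirrors the admin command used in this ticket:
        sacctmgr create coordinator account=<sub-account> names=<user>
    """
    commands = []
    for account in subaccounts:
        argv = ["sacctmgr"]
        if immediate:
            # -i / --immediate commits without the interactive y/N prompt.
            argv.append("-i")
        argv += ["create", "coordinator", f"account={account}", f"names={coordinator}"]
        # shlex.join quotes arguments only when the shell would need it.
        commands.append(shlex.join(argv))
    return commands


for cmd in workaround_commands("bob", ["sub9903"]):
    print(cmd)
```

The generated lines are meant to be reviewed and then run by an admin, since the coordinator cannot apply the workaround for themselves.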

Am I reproducing your error, or are you facing a slightly different one?

I've also detected some interference when using TrackWCKey, and I know that we were already working on some internal tasks to fix some behavior related to Coordinators.

I'll keep you posted on my findings,
Albert
Comment 11 Jonathon Anderson 2020-10-06 16:59:40 MDT
I think I'm seeing slightly different behavior than you are:

First, as an admin:

-
[joan5896@admin2 ~]$ sacctmgr create account testacct cluster=blanca
 Adding Account(s)
  testacct
 Settings
  Description     = Account Name
  Organization    = Parent/Account Name
 Associations
  A = testacct   C = blanca    
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

[joan5896@admin2 ~]$ sacctmgr add coordinator account=testacct cluster=blanca user=rcco2010                                         
 Adding Coordinator User(s)
  rcco2010
 To Account(s) and all sub-accounts
  testacct
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
-

Then, as the coordinator.

-
[rcco2010@login10 ~]$ sacctmgr create acct subtestacct parent=testacct cluster=blanca
 Adding Account(s)
  subtestacct
 Settings
  Description     = Account Name
  Organization    = Parent/Account Name
 Associations
  A = subtestacc C = blanca    
 Settings
  Parent        = testacct
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

[rcco2010@login10 ~]$ sacctmgr show accounts withcoord testacct
   Account                Descr                  Org       Coord Accounts 
---------- -------------------- -------------------- -------------------- 
  testacct             testacct             testacct             rcco2010 

[rcco2010@login10 ~]$ sacctmgr show accounts withcoord subtestacct
   Account                Descr                  Org       Coord Accounts 
---------- -------------------- -------------------- -------------------- 
subtestac+          subtestacct             testacct             rcco2010
-

So the coordinator sees themself as the coordinator of the subtestacct.

I'm doing this in a production system with a lot of accounts, so I'm not listing all of them; but I also see that it's shown in a blanket list.

-
[rcco2010@login10 ~]$ sacctmgr show accounts -p withcoord | grep subtestacct
subtestacct|subtestacct|testacct|rcco2010|
-
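
For sites with many accounts, the parsable (`-p`) output above can be checked from a script rather than with grep. The following is a minimal sketch, assuming the column order shown in this ticket (Account|Descr|Org|Coord Accounts|) and, as a further assumption, that several coordinators on one account are comma-separated; the function name is hypothetical.

```python
def parse_accounts_withcoord(text):
    """Map each account to its coordinators.

    Expects the output of `sacctmgr show accounts -p withcoord`, with the
    columns Account|Descr|Org|Coord Accounts| as shown in this ticket.
    """
    coords = {}
    for line in text.splitlines():
        fields = line.strip().split("|")
        # Skip blank or malformed rows and the header row.
        if len(fields) < 4 or fields[0] in ("", "Account"):
            continue
        # Assumption: multiple coordinators are separated by commas.
        coords[fields[0]] = [name for name in fields[3].split(",") if name]
    return coords


sample = "subtestacct|subtestacct|testacct|rcco2010|\n"
print(parse_accounts_withcoord(sample))  # {'subtestacct': ['rcco2010']}
```

Note that this only reflects what the database lists; as this ticket shows, being listed as coordinator does not guarantee that slurmdbd will accept the coordinator's requests.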
Comment 12 Albert Gil 2020-10-07 01:44:02 MDT
Thanks Jonathon,
Yes, this is slightly different.
Could you attach your slurm.conf and slurmdbd.conf?

Also, I'm interested in the exact error that you see when your Coordinator tries to add a user into his subaccount and he can't.
Could you also post the command-line error, plus the slurmdbd log from running that command?

Thanks,
Albert
Comment 16 Jonathon Anderson 2020-10-07 13:32:13 MDT
Created attachment 16153
slurmdbd.conf

slurmdbd.conf and slurm.conf attached.

Here's the command-line:

-
[rcco2010@login10 ~]$ sacctmgr create acct subtestacct parent=testacct cluster=blanca
 Adding Account(s)
  subtestacct
 Settings
  Description     = Account Name
  Organization    = Parent/Account Name
 Associations
  A = subtestacc C = blanca
 Settings
  Parent        = testacct
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

[rcco2010@login10 ~]$ sacctmgr add user joan5896 account=subtestacct cluster=blanca
 Associations =
  U = joan5896  A = subtestacc C = blanca
 Non Default Settings
 Problem adding users: Access/permission denied
-

And here's the slurmdbd log:

-
Oct 07 13:31:26 slurmdb1.rc.int.colorado.edu slurmdbd[17810]: error: CONN:12 Your user doesn't have privilege to perform this action
Oct 07 13:31:26 slurmdb1.rc.int.colorado.edu slurmdbd[17810]: error: CONN:12 Security violation, DBD_ADD_ASSOCS
Oct 07 13:31:26 slurmdb1.rc.int.colorado.edu slurmdbd[17810]: error: Processing last message from connection 12(10.225.160.181) uid(1000288)
-
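
To find other occurrences of this failure in a slurmdbd log, the entries above can be matched with two regular expressions. This is a sketch under the assumption that the log lines have exactly the shape shown in this comment; the pattern and function names are hypothetical and other Slurm versions may word the messages differently.

```python
import re

# Patterns derived only from the three log lines quoted in this ticket.
VIOLATION = re.compile(r"error: CONN:(?P<conn>\d+) Security violation, (?P<msg>\w+)")
LAST_MSG = re.compile(
    r"Processing last message from connection (?P<conn>\d+)"
    r"\((?P<ip>[^)]+)\) uid\((?P<uid>\d+)\)"
)


def find_security_violations(log_text):
    """Return one dict per 'Security violation' line, with ip/uid when logged."""
    violations = []
    pending = {}  # connection number -> violation still waiting for its uid line
    for line in log_text.splitlines():
        match = VIOLATION.search(line)
        if match:
            entry = {
                "conn": int(match["conn"]),
                "msg_type": match["msg"],
                "ip": None,
                "uid": None,
            }
            violations.append(entry)
            pending[entry["conn"]] = entry
            continue
        match = LAST_MSG.search(line)
        if match and int(match["conn"]) in pending:
            # Attach the client address and uid to the earlier violation.
            entry = pending.pop(int(match["conn"]))
            entry["ip"] = match["ip"]
            entry["uid"] = int(match["uid"])
    return violations
```

Applied to the log excerpt above, this would report a single `DBD_ADD_ASSOCS` violation on connection 12 from uid 1000288.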

Comment 17 Jonathon Anderson 2020-10-07 13:32:14 MDT
Created attachment 16154
slurm.conf
Comment 21 Jonathon Anderson 2020-10-22 20:26:01 MDT
Checking in: any further information on this?
Comment 22 Albert Gil 2020-10-23 04:41:34 MDT
Hi Jonathon,

While studying this problem we have detected several issues in the Coordinator feature, and some appear to have been there for a long time.

At this point we've detected:
- The two errors on subaccounts that you and I detected in this bug
- A general problem when TrackWCKey=yes is set
- A problem when a user/coordinator is already in other Accounts related to their default account
- Wrong transactions in the DB related to Coordinators

With all this, we're working on a more complete analysis of the Coordinator feature.
It will take a bit longer than it initially seemed.

I'll keep you posted,
Albert
Comment 23 Jonathon Anderson 2020-11-17 23:19:35 MST
Not trying to rush you; but how are things going with cleaning up the coordinator role implementation?
Comment 24 Albert Gil 2020-11-18 02:29:58 MST
Hi Jonathon,

We decided not to target the necessary changes at the new 20.11(.0), and we've been focused on releasing 20.11 over the last few weeks.
Now that it's out, we'll start working on this and keep you updated.

At this point I'm not sure if the final fix will land in 20.11.1 or, more probably, on the master branch, i.e. in the future 21.08.

Regards,
Albert
Comment 25 Jonathon Anderson 2020-11-18 14:11:47 MST
I want to make sure I understand: you're expecting these fixes might be available in August 2021?

~jonathon
Comment 26 Albert Gil 2020-11-19 00:28:15 MST
Hi Jonathon,

> I want to make sure I understand: you're expecting these fixes might be
> available in August 2021?

Not exactly.
The fix should be done (i.e., committed on GitHub) sooner, even within 2020.
But depending on how it turns out, it will be released (i.e., branched/tagged on GitHub) either as part of one of the minor releases expected for 20.11 (i.e. 20.11.x), or only as part of the next major release, 21.08.

Regards,
Albert
Comment 27 Jonathon Anderson 2021-01-21 16:40:05 MST
Checking in to see if this has been included in any releases yet.
Comment 28 Albert Gil 2021-01-21 23:54:45 MST
Hi Jonathon,

Not yet, it's still a work in progress.

Regards,
Albert
Comment 29 Jonathon Anderson 2021-03-04 17:05:14 MST
Checking in again to see what progress has been made. ^_^
Comment 30 Albert Gil 2021-03-05 07:15:10 MST
Sorry Jonathon,

It's still a work in progress, without a significant update.
But it's not forgotten at all.
We want to fix/work on all the issues listed in comment 22.

Regards,
Albert
Comment 31 Jonathon Anderson 2021-04-02 13:50:51 MDT
Checking in again to see what the status of this effort is.
Comment 39 Albert Gil 2021-04-07 12:46:16 MDT
Hi Jonathon,

At this point I can confirm that one of the issues related to Coordinators is actually a problem that occurs when the subaccount is created.

That is, if the subaccount is created *before* the Coordinator of the parent account, then when the Coordinator of the parent is created it works properly and can add users to the subaccount too.
Also, whatever the order, once slurmdbd is restarted it works fine again.

Is this also happening in your case?

The problem seems to be that when the subaccount is added, although the information in the actual DB is correct, the cached information about the coordinators in slurmdbd is not properly updated.
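
The symptoms described above (order dependence, correct listings, and recovery after a restart) can be illustrated with a toy model of a permission cache that is not refreshed when a sub-account is added. This is only a sketch of the described behaviour: the class, its attributes, and the `refresh_cache` parameter are hypothetical and do not reflect slurmdbd's real data structures or the eventual fix.

```python
class ToyDbd:
    """Toy model of a stale coordinator cache; not slurmdbd's actual code."""

    def __init__(self):
        self.parent = {}         # "DB": account -> parent account (None if top level)
        self.direct_coords = {}  # "DB": account -> users set as its coordinator
        self.cache = {}          # in memory: user -> accounts the user may manage

    def _subtree(self, account):
        """Return the account plus all of its descendants, according to the DB."""
        found = {account}
        changed = True
        while changed:
            changed = False
            for child, par in self.parent.items():
                if par in found and child not in found:
                    found.add(child)
                    changed = True
        return found

    def add_account(self, account, parent=None, refresh_cache=False):
        """Record the account in the DB; the cache is untouched by default."""
        self.parent[account] = parent
        self.direct_coords.setdefault(account, set())
        if refresh_cache:  # hypothetical fix: keep the cache in sync
            self.rebuild_cache()

    def add_coordinator(self, account, user):
        """Record the coordinator and cache every account that exists right now."""
        self.direct_coords[account].add(user)
        self.cache.setdefault(user, set()).update(self._subtree(account))

    def rebuild_cache(self):
        """What a restart does in this model: recompute the cache from the DB."""
        self.cache = {}
        for account, users in self.direct_coords.items():
            for user in users:
                self.cache.setdefault(user, set()).update(self._subtree(account))

    def can_add_user(self, coordinator, account):
        """Permission checks consult the cache, not the DB."""
        return account in self.cache.get(coordinator, set())


dbd = ToyDbd()
dbd.add_account("acct9903")
dbd.add_coordinator("acct9903", "bob")
dbd.add_account("sub9903", parent="acct9903")  # created after the coordinator
print(dbd.can_add_user("bob", "sub9903"))      # False: the cache is stale
dbd.rebuild_cache()                            # models restarting the daemon
print(dbd.can_add_user("bob", "sub9903"))      # True
```

In this model, creating the sub-account before the coordinator, or rebuilding the cache afterwards, both give the coordinator access, which matches the two observations above.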

We are working on a fix.

Regards,
Albert
Comment 40 Jonathon Anderson 2021-04-07 16:38:19 MDT
> if the subaccount is created *before* the Coordinator of the parent account, then when the Coordinator of the parent is created it works properly and can add users to the subaccount too.

I believe this matches our experience.

> when slurmdbd is restarted it works fine again

I don't think we've tested this. But your explanation makes sense to me.
Comment 52 Albert Gil 2021-05-20 10:08:30 MDT
*** Ticket 8795 has been marked as a duplicate of this ticket. ***
Comment 57 Albert Gil 2021-05-27 09:03:38 MDT
Hi Jonathon,

I just want to let you know that we already have a first version of a patch that fixes your issue and some others that we found while working on this, all related to Coordinators.
The patch is now going into our QA process.

I'll keep you updated,
Albert
Comment 58 Kilian Cavalotti 2021-06-21 19:36:34 MDT
Just adding our 2 cents here to say that we're interested in this fix as well.

Thanks!
--
Kilian
Comment 60 Jonathon Anderson 2021-08-04 15:31:15 MDT
What is the current status of this effort?
Comment 62 Albert Gil 2021-08-17 09:07:02 MDT
Hi Jonathon,

> What is the current status of this effort?

Sorry for the delay in answering this, I've been out of the office for some days.
Although we are somewhat focused on the upcoming release (21.08), the QA process is going well and is close to the end.

I'll keep you posted,
Albert
Comment 68 Albert Gil 2021-08-19 10:09:50 MDT
Hi Jonathon,

I'm glad to inform you that this issue has been fixed, and the fix will be included in the new version 21.08, which will be released soon.
We have also fixed another issue related to Coordinators when PrivateData=users is used.


I hope it helps,
Albert
Comment 69 Jonathon Anderson 2021-08-19 10:38:34 MDT
Great! Thanks for letting me know.