Ticket 6792 - "sacctmgr list user where qos=gpu format=user" lists all users regardless of QOS
Summary: "sacctmgr list user where qos=gpu format=user" lists all users regardless of QOS
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 18.08.6
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Albert Gil
Duplicates: 5420, 6813, 12151
 
Reported: 2019-04-02 10:12 MDT by Chris Samuel (NERSC)
Modified: 2021-08-25 11:06 MDT
Site: NERSC

Description Chris Samuel (NERSC) 2019-04-02 10:12:50 MDT
Hi there,

We've got a QOS called GPU for a select band of users who have access to a small set of GPU nodes.  I was asked how to query Slurm to see who is in the QOS and said:

sacctmgr list user where qos=gpu format=user

but that caused consternation as that reported 7,000+ users and not the expected number of around 100.  This is because Slurm seems to list all users regardless of QOS, despite the where clause.

As a workaround we've found that if you ask for the user and qos list then the QOS list is blank for users who are not in the QOS and populated for those who are so we can use awk to pull that list of users out.
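A minimal sketch of that awk workaround. The function name is made up for illustration, and it assumes the command is run with -n (no header) and -P (parsable, pipe-delimited output) so that every line has the form user|qos:

```shell
# Hypothetical helper: read "user|qos" lines on stdin and print the users
# whose QOS column is not blank, one name per line, without duplicates.
users_with_qos() {
    awk -F'|' '$2 != "" { print $1 }' | sort -u
}

# Intended use on a cluster (not run here; "gpu" is the QOS from this ticket):
#   sacctmgr -nP list user withassoc where qos=gpu format=user,qos | users_with_qos
```

Users outside the QOS come back with an empty second column, so they are dropped by the `$2 != ""` test.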

This is probably a duplicate of bug #5420 from 9 months ago but as they don't have a support contract it's not being worked on.

All the best,
Chris
Comment 2 Albert Gil 2019-04-03 06:15:04 MDT
Hi Chris,

I've replicated the issue, but I'm not certain whether it is expected or not.
First of all, please note that you can achieve what you want just by replacing "user" with "assoc":

$ sacctmgr show assoc format=user,account,qos
      User    Account                  QOS 
---------- ---------- -------------------- 
                 root               normal 
      root       root               normal 
           developme+               normal 
      agil developme+               normal 
       bob developme+       normal,qos6702 
       sue developme+               normal 
             external               normal 
       joe   external               normal 


$ sacctmgr show assoc format=user,account,qos where qos=qos6702
      User    Account                  QOS 
---------- ---------- -------------------- 
       bob developme+       normal,qos6702 

$ sacctmgr show assoc format=user,account,qos where account=external
      User    Account                  QOS 
---------- ---------- -------------------- 
             external               normal 
       joe   external               normal 

In the documentation you can see that SPECIFICATIONS FOR USERS and LIST/SHOW USER FORMAT OPTIONS don't include QOS, but LIST/SHOW ASSOCIATION FORMAT OPTIONS and GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES do:
https://slurm.schedmd.com/sacctmgr.html

Still, the behavior looks strange to me.
It also doesn't happen only when filtering by QOS: filtering by Account (and, it seems, by any other field) behaves the same way:
- all users are listed
- the filtered field is shown only for the matching users, and left blank for the rest


$ sacctmgr list user withassoc format=user,account,qos
      User    Account                  QOS 
---------- ---------- -------------------- 
      agil developme+               normal 
       bob developme+       normal,qos6702 
       joe   external               normal 
      root       root               normal 
       sue developme+               normal 


$ sacctmgr list user withassoc format=user,account,qos where account=external
      User    Account                  QOS 
---------- ---------- -------------------- 
      agil                                 
       bob                                 
       joe   external               normal 
      root                                 
       sue                                 

$ sacctmgr list user withassoc format=user,account,qos where qos=qos6702
      User    Account                  QOS 
---------- ---------- -------------------- 
      agil                                 
       bob developme+       normal,qos6702 
       joe                                 
      root                                 
       sue                           


I will investigate it further to see what's going on for the "list user" case, but I hope that "list/show assoc" helps in your use case in the meantime. Does it?

Albert
Comment 7 Albert Gil 2019-04-09 11:54:21 MDT
Hi Chris,

> I've replicated the issue, but I'm not certain whether it is expected or not.
> I will investigate it further to see what's going on for the "list user" case

After looking into it more deeply, I can confirm that this is expected behavior.
In fact, the behavior is correct now, and it wasn't before the change made for bug 4804 (commit da49b8d0d14f1e2def06f2c22a14acb22a733153).
So there is a behavior change between 17.11 and 18.08, but the newer behavior is the right one (as explained in bug 4804 comment 14).

Let me try to explain in a bit more detail why this is expected:

sacctmgr manages several "entity types", such as users, accounts or clusters.
Each entity type has its own "entity specs" that you can use with "where" to filter, "set" to modify or "format" to list/show.
Each entity type works as expected with *its own* entity specs.
So far, so good.

As you know, there is also a special entity type: the association.
Associations relate users, accounts and clusters (and optionally partitions).
And they have plenty of specs that you can work with:
https://slurm.schedmd.com/sacctmgr.html#lbAI

Please note that these are not actually specs of users (nor of accounts or clusters), but of associations.
Also note that each cluster has a default "root" account/assoc to handle defaults.

This design allows the flexibility to associate a user to different accounts and/or clusters having different specs on each association, as well as managing default values per cluster and account.

In your command below, the "error" is that the entity type "user" has no spec "qos":

$ sacctmgr list user where qos=gpu format=user

In this case sacctmgr could return an error. You should do "sacctmgr list assoc where qos=gpu format=user". End of story.
But "the problem" is that sacctmgr tries to be smart... and tells you this:

You requested options that are only valid when querying with the withassoc option.
Are you sure you want to continue? (You have 30 seconds to decide)
(N/y): y

So, what is "withassoc"?
It's an option, available for some entity types, that also lists their related associations.
For example, for a user it lists all of that user's associations with different accounts, and for an account it lists all of that account's associations with different users.

Therefore, when you ask for an entity and use withassoc (or -s), you are actually doing two queries (getting two lists):
- the list of entities of the type that you requested
- and the list of their associations
And the key point: the filter is applied *only* to the list it can be applied to.

Therefore, in your case "sacctmgr list user withassoc where qos=gpu format=user" is translated to:
- list all users
- for each user, list all of its associations
- filter the associations to show only those that match qos=gpu (for associations that don't match, the qos field is left blank)
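The steps above can be modeled with a short sketch. This is an illustration of the semantics described here, not Slurm's implementation: the function names and the user|account|qos input format are assumptions, and for simplicity only the last matching association per user is kept.

```shell
# Hypothetical model of "list user withassoc where qos=<q>".
# stdin: one association per line as user|account|qos (comma-separated QOS list).
# Every user is printed; account and qos are filled in only when an
# association of that user matches, and left blank otherwise.
list_user_withassoc() {
    awk -F'|' -v q="$1" '
        $1 == "" { next }    # account-level associations carry no user
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
            cnt = split($3, qos, ",")
            for (i = 1; i <= cnt; i++)
                if (qos[i] == q) hit[$1] = $2 "|" $3
        }
        END {
            for (i = 1; i <= n; i++) {
                u = order[i]
                print u "|" ((u in hit) ? hit[u] : "|")
            }
        }'
}

# Hypothetical model of "show assoc where qos=<q>": the filter applies to
# the associations themselves, so non-matching rows are dropped entirely.
show_assoc() {
    awk -F'|' -v q="$1" '
        {
            cnt = split($3, qos, ",")
            for (i = 1; i <= cnt; i++)
                if (qos[i] == q) { print; break }
        }'
}
```

Fed the same association rows, the first function still returns one row per user (mostly blank), while the second returns only the matching associations.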

In bug 6813 (a duplicate of this one) a similar thing happens with this command:

$ sacctmgr show account withassoc where user=user1

It is translated to:
- show all accounts
- for each account, show all of its associations
- filter the associations to show only those that match user=user1

In both cases you have to use "show assoc" instead, and then the filter works as you expected.
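For the original question (who is in the QOS), the "show assoc" output can be reduced to a plain list of user names. A small sketch, assuming the -n (no header) and -P (parsable) output options; the helper name is hypothetical:

```shell
# Hypothetical helper: reduce a one-column user list on stdin to unique,
# non-empty names. Account-level associations have an empty User field,
# so blank lines are dropped before de-duplicating.
unique_users() {
    awk 'NF' | sort -u
}

# Intended use on a cluster (not run here):
#   sacctmgr -nP show assoc where qos=gpu format=user | unique_users
```

A user associated with several accounts appears once per association, which is why the list is de-duplicated.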

I'll keep this bug open to try to improve the documentation of sacctmgr to avoid further confusion.

Hope that helps,
Albert
Comment 8 Albert Gil 2019-04-11 03:20:21 MDT
*** Ticket 6813 has been marked as a duplicate of this ticket. ***
Comment 10 Chris Samuel (NERSC) 2019-04-12 14:11:49 MDT
Hi Albert,

OK, I can understand the logic there, but I think the fact that sacctmgr uses SQL syntax misleads people into thinking that it will work in a similar way.  So I wonder if in this instance it might be better to return an error rather than do something that (to the uneducated user) looks like a bug?

All the best,
Chris
Comment 11 Albert Gil 2019-08-26 04:14:41 MDT
Hi Chris,

> OK, I can understand the logic there, but I think the fact that sacctmgr
> uses SQL syntax misleads people into thinking that it will work in a similar
> way.  So I wonder if in this instance it might be better to return an error
> rather than do something that (to the uneducated user) looks like a bug?

I agree with you.
But it's also true that it would be a behavior change, and some users may be relying on the current behavior.

I'm closing this bug as infogiven, but I will also:
- track the documentation improvements into an existing documentation ticket, and let you know here once it is done.
- discuss internally if changing that behavior makes sense in future versions

Regards,
Albert
Comment 12 Albert Gil 2019-11-14 10:32:36 MST
*** Ticket 5420 has been marked as a duplicate of this ticket. ***
Comment 13 Scott Hilton 2021-08-25 11:06:36 MDT
*** Ticket 12151 has been marked as a duplicate of this ticket. ***