Hi there,

We've got a QOS called GPU for a select band of users who have access to a small set of GPU nodes. I was asked how to query Slurm to see who is in the QOS and said:

sacctmgr list user where qos=gpu format=user

but that caused consternation, as it reported 7,000+ users and not the expected number of around 100. This is because Slurm seems to list all users regardless of QOS, despite the where clause.

As a workaround we've found that if you ask for the user and qos list then the QOS list is blank for users who are not in the QOS and populated for those who are, so we can use awk to pull that list of users out.

This is probably a duplicate of bug #5420 from 9 months ago but as they don't have a support contract it's not being worked on.

All the best,
Chris
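The workaround described above (keep only the users whose QOS column is populated) can be sketched in Python instead of awk. This is a minimal sketch, assuming the input was produced with something like `sacctmgr -n -P list user withassoc where qos=gpu format=user,qos`, i.e. one `user|qos` pair per line with no header. The function name and the sample users are hypothetical and only mirror the blank/populated pattern reported in this ticket.

```python
def users_with_qos(parsable_output: str) -> list[str]:
    """Return the users whose QOS field is non-empty in 'user|qos' lines."""
    users = set()
    for line in parsable_output.splitlines():
        if not line.strip():
            continue
        # Split on the first '|' only: the QOS field may itself contain commas.
        user, _, qos = line.partition("|")
        if user and qos.strip():
            users.add(user)
    return sorted(users)


# Hypothetical sample: the QOS field is blank for users outside the QOS.
SAMPLE = """\
alice|
bob|normal,gpu
carol|
dave|gpu
"""

print(users_with_qos(SAMPLE))
```

Users that appear with several matching associations are reported once, because the names are collected in a set.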
Hi Chris,

I've replicated the issue but I'm not certain whether it is expected or not.

First of all, please note that you can achieve what you want just by replacing "user" with "assoc":

$ sacctmgr show assoc format=user,account,qos
      User    Account                  QOS
---------- ---------- --------------------
                 root               normal
      root       root               normal
           developme+               normal
      agil developme+               normal
       bob developme+       normal,qos6702
       sue developme+               normal
             external               normal
       joe   external               normal

$ sacctmgr show assoc format=user,account,qos where qos=qos6702
      User    Account                  QOS
---------- ---------- --------------------
       bob developme+       normal,qos6702

$ sacctmgr show assoc format=user,account,qos where account=external
      User    Account                  QOS
---------- ---------- --------------------
             external               normal
       joe   external               normal

In the documentation you can see that SPECIFICATIONS FOR USERS and LIST/SHOW USER FORMAT OPTIONS do not include QOS, but LIST/SHOW ASSOCIATION FORMAT OPTIONS and GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES do:

https://slurm.schedmd.com/sacctmgr.html

But it looks strange to me.

Also, it does not only happen when filtering by QoS. The same behavior appears with Account (and any other filter):
- all users are listed
- the filtered field is shown correctly only for the matching users, and is blank for the rest

$ sacctmgr list user withassoc format=user,account,qos
      User    Account                  QOS
---------- ---------- --------------------
      agil developme+               normal
       bob developme+       normal,qos6702
       joe   external               normal
      root       root               normal
       sue developme+               normal

$ sacctmgr list user withassoc format=user,account,qos where account=external
      User    Account                  QOS
---------- ---------- --------------------
      agil
       bob
       joe   external               normal
      root
       sue

$ sacctmgr list user withassoc format=user,account,qos where qos=qos6702
      User    Account                  QOS
---------- ---------- --------------------
      agil
       bob developme+       normal,qos6702
       joe
      root
       sue

I will investigate further to see what's going on in the "list user" case, but I hope that "list/show assoc" helps with your use case in the meantime. Does it?

Albert
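For scripting the "show assoc" query suggested above, a small wrapper could look like the following. It is a sketch under stated assumptions, not an official interface: the helper names are hypothetical, and it assumes that `sacctmgr` is on the PATH and that the `-n` (no header) and `-P` (parsable2) options give one user name per line. Associations with an empty User column, like the account-level rows in the listings above, come back as blank lines and are skipped.

```python
import subprocess


def assoc_query_cmd(qos: str) -> list[str]:
    """Build the 'show assoc' command that filters associations by QOS."""
    return ["sacctmgr", "-n", "-P", "show", "assoc",
            "where", f"qos={qos}", "format=user"]


def parse_users(output: str) -> list[str]:
    """Return unique, sorted user names, dropping blank (account-level) rows."""
    return sorted({line.strip() for line in output.splitlines() if line.strip()})


def users_in_qos(qos: str) -> list[str]:
    """Run the query and parse it; needs a working Slurm accounting setup."""
    result = subprocess.run(assoc_query_cmd(qos),
                            capture_output=True, text=True, check=True)
    return parse_users(result.stdout)
```

On a cluster, `users_in_qos("gpu")` would then return the members of the gpu QOS. The parsing step is kept separate so it can be checked without a Slurm installation.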
Hi Chris,

> I've replicated the issue but I'm not certain whether it is expected or not.
> I will investigate it further to see what's going on for the "list user" case

After looking into it more deeply I can confirm that this is expected behavior. Actually, the behavior is now working as expected but it wasn't before bug 4804 (commit da49b8d0d14f1e2def06f2c22a14acb22a733153). So there is a behavior change between 17.11 and 18.08, but the latter is the correct one (as explained in bug 4804 comment 14).

Let me try to explain in a bit more detail why this is expected.

sacctmgr manages several "entity types", like users, accounts or clusters. Each entity type has its own "entity specs" that you can use with "where" to filter, "set" to modify or "format" to list/show. Each entity type works as expected with *its own* entity specs. So far, so good.

As you know, there is also a special entity type, the associations. Associations relate users, accounts and clusters (and optionally partitions), and they have plenty of specs that you can work with:

https://slurm.schedmd.com/sacctmgr.html#lbAI

Please note that these are not actually specs for users (nor for accounts or clusters), but for associations. Also note that each cluster has a default "root" account/assoc to handle defaults. This design gives the flexibility to associate a user with different accounts and/or clusters, with different specs on each association, as well as to manage default values per cluster and account.

In your command below, the "error" is that the entity type "user" has no spec "qos":

$ sacctmgr list user where qos=gpu format=user

In this case sacctmgr could return an error. You should do "sacctmgr list assoc where qos=gpu format=user". End of story.

But "the problem" is that sacctmgr tries to be smart... and tells you this:

 You requested options that are only valid when querying with the withassoc option.
 Are you sure you want to continue? (You have 30 seconds to decide)
 (N/y): y

So, what is "withassoc"? It's an option available for some entities that also lists their related associations. For example, for a user it lists all its associations with different accounts, and for an account it lists all its associations with different users.

Therefore, when you ask for an entity and use withassoc (or -s) you are actually doing two queries (getting two lists):
- the list of entities of the type that you requested
- the list of their associations

And the key point: the filter is applied *only* to the list it can be applied to.

Therefore, in your case "sacctmgr list user withassoc where qos=gpu format=user" is translated to:
- list all users
- for each user, list all its associations
- filter the associations to show only those that match qos=gpu (the qos field is left blank for everything else)

In bug 6813 (a duplicate of this one) a similar thing happens with this command:

$ sacctmgr show account withassoc where user=user1

It is translated to:
- show all accounts
- for each account, show all associations
- filter the associations to show only those that match user=user1

In both cases we have to use "show assoc", and then the filter will work as you expected.

I'll keep this bug open to try to improve the sacctmgr documentation and avoid further confusion.

Hope that helps,
Albert
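The two-list behavior explained above can be illustrated with a toy model. This is only a sketch of the described semantics, not sacctmgr's actual implementation: the `Assoc` type and both function names are hypothetical, and the data reuses the users and associations from the listings earlier in this ticket (account names are kept as the truncated `developme+` shown there).

```python
from typing import Callable, NamedTuple


class Assoc(NamedTuple):
    user: str
    account: str
    qos: tuple[str, ...]


USERS = ["agil", "bob", "joe", "root", "sue"]
ASSOCS = [
    Assoc("agil", "developme+", ("normal",)),
    Assoc("bob", "developme+", ("normal", "qos6702")),
    Assoc("joe", "external", ("normal",)),
    Assoc("root", "root", ("normal",)),
    Assoc("sue", "developme+", ("normal",)),
]


def show_assoc(assocs: list[Assoc], keep: Callable[[Assoc], bool]) -> list[Assoc]:
    """'show assoc where ...': the filter selects the rows themselves."""
    return [a for a in assocs if keep(a)]


def list_user_withassoc(users: list[str], assocs: list[Assoc],
                        keep: Callable[[Assoc], bool]) -> list[tuple[str, str, str]]:
    """'list user withassoc where ...': every user is listed; the filter
    only decides which association fields are filled in."""
    rows = []
    for user in users:  # the user list itself is never filtered
        matches = [a for a in assocs if a.user == user and keep(a)]
        if matches:
            rows.extend((user, a.account, ",".join(a.qos)) for a in matches)
        else:
            rows.append((user, "", ""))  # blank association columns
    return rows


def has_qos6702(assoc: Assoc) -> bool:
    return "qos6702" in assoc.qos


for row in list_user_withassoc(USERS, ASSOCS, has_qos6702):
    print(row)
print(show_assoc(ASSOCS, has_qos6702))
```

In this model the first query yields one row per user with only bob's association columns populated, while the second yields bob's association alone, which matches the difference seen between "list user withassoc" and "show assoc" in this ticket.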
*** Ticket 6813 has been marked as a duplicate of this ticket. ***
Hi Albert, OK, I can understand the logic there, but I think the fact that sacctmgr uses SQL syntax misleads people into thinking that it will work in a similar way. So I wonder if in this instance it might be better to return an error rather than do something that (to the uneducated user) looks like a bug? All the best, Chris
Hi Chris,

> OK, I can understand the logic there, but I think the fact that sacctmgr
> uses SQL syntax misleads people into thinking that it will work in a similar
> way. So I wonder if in this instance it might be better to return an error
> rather than do something that (to the uneducated user) looks like a bug?

I agree with you. But it's also true that it would be a behavior change, and some users may be relying on the current behavior.

I'm closing this bug as infogiven, but I will also:
- track the documentation improvements in an existing documentation ticket, and let you know here once it is done
- discuss internally whether changing that behavior makes sense in future versions

Regards,
Albert
*** Ticket 5420 has been marked as a duplicate of this ticket. ***
*** Ticket 12151 has been marked as a duplicate of this ticket. ***