Ticket 1554 - setting qos for a user
Summary: setting qos for a user
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 14.03.0
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Brian Christiansen
 
Reported: 2015-03-22 13:53 MDT by Gareth
Modified: 2015-09-01 11:02 MDT
4 users

Site: CSIRO


Description Gareth 2015-03-22 13:53:23 MDT
To work around current fairshare issues (http://bugs.schedmd.com/show_bug.cgi?id=1531), I set up a qos, tmpprio, and attempted to apply it to particular users.

This might be sufficient info for you to check that I am approaching this in the right way:

> sacctmgr show qos
      Name   Priority  GraceTime    Preempt PreemptMode                                    Flags UsageThres UsageFactor  GrpCPUs  GrpCPUMins GrpCPURunMins GrpJobs  GrpMem GrpNodes GrpSubmit     GrpWall  MaxCPUs  MaxCPUMins MaxNodes     MaxWall MaxCPUsPU MaxJobsPU MaxNodesPU MaxSubmitPU 
---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- -------- ----------- ------------- ------- ------- -------- --------- ----------- -------- ----------- -------- ----------- --------- --------- ---------- ----------- 
    normal          0   00:00:00                cluster                                                        1.000000                                                                                                                                                                        
   express       1000   00:00:00                cluster                                                        1.000000                                                                                                                    06:00:00                   4                        
   tmpprio       1000   00:00:00                cluster                                                        1.000000                                                                                                                                                                        


> sacctmgr show assoc where user=sa-Biotools_interpro
   Cluster    Account       User  Partition     Share GrpJobs GrpNodes  GrpCPUs  GrpMem GrpSubmit     GrpWall  GrpCPUMins MaxJobs MaxNodes  MaxCPUs MaxSubmit     MaxWall  MaxCPUMins                  QOS   Def QOS GrpCPURunMins 
---------- ---------- ---------- ---------- --------- ------- -------- -------- ------- --------- ----------- ----------- ------- -------- -------- --------- ----------- ----------- -------------------- --------- ------------- 
slurm_clu+       root sa-biotoo+                    8                                                                                                                                 express,normal,tmpp+   tmpprio               
                                                 

New jobs from user sa-Biotools_interpro do not get the tmpprio qos (unless they explicitly request it). I want and expect them to get it by default, which would seem consistent with the sacctmgr man page.

It may be noteworthy that sacctmgr reports user names case-insensitively (the user shows up as sa-biotoo+ above).

Is this a bug or a usage error or misunderstanding?

thanks,

Gareth
Comment 1 Brian Christiansen 2015-03-24 09:52:25 MDT
Does it work after a restart of slurmctld? There was a bug fixed in 14.11.5 that prevented the default qos from being set correctly until after a restart. Maybe you're hitting the bug?

https://github.com/SchedMD/slurm/commit/06ea19c47dd982851307fcaaa9ba48a2549c3b85
Comment 2 Gareth 2015-03-24 14:40:00 MDT
Restarting slurmctld does not change the behaviour.
Comment 3 Brian Christiansen 2015-03-25 11:44:47 MDT
What OS/distro are you using? When I try to add the user, "sa-Biotools_interpro", to the database it complains that it can't find the user.


brian@knc:~/slurm/14.11/knc$ sacctmgr add user sa-Biotools_interpro account=normal
There is no uid for user 'sa-biotools_interpro'
Are you sure you want to continue? (You have 30 seconds to decide)

brian@knc:~/slurm/14.11/knc$ id sa-Biotools_interpro
uid=7559(sa-Biotools_interpro) gid=7560(sa-Biotools_interpro) groups=7560(sa-Biotools_interpro)
Comment 4 James Powell 2015-03-25 12:05:28 MDT
~> cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3


(In reply to Brian Christiansen from comment #3)
Comment 5 Brian Christiansen 2015-03-25 12:50:34 MDT
Does it work for you with a "normal user" (i.e. no dashes or underscores)? It worked for me for a normal user. I'll try it on a SUSE box.
Comment 6 James Powell 2015-03-25 16:45:56 MDT
No issues adding/deleting users with or without '-' or '_'

e.g.

~> sacctmgr add user sa-test_account account=root
 Adding User(s)
  sa-test_account
...

~> sacctmgr add user sa-testaccount account=root
 Adding User(s)
  sa-testaccount


(In reply to Brian Christiansen from comment #5)
Comment 7 Gareth 2015-03-26 18:27:57 MDT
(In reply to James Powell from comment #6)

I *think* James is off the mark. I'll try to apply the qos to another user with a simple name. Unfortunately I'm bogged down in tender processes and can't get to this for a few days.

Gareth
Comment 8 Gareth 2015-03-29 09:37:20 MDT
It does work for a user with a simpler name:
 sacctmgr modify user user=wil240 set qos+=tmpprio defaultqos=tmpprio

slurmctld was not restarted.

Still no change for Biotools_interpro

James, can you please repeat the test for (test) users with capital letters, _ and -?
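The command above extends naturally to several accounts. A minimal sketch, assuming a POSIX shell and the tmpprio qos from this ticket; `qos_cmd` is a hypothetical helper that only prints the `sacctmgr` command (a dry run) and skips names containing capital letters, since those were the accounts that misbehaved here.

```shell
#!/bin/sh
# Hypothetical helper: print (dry run, nothing is executed) the sacctmgr
# command used above to grant a QOS and make it the user's default.
# Names with uppercase letters are skipped, because in this ticket such
# accounts never picked up their default QOS.
qos_cmd() {
    user=$1
    qos=$2
    lower=$(printf '%s' "$user" | tr '[:upper:]' '[:lower:]')
    if [ "$user" != "$lower" ]; then
        echo "skipping $user: name contains uppercase letters" >&2
        return 1
    fi
    echo "sacctmgr modify user user=$user set qos+=$qos defaultqos=$qos"
}

# Example: the second name is reported on stderr instead of printed.
for u in wil240 sa-Biotools_interpro; do
    qos_cmd "$u" tmpprio || :
done
```

Piping the printed lines to `sh` would run them; sacctmgr asks for confirmation before committing unless `-i` is given.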
Comment 9 Gareth 2015-03-29 17:56:46 MDT
(In reply to Gareth from comment #8)

I tried three more users, and it looks like capitalization is the problem.
sa_testaccount and sa-testaccount and sa-test_account are OK, sa-Biotools_test is not.

The name length is another variable.

Gareth
Comment 10 Brian Christiansen 2015-03-31 11:25:35 MDT
I was able to confirm that usernames containing capital letters don't work in Slurm: sacctmgr lowercases the names before sending them to the dbd.

The lowercasing of names was done because of case sensitive issues with Postgres. Now that slurm no longer supports Postgres, I'm looking into the possibility of not lowercasing the names. I'll let you know what I find.
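Since sacctmgr lowercases names before sending them to slurmdbd, any login name that differs from its lowercase form is affected. A minimal sketch for finding such accounts, assuming `getent passwd` enumerates the users you care about (with some directory services it may list only local or cached entries); `mixed_case_users` is a hypothetical name.

```shell
#!/bin/sh
# Read passwd-format lines on stdin and print each login name that
# contains uppercase letters. Per this ticket, sacctmgr stores such
# names lowercased, so jobs from these users may not match their
# association (and so miss its default QOS).
mixed_case_users() {
    cut -d: -f1 | while IFS= read -r name; do
        lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        if [ "$name" != "$lower" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Example: audit the accounts visible through NSS on this host.
getent passwd | mixed_case_users
```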
Comment 11 Gareth 2015-03-31 23:12:55 MDT
(In reply to Brian Christiansen from comment #10)

We changed the names to lowercase and the situation did not improve, though we have not altered the sacctmgr setup or restarted any Slurm processes.

I think we should try to remove and recreate the association entries to see if that helps but I've run short on time to try now.

Gareth
Comment 12 Gareth 2015-04-01 22:12:03 MDT
> I think we should try to remove and recreate the association entries to see
> if that helps but I've run short on time to try now.
> 
> Gareth

I removed and recreated one association entry and it is now accumulating usage. This is good, as it means the fairshare priority factor now applies. Unfortunately the default qos is still not applied, but perhaps restarting slurmdbd will help (a restart made no difference when I tested before, but that was without recreating an entry). I don't have the permissions to do that, and I'm off on leave for a week.

So I should probably be updating #1531, and the tickets should probably be merged.

I'll recreate the other associations.

Assuming the fairshare issue is fixed (with our workaround to use lower case), I don't need the extra qos (which was to work around the fairshare issue) so I'll lose interest in the bug.

Thanks for your help in confirming the bug and committing to look into fixing it.

Gareth
Comment 13 Gareth 2015-04-01 22:37:40 MDT
(In reply to Gareth from comment #12)
-snip-

> Unfortunately the default qos did not get better, ...

-snip-

I realised I was mistaken and did not reapply the default qos settings properly.

Testing further, moving to lower case has fixed all the problems. I think I did need to recreate the association entries to get the fairshare info accumulating.

regards,

Gareth
Comment 14 Brian Christiansen 2015-04-02 18:00:41 MDT
I'm glad it's working for you. I'll let you know when I'm done with handling case-sensitive names.

As a side note, I'm guessing that you don't have AccountingStorageEnforce=associations. If you did, the users would not have been able to submit jobs, because they wouldn't have matched an association. It may be something to look into if you want to limit who's running on the cluster.
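One way to check the setting mentioned above is to look at slurm.conf. A minimal sketch; `enforces_associations` is a hypothetical helper, and /etc/slurm/slurm.conf is an assumed path (the location varies by installation). It also accepts `limits` and `qos`, on the assumption (from the slurm.conf man page; verify for your version) that those values turn on association enforcement as well.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given slurm.conf enables association
# enforcement. Uses the last uncommented AccountingStorageEnforce= line,
# drops trailing "#" comments, and matches whole comma-separated tokens.
# "limits" and "qos" are assumed to imply "associations" (see man page).
enforces_associations() {
    conf=$1
    value=$(grep -i '^[[:space:]]*AccountingStorageEnforce[[:space:]]*=' "$conf" \
        | tail -n 1 | cut -d'#' -f1 | cut -d= -f2- \
        | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    case ",$value," in
        *,associations,*|*,limits,*|*,qos,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (path is an assumption).
if enforces_associations /etc/slurm/slurm.conf 2>/dev/null; then
    echo "association enforcement is on"
else
    echo "association enforcement appears to be off"
fi
```

Because this only parses the file, `scontrol show config | grep AccountingStorageEnforce` on a running cluster is the more reliable check of what slurmctld actually loaded.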
Comment 15 Brian Christiansen 2015-04-06 11:25:46 MDT
After investigation, we've decided to only support lowercase usernames. MySQL is also case-insensitive by default. To avoid having to change the queries and tables, or requiring sites to configure MySQL a certain way, we feel it's best to leave it the way it is.

Let us know if you have any questions.
Comment 16 Gareth 2015-04-13 18:18:51 MDT
(In reply to Brian Christiansen from comment #15)

Thanks.  I'm happy enough for this to be resolved, but please document the limitation.  Where to document that so it can be easily found is another matter...

regards,

Gareth
Comment 17 Felix.Rauscher 2015-05-27 23:50:11 MDT
Hi,

Unfortunately, our accounts are mixed case (Givenname.Surname), which I cannot change. However, our account names are still unique when matched in lowercase. It would therefore help if a case-insensitive match were made at job submission. I think that currently (Slurm 14.11.7) the match is case-sensitive.

If I am not mistaken, this means that when you have Unix account names with uppercase letters, you cannot enforce accounting associations or even use default associations.

A patch in this respect would be very useful.
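The proposal above relies on account names staying unique when case is ignored. That assumption is easy to check before depending on it (or before renaming accounts to lowercase, as CSIRO did). A minimal sketch; `case_collisions` is a hypothetical name, and `getent passwd` is assumed to enumerate the relevant accounts.

```shell
#!/bin/sh
# Read one user name per line on stdin and print every lowercase form
# that more than one name folds to (e.g. "Jane.Doe" and "jane.doe").
# Empty output means the names stay unique when compared case-insensitively.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example: check the login names visible through NSS on this host.
getent passwd | cut -d: -f1 | case_collisions
```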

Thanks

Felix
Comment 18 Brian Christiansen 2015-09-01 11:02:40 MDT
I'm closing this ticket as CSIRO's issue was resolved. Felix, you can submit a feature request if you are still interested in this.

Thanks,
Brian