| Summary: | Users with capital letters in their usernames cannot submit jobs | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Steve Ford <fordste5> |
| Component: | slurmctld | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 2 - High Impact | ||
| Priority: | --- | CC: | brian |
| Version: | 17.11.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | MSU | Slinky Site: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | 18.08.2 | |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: |
slurm.conf
sacctmgr patch dbd upper patch |
||
By default, slurmdbd lower-cases everything. That's why sacctmgr is printing back the lower-case name and asking you to confirm that this is who you wish to add. This has been a very long-standing part of our accounting system, and is documented as an intentional limitation.
In 18.08, we have added Parameters=PreserveCaseUser as a config option to slurmdbd.conf which disables this case normalization. You should be able to take advantage of that with just an upgraded slurmdbd process, although we would obviously recommend upgrading the rest of the cluster to 18.08 in the near term as well.
Re-tagging this as Sev3 - this is a known limitation, and is only affecting a subset of your users, and as such should not have been submitted at Sev1. Please see https://www.schedmd.com/support.php for further details outlining this.
- Tim
Thanks. We'll work on updating to 18.08.
Marking this closed as a duplicate of bug 5432 which added this new option.
*** This ticket has been marked as a duplicate of ticket 5432 ***
Tim,
We updated to 18.08.1 and set Parameters=PreserveCaseUser in slurmdbd.conf. After deleting the all-lowercase entries from the database, the users are added and can submit jobs. Unfortunately, we found that restarting the slurmctld service causes these users to be unable to submit jobs until they are deleted from and re-added to the database. This does not affect users with capital letters in their names that were added to the database after the update.
Another odd behavior is that users who were in the database before the update are shown in the sacctmgr output with all-lowercase usernames and no default account, even after being deleted and re-added with the proper case.
For example:
# sacctmgr show user xxxxxxxXMICH
User Def Acct Admin
---------- ---------- ---------
xxxxxxxxm+ None
# sacctmgr del user xxxxxxxXMICH
Deleting users...
xxxxxxxxmich
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
# sacctmgr add user xxxxxxxXMICH account=xmich
Adding User(s)
xxxxxxxXMICH
Associations =
U = xxxxxxxXM A = xmich C = msuhpcc
Non Default Settings
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
# sacctmgr show user xxxxxxxXMICH
User Def Acct Admin
---------- ---------- ---------
xxxxxxxxm+ None
Any ideas on what might be causing this?
Thanks,
Steve
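Since the behavior above depends on Parameters=PreserveCaseUser actually being present in slurmdbd.conf, a quick check of the file can rule out a configuration slip. The following is a minimal sketch, not a Slurm tool: has_preserve_case is a hypothetical helper, and the demo runs against a throwaway sample file rather than a real slurmdbd.conf.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeeds if an uncommented
# Parameters= line in the given file mentions PreserveCaseUser.
has_preserve_case() {
    grep -Eiq '^[[:space:]]*Parameters[[:space:]]*=.*PreserveCaseUser' "$1"
}

# Demo against a throwaway file; a real check would point at the
# site's own slurmdbd.conf instead.
sample=$(mktemp)
printf 'DbdHost=localhost\nParameters=PreserveCaseUser\n' > "$sample"
if has_preserve_case "$sample"; then
    echo "PreserveCaseUser is set"
else
    echo "PreserveCaseUser is not set"
fi
rm -f "$sample"
```

The check only confirms the setting is in the file; slurmdbd still has to be restarted on 18.08 or later for it to take effect.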
I'm updating this to Sev 2 since this is preventing a large set of our users from submitting jobs.
Hey Steve,
I can reproduce this and know what's going on. Let me think through it a little more and I will get back to you with instructions.
Thanks,
Brian
Created attachment 8045
sacctmgr patch
OK. I'm working on patches that will solve this problem. The problem is that the previous lower-case names were still in the database (just marked as deleted) when the users got re-added, and the stored name wasn't updated with the new case-sensitive name.
There are a couple of ways to fix the issue: 1. directly in the database, or 2. using a small patch and sacctmgr. Option 2 may be the simplest since it will handle the user table and the account coordinator table and will push the updates to the controller (so you don't need to restart the slurmctld).
The attached patch will allow you to do:
sacctmgr mod user <user_name> set newname=<case sensitive name>
After you do this, "sacctmgr show user" should show the default account again and the user should be able to submit jobs again.
Can you apply the patch and try using newname? Let me know how it goes.
Brian,
I applied the patch and updated a user like you suggested and their default account shows in the output of sacctmgr. Their username, however, is still all lower case and, like before, they cannot submit jobs after slurmctld is restarted unless they are deleted from and re-added to the database. Also, after re-adding them, their default account is missing again. While we do want sacctmgr fixed, we are mostly interested in resolving the issue that's requiring us to delete and re-add users each time slurmctld is restarted.
Thanks,
Steve
Created attachment 8049
dbd upper patch
Hmm. Not sure why that didn't work. I've attached the patch to update the user name at addition. Can you try this patch? Only the slurmdbd needs to be updated.
Do you use account coordinators as well?
e.g.
brian@lappy:~/slurm/18.08/lappy$ sacctmgr show user upper2
User Def Acct Def WCKey Admin
---------- ---------- ---------- ---------
brian@lappy:~/slurm/18.08/lappy$ sacctmgr show user upper2 withdeleted
User Def Acct Def WCKey Admin
---------- ---------- ---------- ---------
upper2 stuff None
brian@lappy:~/slurm/18.08/lappy$ sacctmgr show assoc user=upper2 format=cluster,account,user
Cluster Account User
---------- ---------- ----------
brian@lappy:~/slurm/18.08/lappy$ sacctmgr show assoc user=upper2 withdeleted user=upper2 format=cluster,account,user
Cluster Account User
---------- ---------- ----------
lappy2 stuff upper2
lappy stuff upper2
brian@lappy:~/slurm/18.08/lappy$ sacctmgr add user UPPER2 account=stuff
Adding User(s)
UPPER2
Associations =
U = UPPER2 A = stuff C = lappy
U = UPPER2 A = stuff C = lappy2
Non Default Settings
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
brian@lappy:~/slurm/18.08/lappy$ sacctmgr show assoc user=upper2 format=cluster,account,user
Cluster Account User
---------- ---------- ----------
lappy2 stuff UPPER2
lappy stuff UPPER2
brian@lappy:~/slurm/18.08/lappy$ sacctmgr show user upper2
User Def Acct Def WCKey Admin
---------- ---------- ---------- ---------
UPPER2 stuff None
brian@lappy:~/slurm/18.08/lappy$ scontrol show assoc flags=users user=UPPER2
Current Association Manager state
User Records
UserName=UPPER2(1002) DefAccount=stuff DefWckey=(null) AdminLevel=Not Set
And an example of newname=
brian@lappy:~/slurm/18.08/lappy$ sacctmgr show user upper2
User Def Acct Def WCKey Admin
---------- ---------- ---------- ---------
upper2 stuff None
brian@lappy:~/slurm/18.08/lappy$ sacctmgr show assoc user=upper2 format=cluster,account,user
Cluster Account User
---------- ---------- ----------
lappy2 stuff upper2
lappy stuff upper2
brian@lappy:~/slurm/18.08/lappy$ scontrol show assoc flags=users user=UPPER2
Current Association Manager state
User Records
UserName=upper2(4294967294) DefAccount=stuff DefWckey=(null) AdminLevel=Not Set
brian@lappy:~/slurm/18.08/lappy$ sacctmgr mod user upper2 set newname=UPPER2
Modified users...
upper2
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
brian@lappy:~/slurm/18.08/lappy$ sacctmgr show user upper2
User Def Acct Def WCKey Admin
---------- ---------- ---------- ---------
UPPER2 stuff None
brian@lappy:~/slurm/18.08/lappy$ sacctmgr show assoc user=upper2 format=cluster,account,user
Cluster Account User
---------- ---------- ----------
lappy2 stuff UPPER2
lappy stuff UPPER2
brian@lappy:~/slurm/18.08/lappy$ scontrol show assoc flags=users user=UPPER2
Current Association Manager state
User Records
UserName=UPPER2(1002) DefAccount=stuff DefWckey=(null) AdminLevel=Not Set
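For a site with many affected accounts, the newname command shown above could be generated in bulk. The following is a minimal dry-run sketch, not part of either patch: gen_rename_cmds is a hypothetical helper that only prints commands, and it assumes that the lower-cased form of each name is what is currently stored and that the newname option from the attached sacctmgr patch is installed.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints (does not run) one sacctmgr
# command per username that contains upper-case letters.
# Assumes the stored name is the lower-cased form of the real name.
gen_rename_cmds() {
    for name in "$@"; do
        lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        # Names that are already all lower case need no rename.
        if [ "$name" != "$lower" ]; then
            echo "sacctmgr mod user $lower set newname=$name"
        fi
    done
}

# Example: one mixed-case name and one that needs no change.
gen_rename_cmds UPPER2 brian
```

Printing the commands instead of running them leaves sacctmgr's own confirmation prompt in place and lets the list be reviewed first.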
Brian,
I applied the dbd patch and everything looks good! After deleting and re-adding these users, they display properly in sacctmgr and they can continue to submit jobs after slurmctld is restarted. Thank you!
Awesome! Glad to hear. I'll let you know when we get the patches committed.
Hey Steve,
The patches have been committed and will be available in 18.08.2:
https://github.com/SchedMD/slurm/commit/1af72a1781437df5dccb264211a23962e39314dc
https://github.com/SchedMD/slurm/commit/ceca378cfaacbc9b9da9294fc5d3184b292ac2f7
Additionally, if these users were coordinators they will need to get re-added as well to correct their user names.
Please reopen if you have any other issues.
Thanks,
Brian |
Created attachment 7957
slurm.conf
We originally suspected this was a problem with our job submit script; however, the issue persisted after disabling the lua job submit plugin. These users exist in the accounting database and have access to the account and partition they are submitting to. Their jobs are rejected with "srun: error: Unable to allocate resources: Invalid account or account/partition combination specified"
Here are steps to reproduce this issue:
# useradd UPPER
# sacctmgr add user UPPER account=general
There is no uid for user 'upper'
Are you sure you want to continue? (You have 30 seconds to decide)
(N/y): y
Adding User(s)
upper
Associations =
U = upper A = general C = linux
Non Default Settings
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
# su - UPPER
$ srun -p OneNode -A general hostname
srun: error: Unable to allocate resources: Invalid account or account/partition combination specified
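In the reproduction steps, sacctmgr stores 'upper' for the login name UPPER. The sketch below illustrates that mismatch, assuming the lookup is an exact, case-sensitive comparison against the stored name; normalize_user and case_mismatch are hypothetical helpers for illustration, not Slurm functions.

```shell
#!/bin/sh
# Hypothetical helper: mimics the default lower-casing of user names
# applied when a user is added to the accounting database.
normalize_user() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Succeeds when the stored (lower-cased) name differs from the login
# name, so an exact comparison between the two would fail.
case_mismatch() {
    [ "$1" != "$(normalize_user "$1")" ]
}

for name in UPPER upper; do
    if case_mismatch "$name"; then
        echo "$name: stored as $(normalize_user "$name"), exact match fails"
    else
        echo "$name: unchanged by lower-casing"
    fi
done
```

Only names containing upper-case letters are affected, which is consistent with the issue hitting a subset of users.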