Created attachment 7957 [details] slurm.conf

We originally suspected a problem with our job submit script, but the issue persisted after we disabled the Lua job submit plugin. The affected users exist in the accounting database and have access to the account and partition they are submitting to. Their jobs are rejected with:

    srun: error: Unable to allocate resources: Invalid account or account/partition combination specified

Here are steps to reproduce this issue:

    # useradd UPPER
    # sacctmgr add user UPPER account=general
    There is no uid for user 'upper'
    Are you sure you want to continue? (You have 30 seconds to decide)
    (N/y): y
     Adding User(s)
      upper
     Associations =
      U = upper     A = general    C = linux
     Non Default Settings
    Would you like to commit changes? (You have 30 seconds to decide)
    (N/y): y
    # su - UPPER
    $ srun -p OneNode -A general hostname
    srun: error: Unable to allocate resources: Invalid account or account/partition combination specified
By default, slurmdbd lower-cases everything. That's why sacctmgr prints back the lower-case name and asks you to confirm that this is the user you wish to add. This has been a very long-standing part of our accounting system and is documented as an intentional limitation.

In 18.08, we added Parameters=PreserveCaseUser as a config option in slurmdbd.conf, which disables this case normalization. You should be able to take advantage of that with just an upgraded slurmdbd process, although we would obviously recommend upgrading the rest of the cluster to 18.08 in the near term as well.

Re-tagging this as Sev3 - this is a known limitation that affects only a subset of your users, and as such it should not have been submitted at Sev1. Please see https://www.schedmd.com/support.php for further details.

- Tim
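To make the failure mode above concrete, here is a minimal sketch of how lower-casing at insert time can break a later case-sensitive lookup. It is a toy model only and does not reflect Slurm's actual code: the functions `normalize`, `add_user`, and `can_submit`, the dict-based store, and the `preserve_case` flag (standing in for Parameters=PreserveCaseUser) are all hypothetical names chosen for illustration.

```python
# Toy model of the case-normalization mismatch. NOT Slurm's implementation;
# all names here are hypothetical and exist only for illustration.

def normalize(name: str, preserve_case: bool) -> str:
    """Return the name as the accounting store would record it."""
    return name if preserve_case else name.lower()


def add_user(store: dict, name: str, account: str, preserve_case: bool = False) -> str:
    """Record a user/account pair and return the name that was stored."""
    stored = normalize(name, preserve_case)
    store[stored] = account
    return stored


def can_submit(store: dict, login_name: str, account: str) -> bool:
    """Look the user up by the exact (case-sensitive) login name from the OS."""
    return store.get(login_name) == account


# Default behavior: the name is stored as "upper", so login "UPPER" is not found.
default_store = {}
add_user(default_store, "UPPER", "general")
print(can_submit(default_store, "UPPER", "general"))  # False

# With case preserved, the stored name matches the login name.
preserving_store = {}
add_user(preserving_store, "UPPER", "general", preserve_case=True)
print(can_submit(preserving_store, "UPPER", "general"))  # True
```

The point of the sketch is only that the stored name and the login name must agree byte for byte; which component performs the comparison in a real cluster is not modeled here.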
Thanks. We'll work on updating to 18.08.
Marking this closed as a duplicate of bug 5432, which added this new option.

*** This ticket has been marked as a duplicate of ticket 5432 ***
Tim,

We updated to 18.08.1 and set Parameters=PreserveCaseUser in slurmdbd.conf. After we deleted the all-lowercase entries from the database, the users could be added and could submit jobs. Unfortunately, we found that restarting the slurmctld service leaves these users unable to submit jobs until they are deleted from and re-added to the database. This does not affect users with capital letters in their names who were added to the database after the update.

Another odd behavior is that users who were in the database before the update have their usernames displayed in all lowercase, with no default account, in the sacctmgr output, even after being deleted and re-added with the proper case. For example:

    # sacctmgr show user xxxxxxxXMICH
          User   Def Acct     Admin
    ---------- ---------- ---------
    xxxxxxxxm+                 None
    # sacctmgr del user xxxxxxxXMICH
     Deleting users...
      xxxxxxxxmich
    Would you like to commit changes? (You have 30 seconds to decide)
    (N/y): y
    # sacctmgr add user xxxxxxxXMICH account=xmich
     Adding User(s)
      xxxxxxxXMICH
     Associations =
      U = xxxxxxxXM A = xmich      C = msuhpcc
     Non Default Settings
    Would you like to commit changes? (You have 30 seconds to decide)
    (N/y): y
    # sacctmgr show user xxxxxxxXMICH
          User   Def Acct     Admin
    ---------- ---------- ---------
    xxxxxxxxm+                 None

Any ideas on what might be causing this?

Thanks,
Steve
I'm updating this to Sev 2 since this is preventing a large set of our users from submitting jobs.
Hey Steve, I can reproduce this and know what's going on. Let me think through it a little more and will get back to you with instructions. Thanks, Brian
Created attachment 8045 [details] sacctmgr patch

OK. I'm working on patches that will solve this problem. The problem is that the previous lower-case names were still in the database (just marked as deleted) when the users got re-added, and the name wasn't updated with the new case-sensitive name.

There are a couple of ways to fix the issue:

1. directly in the database, or
2. using a small patch and sacctmgr.

Option 2 may be the simplest since it will handle the user table and the account coordinator table and will push the updates to the controller (so you don't need to restart the slurmctld).

The attached patch will allow you to do:

    sacctmgr mod user <user_name> set newname=<case sensitive name>

After you do this, "sacctmgr show user" should show the default account again and the user should be able to submit jobs again.

Can you apply the patch and try using newname? Let me know how it goes.
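The root cause described above (a soft-deleted row being revived without its name being refreshed) can be sketched as follows. This is a simplified model under stated assumptions, not the slurmdbd code: the `UserRow` class, `find_row`, `readd_user`, and the `update_name` switch are hypothetical, and the case-insensitive row match is an assumption about how the existing row gets found.

```python
# Toy model of re-adding a soft-deleted user. NOT the slurmdbd implementation;
# every name below is hypothetical.
from dataclasses import dataclass


@dataclass
class UserRow:
    name: str
    deleted: bool = False


def find_row(table, name):
    """Assumption: existing rows are matched without regard to case."""
    for row in table:
        if row.name.lower() == name.lower():
            return row
    return None


def readd_user(table, name, update_name):
    """Revive a soft-deleted row, optionally refreshing its stored name."""
    row = find_row(table, name)
    if row is None:
        row = UserRow(name)
        table.append(row)
        return row
    row.deleted = False
    if update_name:
        # The behavior the fix is described as adding: store the new case.
        row.name = name
    return row


# Without the name update, the old lower-case name survives the re-add.
table = [UserRow("upper2", deleted=True)]
print(readd_user(table, "UPPER2", update_name=False).name)  # upper2

# With the name update, the row carries the case-sensitive name.
table = [UserRow("upper2", deleted=True)]
print(readd_user(table, "UPPER2", update_name=True).name)  # UPPER2
```

In both cases the row ends up undeleted, which is why the user appears to exist; only the second variant leaves a name that matches the case-sensitive login.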
Brian,

I applied the patch and updated a user as you suggested, and their default account now shows in the output of sacctmgr. Their username, however, is still all lower case and, as before, they cannot submit jobs after slurmctld is restarted unless they are deleted from and re-added to the database. Also, after re-adding them, their default account is missing again.

While we do want sacctmgr fixed, we are mostly interested in resolving the issue that requires us to delete and re-add users each time slurmctld is restarted.

Thanks,
Steve
Created attachment 8049 [details] dbd upper patch

Hmm. Not sure why that didn't work. I've attached the patch to update the user name at addition. Can you try this patch? Only the slurmdbd needs to be updated.

Do you use account coordinators as well?

e.g.

    brian@lappy:~/slurm/18.08/lappy$ sacctmgr show user upper2
          User   Def Acct  Def WCKey     Admin
    ---------- ---------- ---------- ---------
    brian@lappy:~/slurm/18.08/lappy$ sacctmgr show user upper2 withdeleted
          User   Def Acct  Def WCKey     Admin
    ---------- ---------- ---------- ---------
        upper2      stuff                 None
    brian@lappy:~/slurm/18.08/lappy$ sacctmgr show assoc user=upper2 format=cluster,account,user
       Cluster    Account       User
    ---------- ---------- ----------
    brian@lappy:~/slurm/18.08/lappy$ sacctmgr show assoc user=upper2 withdeleted user=upper2 format=cluster,account,user
       Cluster    Account       User
    ---------- ---------- ----------
        lappy2      stuff     upper2
         lappy      stuff     upper2
    brian@lappy:~/slurm/18.08/lappy$ sacctmgr add user UPPER2 account=stuff
     Adding User(s)
      UPPER2
     Associations =
      U = UPPER2    A = stuff      C = lappy
      U = UPPER2    A = stuff      C = lappy2
     Non Default Settings
    Would you like to commit changes? (You have 30 seconds to decide)
    (N/y): y
    brian@lappy:~/slurm/18.08/lappy$ sacctmgr show assoc user=upper2 format=cluster,account,user
       Cluster    Account       User
    ---------- ---------- ----------
        lappy2      stuff     UPPER2
         lappy      stuff     UPPER2
    brian@lappy:~/slurm/18.08/lappy$ sacctmgr show user upper2
          User   Def Acct  Def WCKey     Admin
    ---------- ---------- ---------- ---------
        UPPER2      stuff                 None
    brian@lappy:~/slurm/18.08/lappy$ scontrol show assoc flags=users user=UPPER2
    Current Association Manager state

    User Records

    UserName=UPPER2(1002) DefAccount=stuff DefWckey=(null) AdminLevel=Not Set

And an example of newname=

    brian@lappy:~/slurm/18.08/lappy$ sacctmgr show user upper2
          User   Def Acct  Def WCKey     Admin
    ---------- ---------- ---------- ---------
        upper2      stuff                 None
    brian@lappy:~/slurm/18.08/lappy$ sacctmgr show assoc user=upper2 format=cluster,account,user
       Cluster    Account       User
    ---------- ---------- ----------
        lappy2      stuff     upper2
         lappy      stuff     upper2
    brian@lappy:~/slurm/18.08/lappy$ scontrol show assoc flags=users user=UPPER2
    Current Association Manager state

    User Records

    UserName=upper2(4294967294) DefAccount=stuff DefWckey=(null) AdminLevel=Not Set
    brian@lappy:~/slurm/18.08/lappy$ sacctmgr mod user upper2 set newname=UPPER2
     Modified users...
      upper2
    Would you like to commit changes? (You have 30 seconds to decide)
    (N/y): y
    brian@lappy:~/slurm/18.08/lappy$ sacctmgr show user upper2
          User   Def Acct  Def WCKey     Admin
    ---------- ---------- ---------- ---------
        UPPER2      stuff                 None
    brian@lappy:~/slurm/18.08/lappy$ sacctmgr show assoc user=upper2 format=cluster,account,user
       Cluster    Account       User
    ---------- ---------- ----------
        lappy2      stuff     UPPER2
         lappy      stuff     UPPER2
    brian@lappy:~/slurm/18.08/lappy$ scontrol show assoc flags=users user=UPPER2
    Current Association Manager state

    User Records

    UserName=UPPER2(1002) DefAccount=stuff DefWckey=(null) AdminLevel=Not Set
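For a site with many affected users, the newname= step shown above could be scripted. The following is only a sketch under assumptions: it prints command strings rather than running anything, the helper `rename_commands` and the example login names are hypothetical, and the generated commands presume a sacctmgr that accepts newname= (i.e. one with the attached patch applied). Review the output before executing any of it.

```python
# Sketch: build "sacctmgr mod user ... set newname=..." command strings for
# logins that contain upper-case letters. Nothing is executed here.
# rename_commands and the sample names are hypothetical.
import shlex


def rename_commands(login_names):
    """Return one rename command per login whose lower-cased form differs."""
    commands = []
    for name in login_names:
        lowered = name.lower()
        if lowered == name:
            continue  # already all lower case, nothing to rename
        commands.append(
            f"sacctmgr mod user {shlex.quote(lowered)} "
            f"set newname={shlex.quote(name)}"
        )
    return commands


for command in rename_commands(["UPPER2", "steve"]):
    print(command)
```

Generating the commands as text keeps the confirmation prompt from sacctmgr in the loop, so each rename can still be checked by hand.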
Brian, I applied the dbd patch and everything looks good! After deleting and re-adding these users, they display properly in sacctmgr and they can continue to submit jobs after slurmctld is restarted. Thank you!
Awesome! Glad to hear. I'll let you know when we get the patches committed.
Hey Steve,

The patches have been committed and will be available in 18.08.2:

https://github.com/SchedMD/slurm/commit/1af72a1781437df5dccb264211a23962e39314dc
https://github.com/SchedMD/slurm/commit/ceca378cfaacbc9b9da9294fc5d3184b292ac2f7

Additionally, if these users were coordinators, they will need to be re-added as well to correct their user names.

Please reopen if you have any other issues.

Thanks,
Brian