We've noticed an odd issue when creating new accounts, which we think is related to the length of the account name.

Having recently launched a new cluster, we've switched to using account names with the format <PIsurname>-<servicelevel>-<type>, where servicelevel is SL[1-4] and type is CPU, GPU, or KNL. At a minimum we'd expect to have <PIsurname>-SL3-CPU and <PIsurname>-SL4-CPU, then optionally <PIsurname>-SL[3-4]-GPU and/or <PIsurname>-SL[3-4]-KNL, and possibly paying SL1 or SL2 accounts for any or all of CPU, GPU, and KNL.

What we've found is that if we create a PI_TESTABCDE-FGHIJK account (I've replaced the actual PI's surname with TESTABCDE-FGHIJK, but it was that long: a double-barrelled surname), then a TESTABCDE-FGHIJK-SL3-CPU and a TESTABCDE-FGHIJK-SL4-CPU account, each with PI_TESTABCDE-FGHIJK as their parent, sacctmgr complains when we try to create a TESTABCDE-FGHIJK-SL3-GPU account.

See below for the various commands and output:

[root@slurm-master ~]# sacctmgr -vi add account Name=pi_testabcde-fghijk Description="Simon Flood" Cluster=csd3 parent=uis fairshare=parent
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
 Adding Account(s)
  pi_testabcde-fghijk
 Settings
  Description = simon flood
  Organization = Parent/Account Name
 Associations
  A = pi_testabc C = csd3
 Settings
  Fairshare = parent
  Parent = uis

[root@slurm-master ~]# sacctmgr -vi add account Name=TESTABCDE-FGHIJK-SL3-CPU GrpTRESMins=cpu=12000000 DefaultQOS=cpu2 QOS=cpu2,intr Cluster=csd3 parent=pi_testabcde-fghijk fairshare=0
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
 Adding Account(s)
  testabcde-fghijk-sl3-cpu
 Settings
  Description = Account Name
  Organization = Parent/Account Name
 Associations
  A = testabcde- C = csd3
 Settings
  Fairshare = 0
  GrpTRESMins = cpu=12000000
  Parent = pi_testabcde-fghijk
  QOS = cpu2,intr
  DefQOS = cpu2

[root@slurm-master ~]# sacctmgr -vi add account Name=TESTABCDE-FGHIJK-SL4-CPU QOS=cpu3 Cluster=csd3 parent=pi_testabcde-fghijk fairshare=0
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
 Adding Account(s)
  testabcde-fghijk-sl4-cpu
 Settings
  Description = Account Name
  Organization = Parent/Account Name
 Associations
  A = testabcde- C = csd3
 Settings
  Fairshare = 0
  Parent = pi_testabcde-fghijk
  QOS = cpu3

[root@slurm-master ~]# sacctmgr -vi add account Name=TESTABCDE-FGHIJK-SL3-GPU GrpTRESMins=gres/gpu=480000 DefaultQOS=gpu2 QOS=gpu2,intr Cluster=csd3 parent=pi_testabcde-fghijk fairshare=0
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
 Adding Account(s)
  testabcde-fghijk-sl3-gpu
 Settings
  Description = Account Name
  Organization = Parent/Account Name
 Associations
  A = testabcde- C = csd3
 Settings
  Fairshare = 0
  GrpTRESMins = gres/gpu=480000
  Parent = pi_testabcde-fghijk
  QOS = gpu2,intr
  DefQOS = gpu2
 Problem adding accounts: Unspecified error

[root@slurm-master ~]# sacctmgr -n show account format=Account'%-25',Description'%-30',Organization'%-20' | grep -i testabcde-fghijk
pi_testabcde-fghijk       simon flood                    uis
testabcde-fghijk-sl3-cpu  testabcde-fghijk-sl3-cpu       pi_testabcde-fghijk
testabcde-fghijk-sl4-cpu  testabcde-fghijk-sl4-cpu       pi_testabcde-fghijk

When we originally saw this, trying to create the TESTABCDE-FGHIJK-SL3-GPU account gave output suggesting it was trying to create an association rather than an account, but that didn't happen when repeating the steps with the fake "PI surname" for this message.

The other odd thing, which we suspect is related, is that when we try to undo these account additions (as we created them with shorter names), the delete removes the association but not the actual accounts:

[root@slurm-master ~]# sacctmgr delete account name=testabcde-fghijk-sl3-cpu cluster=csd3
 Deleting account associations...
  C = csd3 A = testabcde-fghijk-sl3-cpu of pi_testabcde-fghijk
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

[root@slurm-master ~]# sacctmgr delete account name=testabcde-fghijk-sl4-cpu cluster=csd3
 Deleting account associations...
  C = csd3 A = testabcde-fghijk-sl4-cpu of pi_testabcde-fghijk
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

[root@slurm-master ~]# sacctmgr -n show account format=Account'%-25',Description'%-30',Organization'%-20' | grep -i testabcde-fghijk
pi_testabcde-fghijk       simon flood                    uis
testabcde-fghijk-sl3-cpu  testabcde-fghijk-sl3-cpu       pi_testabcde-fghijk
testabcde-fghijk-sl4-cpu  testabcde-fghijk-sl4-cpu       pi_testabcde-fghijk

If we then check the MySQL table, it shows that the accounts still exist but the associations do not. We're then tidying up by deleting the accounts manually in MySQL.

Our guess is that, when creating the account, sacctmgr is checking and comparing partial existing account names and hence thinks there's a clash. I've had a quick look at the various bits of source code for sacctmgr but, with my limited C knowledge, haven't spotted anything obvious.

Previously we were using a mix of <PIsurname>-<servicelevel> for CPU and <PIsurname>-<servicelevel>-GPU for GPU (we didn't have KNL), so it's possible this issue existed in an earlier version of Slurm (we are using Slurm 14.11.8 on our old cluster) but we weren't hitting it.

Our new Slurm master is running Slurm 17.02.9 on Red Hat Enterprise Linux 7.3.

If you need further information please ask.

Regards,
Simon

--
Simon Flood
HPC System Administrator
University of Cambridge Information Services
United Kingdom
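The naming scheme and the "partial name comparison" guess in the report above can be explored with a small shell sketch. It is only an illustration: the helper names `build_account_name` and `shared_prefix_pairs` and the comparison width are hypothetical, nothing here calls sacctmgr, and it does not reproduce Slurm's actual logic. It simply shows which names from the report would look identical if only their first N characters were compared.

```shell
#!/bin/sh
# Hypothetical helper: build <PIsurname>-<servicelevel>-<type>, lowercased
# the way sacctmgr echoed the names back in the report.
build_account_name() {
    printf '%s-%s-%s\n' "$1" "$2" "$3" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: read one name per line on stdin and print each name
# whose first N characters match those of an earlier name, next to the
# first earlier name with that prefix. N is an assumed width (default 10).
shared_prefix_pairs() {
    awk -v w="${1:-10}" '
        {
            p = substr($0, 1, w)
            if (p in seen) print seen[p] " <-> " $0
            else seen[p] = $0
        }
    '
}

# The three child accounts from the report, in creation order.
names=$(
    build_account_name TESTABCDE-FGHIJK SL3 CPU
    build_account_name TESTABCDE-FGHIJK SL4 CPU
    build_account_name TESTABCDE-FGHIJK SL3 GPU
)

# Compare on an assumed width of 20 characters.
printf '%s\n' "$names" | shared_prefix_pairs 20
# prints: testabcde-fghijk-sl3-cpu <-> testabcde-fghijk-sl3-gpu
```

With a width of 10 (the width at which the `A = testabcde-` field is cut off in the output), the SL4-CPU name would also be flagged against SL3-CPU, yet that account was created without error, so that truncation looks like display formatting only. A width of 20 happens to flag only the SL3-GPU name, which matches the failure reported; this is an observation about the names themselves, not a confirmed cause.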
Simon,

Thank you for submitting this ticket. Our system was not able to associate your email address with a support contract. Do you know if the University of Cambridge has a Slurm support contract with SchedMD? Before this ticket can be routed to the support team for resolution we need to verify Cambridge has a support contract.

Jacob
Hi Jacob,

No, the University of Cambridge does not have a Slurm support contract with SchedMD. Sorry, I wasn't aware that only those with support contracts could submit bugs. I did previously post this issue to the Slurm-Users mailing list but have received no reply.

Simon

On 12/02/18 17:25, bugs@schedmd.com wrote:
> https://bugs.schedmd.com/show_bug.cgi?id=4770
>
> --- Comment #1 from Jacob Jenson <jacob@schedmd.com> ---
> Simon,
>
> Thank you for submitting this ticket. Our system was not able to associate your
> email address with a support contract. Do you know if the University of
> Cambridge has a Slurm support contract with SchedMD? Before this ticket can be
> routed to the support team for resolution we need to verify Cambridge has a
> support contract.
>
> Jacob
We are running version 18.0.8 and experienced similar behavior today. Running sacctmgr add account with a name similar to, but shorter than, an existing account name results in the error:

Problem adding accounts: Unspecified error

Is there a solution?