Ticket 4770 - Account name length issue with Slurm 17.02.9
Summary: Account name length issue with Slurm 17.02.9
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: Database
Version: 17.02.9
Hardware: Linux Linux
: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2018-02-12 10:11 MST by Simon Flood
Modified: 2020-08-04 16:51 MDT


Description Simon Flood 2018-02-12 10:11:19 MST
We've noticed an odd issue when creating new accounts which we think is related to the length of the account name.

Having recently launched a new cluster, we've switched to account names of the form <PIsurname>-<servicelevel>-<type>, where servicelevel is SL[1-4] and type is CPU, GPU, or KNL. At a minimum we'd expect each PI to have <PIsurname>-SL3-CPU and <PIsurname>-SL4-CPU, optionally <PIsurname>-SL[3-4]-GPU and/or <PIsurname>-SL[3-4]-KNL, and possibly paid SL1 or SL2 accounts for any or all of CPU, GPU, and KNL.
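
To make the naming scheme and the resulting name lengths concrete, here is a minimal Python sketch that enumerates every <PIsurname>-<servicelevel>-<type> combination for one PI. The function names are hypothetical helpers for illustration only; the lower-case form is printed because sacctmgr echoed the names back in lower case in the output in this ticket.

```python
from itertools import product

SERVICE_LEVELS = ["SL1", "SL2", "SL3", "SL4"]
TYPES = ["CPU", "GPU", "KNL"]


def account_names(pi_surname: str) -> list[str]:
    """Every <PIsurname>-<servicelevel>-<type> combination for one PI."""
    return [f"{pi_surname}-{sl}-{t}" for sl, t in product(SERVICE_LEVELS, TYPES)]


def baseline_accounts(pi_surname: str) -> list[str]:
    """The two accounts every PI is expected to have at a minimum."""
    return [f"{pi_surname}-SL3-CPU", f"{pi_surname}-SL4-CPU"]


if __name__ == "__main__":
    surname = "TESTABCDE-FGHIJK"  # placeholder surname used in this ticket
    for name in account_names(surname):
        # Show the lower-case form sacctmgr reported, plus its length
        print(f"{name.lower():<28} {len(name)} chars")
```

With the placeholder surname, every generated name is 24 characters long, e.g. testabcde-fghijk-sl3-gpu.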

What we've found is that if we create a PI_TESTABCDE-FGHIJK account (I've replaced the actual PI's surname with TESTABCDE-FGHIJK, but the real one was just as long: a double-barrelled surname), followed by TESTABCDE-FGHIJK-SL3-CPU and TESTABCDE-FGHIJK-SL4-CPU accounts, each with PI_TESTABCDE-FGHIJK as its parent, sacctmgr then fails when we try to create a TESTABCDE-FGHIJK-SL3-GPU account. The commands and their output are below:

[root@slurm-master ~]# sacctmgr -vi add account Name=pi_testabcde-fghijk Description="Simon Flood" Cluster=csd3 parent=uis fairshare=parent
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
 Adding Account(s)
  pi_testabcde-fghijk
 Settings
  Description     = simon flood
  Organization    = Parent/Account Name
 Associations
  A = pi_testabc C = csd3
 Settings
  Fairshare     = parent
  Parent        = uis
[root@slurm-master ~]# sacctmgr -vi add account Name=TESTABCDE-FGHIJK-SL3-CPU GrpTRESMins=cpu=12000000 DefaultQOS=cpu2 QOS=cpu2,intr Cluster=csd3 parent=pi_testabcde-fghijk fairshare=0
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
 Adding Account(s)
  testabcde-fghijk-sl3-cpu
 Settings
  Description     = Account Name
  Organization    = Parent/Account Name
 Associations
  A = testabcde- C = csd3
 Settings
  Fairshare     = 0
  GrpTRESMins   = cpu=12000000
  Parent        = pi_testabcde-fghijk
  QOS           = cpu2,intr
  DefQOS        = cpu2
[root@slurm-master ~]# sacctmgr -vi add account Name=TESTABCDE-FGHIJK-SL4-CPU QOS=cpu3 Cluster=csd3 parent=pi_testabcde-fghijk fairshare=0
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
 Adding Account(s)
  testabcde-fghijk-sl4-cpu
 Settings
  Description     = Account Name
  Organization    = Parent/Account Name
 Associations
  A = testabcde- C = csd3
 Settings
  Fairshare     = 0
  Parent        = pi_testabcde-fghijk
  QOS           = cpu3
[root@slurm-master ~]# sacctmgr -vi add account Name=TESTABCDE-FGHIJK-SL3-GPU GrpTRESMins=gres/gpu=480000 DefaultQOS=gpu2 QOS=gpu2,intr Cluster=csd3 parent=pi_testabcde-fghijk fairshare=0
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
 Adding Account(s)
  testabcde-fghijk-sl3-gpu
 Settings
  Description     = Account Name
  Organization    = Parent/Account Name
 Associations
  A = testabcde- C = csd3
 Settings
  Fairshare     = 0
  GrpTRESMins   = gres/gpu=480000
  Parent        = pi_testabcde-fghijk
  QOS           = gpu2,intr
  DefQOS        = gpu2
 Problem adding accounts: Unspecified error
[root@slurm-master ~]# sacctmgr -n show account format=Account'%-25',Description'%-30',Organization'%-20' | grep -i testabcde-fghijk
pi_testabcde-fghijk       simon flood                    uis
testabcde-fghijk-sl3-cpu  testabcde-fghijk-sl3-cpu pi_testabcde-fghijk
testabcde-fghijk-sl4-cpu  testabcde-fghijk-sl4-cpu pi_testabcde-fghijk

When we first saw this, the attempt to create the TESTABCDE-FGHIJK-SL3-GPU account produced output suggesting that sacctmgr was trying to create an association rather than an account, but that did not happen when we repeated the steps with the fake PI surname used in this message.

The other odd thing, which we suspect is related, is that when we tried to undo these account additions (so that we could recreate them with shorter names), the delete removed the associations but not the accounts themselves:

[root@slurm-master ~]# sacctmgr delete account name=testabcde-fghijk-sl3-cpu cluster=csd3
 Deleting account associations...
  C = csd3       A = testabcde-fghijk-sl3-cpu of pi_testabcde-fghijk
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
[root@slurm-master ~]# sacctmgr delete account name=testabcde-fghijk-sl4-cpu cluster=csd3
 Deleting account associations...
  C = csd3       A = testabcde-fghijk-sl4-cpu of pi_testabcde-fghijk
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
[root@slurm-master ~]# sacctmgr -n show account format=Account'%-25',Description'%-30',Organization'%-20' | grep -i testabcde-fghijk
pi_testabcde-fghijk       simon flood                    uis
testabcde-fghijk-sl3-cpu  testabcde-fghijk-sl3-cpu pi_testabcde-fghijk
testabcde-fghijk-sl4-cpu  testabcde-fghijk-sl4-cpu pi_testabcde-fghijk

If we then check the MySQL tables, the accounts still exist but their associations do not. We're currently tidying up by deleting the accounts manually in MySQL.
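
One way to spot accounts left behind like this, without querying MySQL directly, is to compare the account list with the association list. The sketch below assumes its two inputs are the text printed by `sacctmgr -n -P show account format=Account` and `sacctmgr -n -P show association format=Account` (one account name per line). It only parses text and does not call sacctmgr itself; the helper names are hypothetical.

```python
def parse_names(output: str) -> set[str]:
    """One account name per line, as printed with -n -P and format=Account."""
    return {line.strip().lower() for line in output.splitlines() if line.strip()}


def orphan_accounts(accounts_output: str, assocs_output: str) -> list[str]:
    """Accounts that still exist but are not referenced by any association."""
    accounts = parse_names(accounts_output)
    with_assoc = parse_names(assocs_output)
    return sorted(accounts - with_assoc)


if __name__ == "__main__":
    # Sample text modelled on the listings in this ticket, not captured output.
    accounts = "uis\npi_testabcde-fghijk\ntestabcde-fghijk-sl3-cpu\ntestabcde-fghijk-sl4-cpu\n"
    assocs = "uis\npi_testabcde-fghijk\n"
    print(orphan_accounts(accounts, assocs))
```

For the sample input this reports the two -cpu accounts, matching the situation described above where the accounts survive the delete but their associations do not.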

Our guess is that, when creating an account, sacctmgr compares only part (a prefix) of each existing account name and therefore wrongly detects a clash. I've had a quick look at the sacctmgr source code but, with my limited C knowledge, haven't spotted anything obvious.
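
To make the prefix-comparison guess concrete, the sketch below lists which existing accounts would collide with a new name if names were compared only on their first N characters. The 10-character prefix is purely an assumption, chosen because the association summary above printed names cut to 10 characters (`A = testabcde-`); this is not known to be how sacctmgr or slurmdbd actually compares names, and `prefix_clashes` is a hypothetical helper.

```python
PREFIX_LEN = 10  # assumption: matches the truncated "A = testabcde-" display width


def prefix_clashes(new_name: str, existing: list[str], n: int = PREFIX_LEN) -> list[str]:
    """Existing names (other than new_name itself) sharing its first n characters, case-insensitively."""
    new = new_name.lower()
    key = new[:n]
    return [name for name in existing if name.lower()[:n] == key and name.lower() != new]


if __name__ == "__main__":
    existing = ["pi_testabcde-fghijk", "testabcde-fghijk-sl3-cpu", "testabcde-fghijk-sl4-cpu"]
    print(prefix_clashes("TESTABCDE-FGHIJK-SL3-GPU", existing))
```

Under this assumption the SL4-CPU account would also have clashed with the existing SL3-CPU account, yet it was created without error, so a simple fixed-length prefix comparison would not by itself explain the failure.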

Previously we were using a mix of <PIsurname>-<servicelevel> for CPU and <PIsurname>-<servicelevel>-GPU for GPU (we didn't have KNL) so it's possible this issue existed in an earlier version of Slurm (we are using Slurm 14.11.8 on our old cluster) but we weren't hitting it.

Our new Slurm master is running Slurm 17.02.9 on Red Hat Enterprise Linux 7.3.

If you need further information please ask.

Regards,
Simon
-- 
Simon Flood
HPC System Administrator
University of Cambridge Information Services
United Kingdom
Comment 1 Jacob Jenson 2018-02-12 10:25:21 MST
Simon,

Thank you for submitting this ticket. Our system was not able to associate your email address with a support contract. Do you know if the University of Cambridge has a Slurm support contract with SchedMD? Before this ticket can be routed to the support team for resolution we need to verify Cambridge has a support contract. 

Jacob
Comment 2 Simon Flood 2018-02-12 10:29:49 MST
Hi Jacob,

No, the University of Cambridge does not have a Slurm support contract
with SchedMD. Sorry, I wasn't aware that only sites with support
contracts could submit bugs. I did previously post this issue to the
Slurm-Users mailing list but have received no reply.

Simon

Comment 3 John Anderson 2020-08-04 16:51:55 MDT
We are running version 18.0.8 and experienced similar behavior today: running sacctmgr add account with a name similar to, and shorter than, an existing account name resulted in the error:

Problem adding accounts: Unspecified error

Is there a solution?