| Summary: | Invalid account or account/partition combination specified | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Javier Cardenas <javier.cardenas> |
| Component: | Accounting | Assignee: | Director of Support <support> |
| Status: | RESOLVED CANNOTREPRODUCE | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | nmajeran |
| Version: | 17.02.5 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Confidential | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | Screaming Hairy Armadillo |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Javier Cardenas
2018-03-12 17:30:25 MDT
Hi Javier, could you run "sacctmgr show user swang1" and send me the output? Did you add the user with something close to the following command:
sacctmgr add user swang1 account=$account_name
Hi Isaac,
jcardena@lx-chmmqrslrm03 ~$ sacctmgr show user swang1
User Def Acct Admin
---------- ---------- ---------
jcardena@lx-chmmqrslrm03 ~$ sacctmgr show assoc where account=swang1
Cluster Account User Partition Share GrpJobs GrpTRES GrpSubmit GrpWall GrpTRESMins MaxJobs MaxTRES MaxTRESPerNode MaxSubmit MaxWall MaxTRESMins QOS Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
cluster swang1 1000 normal
cluster swang1 14028 1000 cpu=2300,mem+ 2300 normal
sacctmgr show user swang1 returns no values (the same is true for working accounts). I've included above other output for this user's account.
This is the portion of the user creation process that creates the user.
$sacctmgr -i create account name=$user parent=baseusers FairShare=1000
$sacctmgr -i create user account=$user name=$uid FairShare=1000
$sacctmgr -i modify user $uid set MaxJobs=$corecount GrpCpus=$corecount FairShare=1000 GrpMem=$mem GrpTres=gres/io=400
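A quick way to confirm that creation steps like these left a user association in the accounting database is to filter parsable `sacctmgr` output. The following is only a minimal sketch: the helper name `has_user_assoc` is hypothetical (not part of Slurm), and it assumes input in the `account|user` form produced by `sacctmgr -n -P show assoc format=account,user`. It checks only what the accounting database reports, not what slurmctld currently holds in memory.

```shell
#!/bin/sh
# has_user_assoc USER
# Hypothetical helper: reads "account|user" lines on stdin and succeeds
# (exit 0) if at least one line has USER in the second field.
has_user_assoc() {
    awk -F'|' -v u="$1" '$2 == u { found = 1 } END { exit !found }'
}

# Intended use on a cluster (assumes sacctmgr is on PATH; not run here):
#   sacctmgr -n -P show assoc format=account,user | has_user_assoc "$uid" \
#       && echo "association present" \
#       || echo "no association for $uid"
```

Account-level rows have an empty user field, so they never match; only rows that carry the user name count as a hit.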
We are considering restarting slurmctld today to see if that might resolve the issue. I've already tried an scontrol reconfigure.
Ok, swang1 is the account name, not the user name; my mistake. The output you sent looks fine. Could you send the runuser script you used in your initial comment?
We restarted the slurmctld service and the 3 newly created accounts began to work. I created a new account for another user requesting access and that account also works now.
The runuser command wasn't part of any script. It was just trying to srun hostname in our veryshort partition.
runuser -c "/logs/slurm/bin/srun -p veryshort hostname" swang1
The only thing I can think of is that we decommissioned an old LDAP server in the environment a couple of weeks ago. However, the getent output in my initial comment shows that the controller was able to resolve the user, so I don't think it was the LDAP work, unless slurmctld was still looking at the old LDAP server, which now has slapd disabled. Although I imagine even existing users would fail unless the controller caches them permanently...
Ok, well I'll close this ticket for now, but should the problem resurface, please reopen it. Regards |