Ticket 1897

Summary: Possible SQL performance issue when 'sacctmgr delete account'
Product: Slurm Reporter: Kai Xi <kai.xi>
Component: slurmdbdAssignee: Jacob Jenson <jacob>
Status: RESOLVED TIMEDOUT QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 14.11.4   
Hardware: Linux   
OS: Linux   
Site: -Other- Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---

Description Kai Xi 2015-08-30 16:12:03 MDT
Dear Slurm developers:

We experienced the issue of Slurm controller can't delete an account. When tried
  sacctmgr delete account pmosp
It's either hanging for over 10 mins doing nothing or reported like
  Database is busy or waiting for lock from other user  

Then we took a look of the mysql and find a SQL command hanging there (cluster name is m2-m2):
  select t0.id_assoc from `m2-m2_job_table` as t0, `m2-m2_assoc_table` as t1, `m2-m2_assoc_table` as t2 where t1.lft between t2.lft and t2.rgt && (t2.acct='pmosp') and t0.id_assoc=t1.id_assoc limit 1;

It seems to be Three-way join: t0-t2-t3. In our production DB, size of job_table: 6000000 entries, size of id_assoc table: 2000. 

We managed to run this query on a test server with no user job in, It takes 2 min 49.02 sec. We suspected the table joining is not optimized. We rewrote an near-equivalent version which does Two Step Join- t2-t3 first, then t0-(t2-t3), as follows:
  select distinct t0.id_assoc from `m2-m2_job_table` as t0 where exists (select distinct t1.id_assoc from `m2-m2_assoc_table` as t1, `m2-m2_assoc_table` as t2 where t1.lft between t2.lft and t2.rgt && (t2.acct='pmosp')) limit 1;

It takes 3.04 sec.

Could you please have a look and see if it's a bug or not? And how many jobs can Slurm handle in theory? Thanks.

Regards,


Dr Kai Xi
HPC Consultant
Multi-modal Australian ScienceS Imaging and Visualisation Environment (www.massive.org.au)
Ph: 9902 4845 Mob: 0433 658 618
Comment 1 Kai Xi 2015-09-06 11:39:43 MDT
Dear Jacob, Brian and Jette

Can I ask if there is any updates? We can provide a DB test case if you want. Thanks!

Regards,
Kai
Comment 2 Jacob Jenson 2017-12-13 10:40:27 MST
Kai Xi,

We are timing out this ticket due to its age. If this is still an issue please let me know. 

Jacob