Ticket 15739

Summary: select query from "sacctmgr -P -n show assoc where qos=deleted format=user,account" takes a long time
Product: Slurm Reporter: Chris Samuel (NERSC) <csamuel>
Component: DatabaseAssignee: Carlos Tripiana Montes <tripiana>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: dmjacobsen, tripiana
Version: 22.05.6   
Hardware: Linux   
OS: Linux   
Site: NERSC Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Chris Samuel (NERSC) 2023-01-05 09:10:17 MST
Hi there,

Our system for making Slurm's associations conform with our management system takes a long time to run, and when we were working through bug #15270 where the 22.05 DB upgrade broke I found that the reason was a particular SELECT statement was taking over 16 minutes to complete. :-(

The "sacctmgr" command that gets run is:

sacctmgr -P -n show assoc where qos=deleted format=user,account

That results in this long running query (I just kicked it off now):

*************************** 8. row ***************************
           Id: 341165
         User: slurmdbd
         Host: 10-35-192-3.pxc-db-haproxy-replicas.nersc.svc.cluster.local:45296
           db: perlmutter_slurm_acct_db
      Command: Query
         Time: 238
        State: executing
         Info: select distinct t1.id_assoc, t1.lft, t1.rgt, t1.user, t1.acct, t1.`partition`, t1.shares, t1.grp_tres_mins, t1.grp_tres_run_mins, t1.grp_tres, t1.grp_jobs, t1.grp_jobs_accrue, t1.grp_submit_jobs, t1.grp_wall, t1.max_tres_mins_pj, t1.max_tres_run_mins, t1.max_tres_pj, t1.max_tres_pn, t1.max_jobs, t1.max_jobs_accrue, t1.min_prio_thresh, t1.max_submit_jobs, t1.max_wall_pj, t1.parent_acct, t1.priority, t1.def_qos_id, t1.qos, t1.delta_qos, t1.is_def, t1.deleted from "perlmutter_assoc_table" as t1, "perlmutter_assoc_table" as t2 where (t1.lft between t2.lft and t2.rgt) && ((t2.qos like '%,25' || t2.qos like '%,25,%' || t2.delta_qos like '%,+25' || t2.delta_qos like '%,+25,%')) && t1.deleted=0 order by lft
    Rows_sent: 0
Rows_examined: 0
8 rows in set (0.00 sec)


Back when I ran this on 25th October it reported:

7655 rows in set, 5 warnings (16 min 4.058 sec)

At the time the tables looked like:

MySQL [perlmutter_slurm_acct_db]> describe table perlmutter_assoc_table;                                                                                                             +----+-------------+------------------------+------------+------+---------------+------+---------+------+-------+----------+-------+
| id | select_type | table                  | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra |
+----+-------------+------------------------+------------+------+---------------+------+---------+------+-------+----------+-------+
|  1 | SIMPLE      | perlmutter_assoc_table | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 41107 |   100.00 | NULL  |
+----+-------------+------------------------+------------+------+---------------+------+---------+------+-------+----------+-------+
1 row in set, 1 warning (0.002 sec)


MySQL [perlmutter_slurm_acct_db]> select count(*) from perlmutter_assoc_table; 
+----------+
| count(*) |
+----------+
|    40243 |
+----------+
1 row in set (0.013 sec)


I also did an explain of the query here:

MySQL [perlmutter_slurm_acct_db]> explain select distinct t1.id_assoc, t1.lft, t1.rgt, t1.user, t1.acct, t1.`partition`, t1.shares, t1.grp_tres_mins, t1.grp_tres_run_mins, t1.grp_tres, t1.grp_jobs, t1.grp_jobs_accrue, t1.grp_submit_jobs, t1.grp_wall, t1.max_tres_mins_pj, t1.max_tres_run_mins, t1.max_tres_pj, t1.max_tres_pn, t1.max_jobs, t1.max_jobs_accrue, t1.min_prio_thresh, t1.max_submit_jobs, t1.max_wall_pj, t1.parent_acct, t1.priority, t1.def_qos_id, t1.qos, t1.delta_qos, t1.is_def, t1.deleted from perlmutter_assoc_table as t1, perlmutter_assoc_table as t2 where (t1.lft between t2.lft and t2.rgt) && ((t2.qos like '%,701' || t2.qos like '%,701,%' || t2.delta_qos like '%,+701' || t2.delta_qos like '%,+701,%')) && t1.deleted=0 && (t2.user!='') order by lft;
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                                    |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | lft           | NULL | NULL    | NULL | 41107 |    10.00 | Using where; Using temporary; Using filesort             |
|  1 | SIMPLE      | t2    | NULL       | ALL  | udex,lft      | NULL | NULL    | NULL | 41107 |     3.76 | Range checked for each record (index map: 0x6); Distinct |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------------------------------------------------+
2 rows in set, 7 warnings (0.003 sec)

Whilst this is mostly invisible to us as it runs automatically inside a container it does mean any urgent checks or changes do have this annoying long wait up front.

Any ideas?

All the best,
Chris
Comment 2 Carlos Tripiana Montes 2023-01-13 02:55:05 MST
Hey Chris,

I see you posted:

  MySQL [perlmutter_slurm_acct_db]> describe table perlmutter_assoc_table;
  [...]

But this is not the command syntax I was expecting, nor its output afaik. It should be:

  MySQL [perlmutter_slurm_acct_db]> desc perlmutter_assoc_table;

and print something like:

MariaDB [slurm_acct_db_2205_14518]> desc cluster_assoc_table;
+-------------------+---------------------+------+-----+---------+----------------+
| Field             | Type                | Null | Key | Default | Extra          |
+-------------------+---------------------+------+-----+---------+----------------+
| creation_time     | bigint(20) unsigned | NO   |     | NULL    |                |
| mod_time          | bigint(20) unsigned | NO   |     | 0       |                |
| deleted           | tinyint(4)          | NO   |     | 0       |                |
| is_def            | tinyint(4)          | NO   |     | 0       |                |
| id_assoc          | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| user              | tinytext            | NO   | MUL | ''      |                |
| acct              | tinytext            | NO   | MUL | NULL    |                |
| partition         | tinytext            | NO   |     | ''      |                |
| parent_acct       | tinytext            | NO   |     | ''      |                |
| lft               | int(11)             | NO   | MUL | NULL    |                |
| rgt               | int(11)             | NO   |     | NULL    |                |
| shares            | int(11)             | NO   |     | 1       |                |
| max_jobs          | int(11)             | YES  |     | NULL    |                |
| max_jobs_accrue   | int(11)             | YES  |     | NULL    |                |
| min_prio_thresh   | int(11)             | YES  |     | NULL    |                |
| max_submit_jobs   | int(11)             | YES  |     | NULL    |                |
| max_tres_pj       | text                | NO   |     | ''      |                |
| max_tres_pn       | text                | NO   |     | ''      |                |
| max_tres_mins_pj  | text                | NO   |     | ''      |                |
| max_tres_run_mins | text                | NO   |     | ''      |                |
| max_wall_pj       | int(11)             | YES  |     | NULL    |                |
| grp_jobs          | int(11)             | YES  |     | NULL    |                |
| grp_jobs_accrue   | int(11)             | YES  |     | NULL    |                |
| grp_submit_jobs   | int(11)             | YES  |     | NULL    |                |
| grp_tres          | text                | NO   |     | ''      |                |
| grp_tres_mins     | text                | NO   |     | ''      |                |
| grp_tres_run_mins | text                | NO   |     | ''      |                |
| grp_wall          | int(11)             | YES  |     | NULL    |                |
| priority          | int(10) unsigned    | YES  |     | NULL    |                |
| def_qos_id        | int(11)             | YES  |     | NULL    |                |
| qos               | blob                | NO   |     | ''      |                |
| delta_qos         | blob                | NO   |     | ''      |                |
+-------------------+---------------------+------+-----+---------+----------------+
32 rows in set (0,001 sec)

Would you mind to post the same output here, and see if it matches mine?

Thanks!
Carlos.
Comment 3 Chris Samuel (NERSC) 2023-01-13 11:34:19 MST
Hi Carlos,

Sorry about that, I'm not much of a DB expert!

How does this look?

mysql> desc perlmutter_assoc_table;
--------------
desc perlmutter_assoc_table
--------------

+-------------------+-----------------+------+-----+---------+----------------+
| Field             | Type            | Null | Key | Default | Extra          |
+-------------------+-----------------+------+-----+---------+----------------+
| creation_time     | bigint unsigned | NO   |     | NULL    |                |
| mod_time          | bigint unsigned | NO   |     | 0       |                |
| deleted           | tinyint         | NO   |     | 0       |                |
| is_def            | tinyint         | NO   |     | 0       |                |
| id_assoc          | int unsigned    | NO   | PRI | NULL    | auto_increment |
| user              | tinytext        | NO   | MUL | NULL    |                |
| acct              | tinytext        | NO   | MUL | NULL    |                |
| partition         | tinytext        | NO   |     | NULL    |                |
| parent_acct       | tinytext        | NO   |     | NULL    |                |
| lft               | int             | NO   | MUL | NULL    |                |
| rgt               | int             | NO   |     | NULL    |                |
| shares            | int             | NO   |     | 1       |                |
| max_jobs          | int             | YES  |     | NULL    |                |
| max_jobs_accrue   | int             | YES  |     | NULL    |                |
| min_prio_thresh   | int             | YES  |     | NULL    |                |
| max_submit_jobs   | int             | YES  |     | NULL    |                |
| max_tres_pj       | text            | NO   |     | NULL    |                |
| max_tres_pn       | text            | NO   |     | NULL    |                |
| max_tres_mins_pj  | text            | NO   |     | NULL    |                |
| max_tres_run_mins | text            | NO   |     | NULL    |                |
| max_wall_pj       | int             | YES  |     | NULL    |                |
| grp_jobs          | int             | YES  |     | NULL    |                |
| grp_jobs_accrue   | int             | YES  |     | NULL    |                |
| grp_submit_jobs   | int             | YES  |     | NULL    |                |
| grp_tres          | text            | NO   |     | NULL    |                |
| grp_tres_mins     | text            | NO   |     | NULL    |                |
| grp_tres_run_mins | text            | NO   |     | NULL    |                |
| grp_wall          | int             | YES  |     | NULL    |                |
| priority          | int unsigned    | YES  |     | NULL    |                |
| def_qos_id        | int             | YES  |     | NULL    |                |
| qos               | blob            | NO   |     | NULL    |                |
| delta_qos         | blob            | NO   |     | NULL    |                |
+-------------------+-----------------+------+-----+---------+----------------+
32 rows in set (0.01 sec)


All the best,
Chris
Comment 4 Carlos Tripiana Montes 2023-01-18 10:17:11 MST
After having check it, I am mostly sure it's a matter of tuning 3 config values, if possible. For "using temporary":

tmp_table_size
max_heap_table_size 

For "using filesort":

sort_buffer_size

I don't know the exact values in your mysql/mariadb but what for sure is an issue here is the fact that you have an uncommon large amount of rows in the assoc table so this leads to some slow queries if it is needed to perform internal steps or sorts by the engine.

I'd suggest to test if, increasing the values a bit on those params, helps.

Sometimes, we have room to improve the query and avoid some overload, or add an extra index to the table. But here, it doesn't seem an easy tasks. Not sure if even possible. lft is already an index, btw.

Let us know you it goes.

Cheers,
Carlos.

[1] https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html
[2] https://dev.mysql.com/doc/refman/8.0/en/order-by-optimization.html#order-by-filesort
Comment 5 Carlos Tripiana Montes 2023-02-14 06:25:09 MST
Hey Chris,

Do you need anything else from our side here? Otherwise, you might consider to
close the issue as resolved/infovigen.

Thanks,
Carlos.
Comment 6 Carlos Tripiana Montes 2023-02-27 08:07:00 MST
Closing now.