Hello, We have observed some issues and (seeming) inconsistencies with `sreport` and various date/time ranges. The general issue we've seen, and have had reported to us by a couple staff members, is receiving a message like this one: ``` sreport: error: slurmdbd: Getting response to message type: DBD_GET_ASSOCS sreport: error: slurmdbd: DBD_GET_ASSOCS failure: No error slurmdb_report_cluster_account_by_user: Problem with get query. ``` I have observed this when adding a time parameter to a large query (e.g. 2020-01-06 vs 2020-01-06T12:00:00). A colleague has noticed when he adds a date into the future (i.e. a date that has not occurred yet). At first I thought it was due to large queries (i.e. a start time being over a year ago) but that seems to not be the case. For example, when I run the following it completes and does so pretty quickly: ``` [drhey@glctld ~]$ /usr/bin/time -v /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 Start=2020-01-31 End=2021-02-01 format=account,login,used ... Command being timed: "/opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 Start=2020-01-31 End=2021-02-01 format=account,login,used" User time (seconds): 0.01 System time (seconds): 0.00 Percent of CPU this job got: 0% Elapsed (wall clock) time (h:mm:ss or m:ss): 1:28.79 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 13784 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 4066 Voluntary context switches: 433 Involuntary context switches: 398 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 ``` But when I add a time dimension to the same query it takes more time and craps out: ``` [drhey@glctld ~]$ /usr/bin/time -v /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 Start=2020-01-31T12:00:00 End=2021-02-01 format=account,login,used sreport: error: slurmdbd: Getting response to message type: DBD_GET_ASSOCS sreport: error: slurmdbd: DBD_GET_ASSOCS failure: No error slurmdb_report_cluster_account_by_user: Problem with get query. Command exited with non-zero status 1 Command being timed: "/opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 Start=2020-01-31T12:00:00 End=2021-02-01 format=account,login,used" User time (seconds): 0.08 System time (seconds): 0.06 Percent of CPU this job got: 0% Elapsed (wall clock) time (h:mm:ss or m:ss): 15:00.94 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 89640 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 22572 Voluntary context switches: 3693 Involuntary context switches: 3 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 1 ``` I realize there might be something going on underneath the hood with the MariaDB query mechanics that I am not well versed in. The reason this come up is that someone was seeing this same error while attempting to run a query from the start of this month to the first of the next month. Any insight would be greatly appreciated, as always! Regards, David
Hi. I'll check into this for you.
Could you provide the slurmdbd.log for the periods when you are running both versions of the query (with and without time)?
Interesting. On my own test system with a very limited number of accounting records over the past year shows a significant (~4x) difference in real time with similar factors for user and sys: >$ time bash -c "sreport -nP -T billing cluster AccountUtilizationByUser Start=2020-01-31 End=2021-02-01 format=account,login,used >/dev/null" > >real 0m0.043s >user 0m0.011s >sys 0m0.004s >$ time bash -c "sreport -nP -T billing cluster AccountUtilizationByUser Start=2020-01-31T12:00:00 End=202 >1-02-01 format=account,login,used >/dev/null" > >real 0m0.169s >user 0m0.033s >sys 0m0.025s The amount of time I'm looking at is tiny compared to yours but this observation will help me drill down on my own system with some further tests.
Created attachment 18044 [details] slurmdbd.conf
Created attachment 18045 [details] slurm.conf
Created attachment 18046 [details] slurmdbd.log-20210220.gz This should contain the data from when I was running the queries. We have debug logs on and back them up. So, I can get you files from as far back as December - though the issue in this ticket was noticed/reported as of January 2021.
Chad, I forgot to get you the conf files for the cluster in question. So, those have been uploaded in addition to the requested logs. Thanks! David
Thanks for supplying the conf files. I'd like to correlate the dbd log with your queries. Would you do the following? Enable DB_ASSOC in the debugflags for slurmdbd Run this for each of your 2 queries (w/ and w/o the time spec on the Start=) >date +"%Y-%m-%dT%T.%3N"; sreport ...; date +"%Y-%m-%dT%T.%3N" Share dbd log covering those Supply the output from "sdiag -a"
Hi, Chad, My apologies for the delay. Would you like the sdiag -a output from while each command was running, before, or after? Or, I could put it into a while loop and append to a file for data-over-time? David
(In reply to ARCTS Admins from comment #10) > My apologies for the delay. Would you like the sdiag -a output from while > each command was running, before, or after? Or, I could put it into a while > loop and append to a file for data-over-time? No problem--run it before and after each sreport. Thanks.
Created attachment 18250 [details] results of sreport with DB_ASSOC enabled
Chad, Attached is the file containing the following: * a directory containing the sdiag results for the sreport command with a timestamp: ``` [root@gls-build with_timestamp]# date +"%Y-%m-%dT%T.%3N"; /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 Start=2020-01-31T12:00:00 End=2021-02-01 format=account,login,used; date +"%Y-%m-%dT%T.%3N" ``` * a directory containing the sdiag results for the sreport command without a timestamp: ``` [root@gls-build without_timestamp]# date +"%Y-%m-%dT%T.%3N"; /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 Start=2020-01-31 End=2021-02-01 format=account,login,used; date +"%Y-%m-%dT%T.%3N" ``` * the slurmdbd.log from the entire exercise David
Chad, I realize times might help! ``` [root@gls-build without_timestamp]# date +"%Y-%m-%dT%T.%3N"; /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 Start=2020-01-31 End=2021-02-01 format=account,login,used; date +"%Y-%m-%dT%T.%3N" 2021-03-04T11:00:48.946 ... 2021-03-04T11:02:23.963 ``` ``` [root@gls-build with_timestamp]# date +"%Y-%m-%dT%T.%3N"; /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 Start=2020-01-31T12:00:00 End=2021-02-01 format=account,login,used; date +"%Y-%m-%dT%T.%3N" 2021-03-04T11:03:19.479 ... sreport: error: slurmdbd: Getting response to message type: DBD_GET_ASSOCS sreport: error: slurmdbd: DBD_GET_ASSOCS failure: No error slurmdb_report_cluster_account_by_user: Problem with get query. 2021-03-04T11:18:20.315 ``` David
Thanks for sending this info--I want to study it a bit and then will get back to you.
I'm still looking at this one and trying a few experiments locally. Can your supply the output from "sacctmgr show stats"? I'm just curious to see some of the values.
Chad, Here is the output: ``` [root@gls-build ~]# sacctmgr show stats ******************************************************************* sacctmgr show stats output at Wed Mar 17 08:37:58 2021 (1615984678) Data since Thu Mar 04 10:52:49 2021 (1614873169) All statistics are in microseconds ******************************************************************* Internal DBD rollup last ran Wed Mar 17 08:00:01 2021 (1615982401) Last cycle: 51525 Max cycle: 5482360 Total time: 88506833 Total cycles: 310 Mean cycle: 285505 Cluster 'greatlakes' rollup statistics Hour last ran Wed Mar 17 08:00:00 2021 (1615982400) Last cycle: 32717 Max cycle: 130383 Total time: 14287940 Total cycles: 309 Mean cycle: 46239 Day last ran Wed Mar 17 00:00:00 2021 (1615953600) Last cycle: 2876993 Max cycle: 2966009 Total time: 36590567 Total cycles: 13 Mean cycle: 2814659 Cluster 'lighthouse' rollup statistics Hour last ran Wed Mar 17 08:00:00 2021 (1615982400) Last cycle: 24306 Max cycle: 113612 Total time: 10602478 Total cycles: 309 Mean cycle: 34312 Day last ran Wed Mar 17 00:00:00 2021 (1615953600) Last cycle: 641482 Max cycle: 836326 Total time: 9428254 Total cycles: 13 Mean cycle: 725250 Remote Procedure Call statistics by message type DBD_FINI ( 1401) count:75660 ave_time:101 total_time:7665642 SLURM_PERSIST_INIT ( 6500) count:38226 ave_time:1538 total_time:58822701 DBD_GET_TRES ( 1486) count:37153 ave_time:179 total_time:6675495 DBD_GET_QOS ( 1448) count:37149 ave_time:287 total_time:10667647 DBD_GET_CLUSTERS ( 1412) count:37037 ave_time:1131 total_time:41900969 DBD_CLUSTER_TRES ( 1407) count:3706 ave_time:578 total_time:2143549 DBD_GET_JOBS_COND ( 1444) count:773 ave_time:1530 total_time:1182776 DBD_GET_ASSOCS ( 1410) count:415 ave_time:6258882 total_time:2597436179 DBD_STEP_START ( 1442) count:23 ave_time:15199 total_time:349593 DBD_STEP_COMPLETE ( 1441) count:23 ave_time:6549 total_time:150647 DBD_SEND_MULT_JOB_START ( 1472) count:6 ave_time:18900 total_time:113404 DBD_JOB_COMPLETE ( 1424) count:6 ave_time:17026 total_time:102161 DBD_SEND_MULT_MSG ( 1474) count:6 ave_time:13361 total_time:80171 DBD_NODE_STATE ( 1432) count:6 ave_time:5818 total_time:34911 DBD_REGISTER_CTLD ( 1434) count:2 ave_time:45980 total_time:91961 DBD_GET_USERS ( 1415) count:1 ave_time:315106 total_time:315106 DBD_GET_WCKEYS ( 1453) count:1 ave_time:36418 total_time:36418 DBD_GET_RES ( 1478) count:1 ave_time:2012 total_time:2012 DBD_GET_FEDERATIONS ( 1494) count:1 ave_time:311 total_time:311 DBD_GET_STATS ( 1489) count:1 ave_time:26 total_time:26 Remote Procedure Call statistics by user sensu ( 297) count:222162 ave_time:552 total_time:122677488 slurm ( 495) count:3787 ave_time:1182 total_time:4476618 encore-web ( 1002866) count:2319 ave_time:1051 total_time:2438854 arctsrmpbot ( 288) count:1882 ave_time:685 total_time:1289849 root ( 0) count:28 ave_time:92745532 total_time:2596874905 vnegrea ( 106485) count:18 ave_time:775 total_time:13965 ``` I want to note that this issue was originally reported in our prod environment, but that we also have a stage environment where the data is refreshed from prod regularly (notably, the slurmdbd), and that's where we've been testing this (specifically, the aforementioned changes to slurmdbd.conf debug flags). So, if you want the output of `sacctmgr show stats` from production, just let me know. David
(In reply to ARCTS Admins from comment #14) > ``` > [root@gls-build without_timestamp]# date +"%Y-%m-%dT%T.%3N"; > /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser > Accounts=engin1 Start=2020-01-31 End=2021-02-01 format=account,login,used; > date +"%Y-%m-%dT%T.%3N" > 2021-03-04T11:00:48.946 > ... > 2021-03-04T11:02:23.963 > ``` > > ``` > [root@gls-build with_timestamp]# date +"%Y-%m-%dT%T.%3N"; > /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser > Accounts=engin1 Start=2020-01-31T12:00:00 End=2021-02-01 > format=account,login,used; date +"%Y-%m-%dT%T.%3N" > 2021-03-04T11:03:19.479 > ... > sreport: error: slurmdbd: Getting response to message type: DBD_GET_ASSOCS > sreport: error: slurmdbd: DBD_GET_ASSOCS failure: No error > slurmdb_report_cluster_account_by_user: Problem with get query. > 2021-03-04T11:18:20.315 > ``` Thanks for these--they helped a lot in correlating slurmdbd log lines. Let's call the first case NTS (no time stamp) and the second TS (time stamp; also the one that ran long). For NTS here are the initial connection and last query log lines: --- [2021-03-04T11:00:48.952] debug: REQUEST_PERSIST_INIT: CLUSTER:greatlakes VERSION:8960 UID:0 IP:141.211.27.116 CONN:10 ... [2021-03-04T11:00:49.244] 10(as_mysql_usage.c:482) query select t3.id_assoc, t1.id_tres, t1.time_start, t1.alloc_secs from "greatlakes_assoc_usage_day_table" as t1, "greatlakes_assoc_table" as t2, "greatlakes_assoc_table" as t3 where (t1.time_start < 1612155600 && t1.time_start >= 1580446800) && ... --- For TS here are the initial connection and last query log lines: --- [2021-03-04T11:03:19.485] debug: REQUEST_PERSIST_INIT: CLUSTER:greatlakes VERSION:8960 UID:0 IP:141.211.27.116 CONN:10 ... [2021-03-04T11:03:20.311] 10(as_mysql_usage.c:482) query select t3.id_assoc, t1.id_tres, t1.time_start, t1.alloc_secs from "greatlakes_assoc_usage_hour_table" as t1, "greatlakes_assoc_table" as t2, "greatlakes_assoc_table" as t3 where (t1.time_start < 1612155600 && t1.time_start >= 1580490000) && ... --- Notice that other than the "time_start >=" value, the big difference in the queries is that NTS uses "greatlakes_assoc_usage_day_table" while TS uses "greatlakes_assoc_usage_hour_table". Basically, NTS uses the day table while TS uses the hour table. This is consistent with the code from set_usage_information() in src/plugins/accounting_storage/common/common_as.c--this function picks the table to use: 474 /* check to see if we are off day boundaries or on month 475 * boundaries other wise use the day table. 476 */ 477 //info("%d %d %d", start_tm.tm_hour, end_tm.tm_hour, end-start); 478 if (start_tm.tm_hour || end_tm.tm_hour || (end-start < 86400) 479 || (end > my_time)) { 480 switch (type) { 481 case DBD_GET_ASSOC_USAGE: 482 my_usage_table = assoc_hour_table; 483 break; set_usage_information() is called by get_usage_for_list() which then calls_get_object_usage(): 690 if (set_usage_information(&my_usage_table, type, &start, &end) 691 != SLURM_SUCCESS) { 692 xfree(id_str); 693 return SLURM_ERROR; 694 } 695 696 if (_get_object_usage(mysql_conn, type, my_usage_table, cluster_name, 697 id_str, start, end, &usage_list) 698 != SLURM_SUCCESS) { 699 xfree(id_str); 700 return SLURM_ERROR; 701 } Then in _get_object_usage() you can see where the DB_USAGE log line is issued: 481 if (debug_flags & DEBUG_FLAG_DB_USAGE) 482 DB_DEBUG(mysql_conn->conn, "query\n%s", query); That's line 482 logged in the slurmdbd.log. Anyway, maybe more than you wanted to know but just wanted you to see the logic going on. But now we know that different tables are being used for the two queries: NTS uses the day table -> greatlakes_assoc_usage_day_table TS uses the hour table -> greatlakes_assoc_usage_hour_table Next we need to drill down on the size of those tables and maybe see if MariaDB logs something useful related to those queries. Perhaps the hour table is very large relative to the day table and that's why TS is taking so long...maybe the hour table needs to be cleaned out. Could you get row counts on those two tables using something like this? $ mysql mysql>use slurm_acct_db; mysql>select count(*) from greatlakes_assoc_usage_day_table; mysql>select count(*) from greatlakes_assoc_usage_hour_table; mysql>quit I'll look into table clean out settings in the mean time. As for MariaDB, it may log something in /var/log somewhere but will have to look at what's logged by default. On my system, I use mysql and it logs to /var/log/mysql/error.log (but not much in it).
Chad, Thanks for that. It was incredibly helpful! Here is the data you requested: ``` MariaDB [slurm_acct_db]> select count(*) from greatlakes_assoc_usage_day_table; +----------+ | count(*) | +----------+ | 368331 | +----------+ 1 row in set (0.06 sec) MariaDB [slurm_acct_db]> select count(*) from greatlakes_assoc_usage_hour_table; +----------+ | count(*) | +----------+ | 4797368 | +----------+ 1 row in set (0.70 sec) ``` AFAIK, MariaDB was meant to keep parity with MySQL (or perhaps the same spirit MySQL initially had). IIRC, MariaDB came about because a subset of the community wasn't pleased with Oracle acquiring/influencing MySQL. I've not seen much difference between the two, but I'm also not a DBA. David
(In reply to ARCTS Admins from comment #19) > Thanks for that. It was incredibly helpful! > > Here is the data you requested: ... I'm glad it was--I think we're getting close to a resolution. The hour table has about 13 times more entries than the day table so it makes sense that it's taking more time to process and is why you're seeing the timeout on sreport when using a timestamp. I would suggest looking over slurmdbd.conf(1) and the Purge*After (at least PurgeUsageAfter) and maybe the companion Archive* parameters to see if you can trim things down a bit and get your sreport return times under control by shrinking your tables. You might also query the hour table to see how old your oldest entry is if you are curious: >select from_unixtime(creation_time) from greatlakes_assoc_usage_hour_table order by creation_time limit 1; By default, all Slurm records are kept so that's likely the problem here. Let me know if you have more questions.
Have you had a chance to try trimming back the database entries? Will plan on closing this in a day or so if you are in good shape.
Chad, I am doing some testing on this today and will report back soon (hopefully by days end). David
Chad, I am doing some testing with a copy of our DB on a test system. OOC, what actually "triggers" the Purge* and Archive* behaviors? I thought it might be the rollup process based on context due to what `man slurmdbd.conf` says about the `ArchiveScript`: ``` This script can be executed every time a rollup happens (every hour, day and month), depending on the Purge*After options. ``` I figured, if the script is executed on every rollup, perhaps the other parameters are too? Thanks, DAvid
(In reply to ARCTS Admins from comment #23) >I am doing some testing with a copy of our DB on a test system. OOC, what > actually "triggers" the Purge* and Archive* behaviors? I thought it might be > the rollup process based on context due to what `man slurmdbd.conf` says > about the `ArchiveScript`: > > ``` > This script can be executed every time a rollup happens (every hour, day and > month), depending on the Purge*After options. > ``` > > I figured, if the script is executed on every rollup, perhaps the other > parameters are too? I was wondering about this too. The page for slurmdbd.conf says this about PurgeUsageAfter: -- The purge takes place at the start of the each purge interval. For example, if the purge time is 2 months, the purge would happen at the beginning of each month. If not set (default), then usage records are never purged. --
Chad, I've observed that paring down the DB through Archive/Purge means works as described. However, given my aggressive test (setting everything to 1 hour) most of the stuff I needed to test specifically for this ticket (like the initial sreport example commands given; with and without timestamp) return nothing - which I realize is due to my aggressive 1hour parameter. If there was a way I could set something more realistic, like 12months, and then trigger that behavior, that could help me see if the initial commands (the ones I gave when opening this ticket) could actually return something. Is there such a way? Another question I have. How does one load the files that get created with Purge/Archive options? The man page for sacctmgr seems sparse, and when I attempt to load a file into a DB using the process described therein I get the following: ``` sacctmgr: error: slurmdbd: Error with request. Problem loading archive file: Permission denied Note: For archive load, the file must be accessible on the slurmdbd host. ``` I have the file on the system where I am executing the sacctmgr archive load command, and have changed permissions on it to no avail. And the slurmdbd.log shows: [2021-04-06T14:18:48.254] Could not open archive file `/root/greatlakes_usage_hour_table_archive_2021-04-05T14:00:00_2021-04-05T17:59:59`: Permission denied What permission is being referred to here? David
(In reply to ARCTS Admins from comment #25) Will look into those questions.
(In reply to ARCTS Admins from comment #25) > I've observed that paring down the DB through Archive/Purge means works as > described. However, given my aggressive test (setting everything to 1 hour) > most of the stuff I needed to test specifically for this ticket (like the > initial sreport example commands given; with and without timestamp) return > nothing - which I realize is due to my aggressive 1hour parameter. > > If there was a way I could set something more realistic, like 12months, and > then trigger that behavior, that could help me see if the initial commands > (the ones I gave when opening this ticket) could actually return something. > Is there such a way? Can you just set PurgeUsageAfter to 12 (unit for this parameter is months)? But maybe I am missing something in your question. I'm also still pursuing your permission question.
(In reply to ARCTS Admins from comment #25) > ``` > sacctmgr: error: slurmdbd: Error with request. > Problem loading archive file: Permission denied > Note: For archive load, the file must be accessible on the slurmdbd host. > ``` > > I have the file on the system where I am executing the sacctmgr archive load > command, and have changed permissions on it to no avail. And the > slurmdbd.log shows: > > [2021-04-06T14:18:48.254] Could not open archive file > `/root/greatlakes_usage_hour_table_archive_2021-04-05T14:00:00_2021-04-05T17: > 59:59`: Permission denied > > > What permission is being referred to here? Have you tried putting the file (and running sacctmgr) on the same host that that slurmdbd runs on? This may be the problem.
Chad, So, I did try doing that (loading the dump files onto a machine where the slurmdbd was running). But the caveat was that these files were dumped from another cluster of the same name. What I am attempting to achieve is a way to keep our production slurmdbd nimble and responsive to queries, but to dump (and then import) the pared down stuff (i.e. those files) to another DB where admins in other departments who assist users could do deep dive queries if they needed to. I hope that clarifies. Perhaps I've gone down a wrong path with that design? David
(In reply to ARCTS Admins from comment #29) > So, I did try doing that (loading the dump files onto a machine where the > slurmdbd was running). But the caveat was that these files were dumped from > another cluster of the same name. > > What I am attempting to achieve is a way to keep our production slurmdbd > nimble and responsive to queries, but to dump (and then import) the pared > down stuff (i.e. those files) to another DB where admins in other > departments who assist users could do deep dive queries if they needed to. > > I hope that clarifies. Perhaps I've gone down a wrong path with that design? I reviewed this internally with a colleague and he had the following suggestions/advise: First: sreport/sacct should be able to handle large queries--purging is more important for saving space than reducing query time. Setting up an alternate system with replicated data is probably going to lead to problems. Second: Try to get to the bottom of the problem with the original error message: >[drhey@glctld ~]$ /usr/bin/time -v /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 >Start=2020-01-31T12:00:00 End=2021-02-01 format=account,login,used >sreport: error: slurmdbd: Getting response to message type: DBD_GET_ASSOCS >sreport: error: slurmdbd: DBD_GET_ASSOCS failure: No error I'm not sure how best to advise you on the alternate system you're trying to replicate but I can help pursue the error message some more with you. I looked back at the logs you provided in comment 7 and comment 12 but don't see anything useful there to go on so we'd need to try to reproduce again and drill down with increased logging.
(In reply to Chad Vizino from comment #32) > First: sreport/sacct should be able to handle large queries--purging is more > important for saving space than reducing query time. Setting up an alternate > system with replicated data is probably going to lead to problems. Agreed. The only caveat here is that we bill for jobs, and we have colleagues who are deeply interested in past data. The same data that is probably worthy of a purge. Hence why I wanted to try and find a way to be able to purge but also have a place they could query (though, I realize that the database with purged data most likely would be plagued with the same issues we're seeing, or just slower queries). > Second: Try to get to the bottom of the problem with the original error > message: > > >[drhey@glctld ~]$ /usr/bin/time -v /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 >Start=2020-01-31T12:00:00 End=2021-02-01 format=account,login,used > >sreport: error: slurmdbd: Getting response to message type: DBD_GET_ASSOCS > >sreport: error: slurmdbd: DBD_GET_ASSOCS failure: No error > > I'm not sure how best to advise you on the alternate system you're trying to > replicate but I can help pursue the error message some more with you. I > looked back at the logs you provided in comment 7 and comment 12 but don't > see anything useful there to go on so we'd need to try to reproduce again > and drill down with increased logging. Thanks! We are in the middle of pre-maintenance operations here, so I cannot guarantee the ability to be able to implement increased logging. Regardless, what would you recommend implementing so that we could capture more useful data for you? David
(In reply to ARCTS Admins from comment #33) > > > Second: Try to get to the bottom of the problem with the original error > > message: > > > > >[drhey@glctld ~]$ /usr/bin/time -v /opt/slurm/bin/sreport -nP -T billing cluster AccountUtilizationByUser Accounts=engin1 >Start=2020-01-31T12:00:00 End=2021-02-01 format=account,login,used > > >sreport: error: slurmdbd: Getting response to message type: DBD_GET_ASSOCS > > >sreport: error: slurmdbd: DBD_GET_ASSOCS failure: No error > > > > I'm not sure how best to advise you on the alternate system you're trying to > > replicate but I can help pursue the error message some more with you. I > > looked back at the logs you provided in comment 7 and comment 12 but don't > > see anything useful there to go on so we'd need to try to reproduce again > > and drill down with increased logging. > > Thanks! We are in the middle of pre-maintenance operations here, so I cannot > guarantee the ability to be able to implement increased logging. Regardless, > what would you recommend implementing so that we could capture more useful > data for you? I think the key is being able to reproduce it. Can you still do that? For debug flags looks like these can be set (network, protocol and agent) on DebugFlags in slurmdbd.conf: >dbd_conn.c: log_flag(NET, "We are shutdown, not sending DB_FINI to %s:%u", >dbd_conn.c: log_flag(NET, "unable to send DB_FINI msg to %s:%u", >dbd_conn.c: log_flag(NET, "sent DB_FINI msg to %s:%u rc(%d):%s", >dbd_conn.c: log_flag(PROTOCOL, "msg_type:%s protocol_version:%hu return_code:%d response_msg_type:%s", >dbd_conn.c: log_flag(PROTOCOL, "msg_type:%s protocol_version:%hu return_code:%d", >slurmdbd_agent.c: log_flag(PROTOCOL, "msg_type:DBD_ID_RC return_code:%s JobId=%u db_index=%"PRIu64, >slurmdbd_agent.c: log_flag(PROTOCOL, "msg_type:PERSIST_RC return_code:%s ret_info:%hu flags=%#x comment:%s", >slurmdbd_agent.c: log_flag(AGENT, "slurmdbd agent_count=%d with msg_type=%s", For levels it looks like debug is really all that we can get and you already have that set (debug3 and debug4 don't look useful): >accounting_storage_slurmdbd.c: debug4("%s loaded", plugin_name); >accounting_storage_slurmdbd.c: debug("Waiting for db_inx thread to finish."); >accounting_storage_slurmdbd.c: debug2("Sending tres '%s' for cluster", tres_str_in); >dbd_conn.c: debug("Sent PersistInit msg"); >dbd_conn.c: debug("Attempt to re-open slurmdbd socket"); >dbd_conn.c: debug("Issue with call " >slurmdbd_agent.c: debug("PERSIST_RC is %d from " >slurmdbd_agent.c: debug("PERSIST_RC is %d from " >slurmdbd_agent.c: debug4("There is no state save file to " >slurmdbd_agent.c: debug3("Version string in dbd_state header is %s", ver_str); >slurmdbd_agent.c: (slurm_conf.debug_flags & DEBUG_FLAG_AGENT)) >slurmdbd_agent.c: if (slurm_conf.debug_flags & DEBUG_FLAG_AGENT) { So maybe we can start with these DebugFlags and then see what we get after you can test things. Can you run the sreport (with the time of execution listed) to get it to time out and then supply the slurmdbd.log (like in comment 13)? The other thing to look at would possibly be the mariadb log and correlate entries (not sure what to expect there if you even have anything logged). If you need to wait to get past your maintenance period, that's fine.
(In reply to Chad Vizino from comment #34) Hi, Chad, I am coming back around to tickets we have open with SchedMD as they're seeming to be more and more inter-related. > I think the key is being able to reproduce it. Can you still do that? Yes, we can still do that > For debug flags looks like these can be set (network, protocol and agent) on > DebugFlags in slurmdbd.conf: > > >dbd_conn.c: log_flag(NET, "We are shutdown, not sending DB_FINI to %s:%u", > >dbd_conn.c: log_flag(NET, "unable to send DB_FINI msg to %s:%u", > >dbd_conn.c: log_flag(NET, "sent DB_FINI msg to %s:%u rc(%d):%s", > >dbd_conn.c: log_flag(PROTOCOL, "msg_type:%s protocol_version:%hu return_code:%d response_msg_type:%s", > >dbd_conn.c: log_flag(PROTOCOL, "msg_type:%s protocol_version:%hu return_code:%d", > >slurmdbd_agent.c: log_flag(PROTOCOL, "msg_type:DBD_ID_RC return_code:%s JobId=%u db_index=%"PRIu64, > >slurmdbd_agent.c: log_flag(PROTOCOL, "msg_type:PERSIST_RC return_code:%s ret_info:%hu flags=%#x comment:%s", > >slurmdbd_agent.c: log_flag(AGENT, "slurmdbd agent_count=%d with msg_type=%s", > > For levels it looks like debug is really all that we can get and you already > have that set (debug3 and debug4 don't look useful): > > >accounting_storage_slurmdbd.c: debug4("%s loaded", plugin_name); > >accounting_storage_slurmdbd.c: debug("Waiting for db_inx thread to finish."); > >accounting_storage_slurmdbd.c: debug2("Sending tres '%s' for cluster", tres_str_in); > >dbd_conn.c: debug("Sent PersistInit msg"); > >dbd_conn.c: debug("Attempt to re-open slurmdbd socket"); > >dbd_conn.c: debug("Issue with call " > >slurmdbd_agent.c: debug("PERSIST_RC is %d from " > >slurmdbd_agent.c: debug("PERSIST_RC is %d from " > >slurmdbd_agent.c: debug4("There is no state save file to " > >slurmdbd_agent.c: debug3("Version string in dbd_state header is %s", ver_str); > >slurmdbd_agent.c: (slurm_conf.debug_flags & DEBUG_FLAG_AGENT)) > >slurmdbd_agent.c: if (slurm_conf.debug_flags & DEBUG_FLAG_AGENT) { > > So maybe we can start with these DebugFlags and then see what we get after > you can test things. Can you run the sreport (with the time of execution > listed) to get it to time out and then supply the slurmdbd.log (like in > comment 13)? The other thing to look at would possibly be the mariadb log > and correlate entries (not sure what to expect there if you even have > anything logged). If you need to wait to get past your maintenance period, > that's fine. These debug flags (Network, Protocol, Agent) appear in `man slurm.conf` but not specifically for `man slurmdbd.conf`. Will I be setting them in `slurm.conf` (or on-the-fly with `scontrol setdebugflag +<flagName>), or did you have something special for slurmdbd in mind? David
(In reply to ARCTS Admins from comment #35) > > So maybe we can start with these DebugFlags and then see what we get after > > you can test things. Can you run the sreport (with the time of execution > > listed) to get it to time out and then supply the slurmdbd.log (like in > > comment 13)? The other thing to look at would possibly be the mariadb log > > and correlate entries (not sure what to expect there if you even have > > anything logged). If you need to wait to get past your maintenance period, > > that's fine. > > These debug flags (Network, Protocol, Agent) appear in `man slurm.conf` but > not specifically for `man slurmdbd.conf`. Will I be setting them in > `slurm.conf` (or on-the-fly with `scontrol setdebugflag +<flagName>), or did > you have something special for slurmdbd in mind? Good question. I do have a few of those flags set in my slurmdbd.conf file but after looking at my own logs and thinking about it some more I don't think these will really give any useful information. I wonder if you could try timing a direct MariaDB query from your db node via the command line to see how long it takes. Something like this: >time mysql -D slurm_acct_db -e 'explain analyze <sql-query>' So go back to one of your long sreport executions that times out with the error, look up the SQL for it from your slurmdbd.log and substitute <sql-query> above with it (probably will be long!). With that info we may be able to see how the total time compares to the TCP and MessageTimeout values you have set in your slurmdbd.conf (may need tuning) as a start.
(In reply to Chad Vizino from comment #36) > Good question. I do have a few of those flags set in my slurmdbd.conf file > but after looking at my own logs and thinking about it some more I don't > think these will really give any useful information. Cool, just wanted to check! > I wonder if you could try timing a direct MariaDB query from your db node > via the command line to see how long it takes. Something like this: > > >time mysql -D slurm_acct_db -e 'explain analyze <sql-query>' > > So go back to one of your long sreport executions that times out with the > error, look up the SQL for it from your slurmdbd.log and substitute > <sql-query> above with it (probably will be long!). > > With that info we may be able to see how the total time compares to the TCP > and MessageTimeout values you have set in your slurmdbd.conf (may need > tuning) as a start. I did attempt to do this, but found we are at a mariadb version (5.5) that doesn't seem to support ANALYZE. We DO have EXPLAIN, though: ``` MariaDB [slurm_acct_db]> EXPLAIN select t3.id_assoc, t1.id_tres, t1.time_start, t1.alloc_secs from greatlakes_assoc_usage_hour_table as t1, greatlakes_assoc_table as t2, greatlakes_assoc_table as t3 where (t1.time_start < 1625112000 && t1.time_start >= 1578330000) && t1.id=t2.id_assoc && (t3.id_assoc in (8737,23167,23135,23132,23113,23112,23110,23086,23085,23084,23083,23082,23059,22910,22882,22870,22869,22850,22849,22847,22846,22807,22796,22794,22793,22791,22790,22789,22788,22771,22755,22744,22713,22694,22690,22684,22672,22670,22668,22666,22659,22658,22629,22628,22627,22626,22625,22620,22607,22591,22573,22572,22570,22559,22555,22554,22553,22552,22551,22545,22539,22538,22534,22524,22505,22504,22499,22498,22497,22496,22483,22476,22459,22458,22456,22455,22454,22453,22452,22451,22450,22449,22442,22441,22438,22437,22433,22432,22431,22428,22427,22426,22425,22424,22423,22422,22413,22410,22409,22408,22405,22403,22402,22394,22393,22392,22391,22390,22383,22382,22381,22370,22366,22343,22341,22340,22330,22329,22324,22323,22322,22283,22282,22281,22273,22272,22164,22163,22162,22161,22153,22152,22151,22150,22149,22148,22139,22131,22130,21910,21901,21892,21890,21880,21870,21841,21834,21818,21786,21752,21751,21750,21709,21703,21698,21681,21680,21675,21674,21614,21606,21605,21604,21601,21570,21564,21561,21560,21551,21539,21505,21499,21498,21497,21496,21495,21494,21493,21491,21490,21467,21437,21432,21427,21423,21402,21388,21383,21365,21362,21302,21299,21297,21295,21294,21293,21288,21258,21230,21229,21228,21227,21225,21223,21220,21191,21190,21189,21188,21187,21186,21185,21174,21172,21171,21158,21157,21156,21155,21149,21148,21141,21137,21119,21118,21098,21097,21096,21091,21090,21073,21054,21053,21033,21029,21023,21019,21012,20978,20975,20930,20929,20928,20741,20701,20682,20664,20642,20562,20549,20548,20520,20359,20352,20338,20290,20278,20249,20245,20222,20221,20220,20217,20165,20159,20144,20130,20129,20128,20127,20126,20125,20085,20081,20072,20066,20065,20063,20062,20057,20056,20055,20054,20053,20047,20037,20014,20013,20012,20011,20010,19998,19997,19969,19921,19901,19890,19888,19885,19870,19869,19838,19837,19833,19819,19818,19811,19797,19796,19795,19794,19792,19788,19773,19771,19766,19765,19751,19743,19742,19715,19713,19705,19704,19703,19702,19698,19697,19696,19692,19691,19686,19685,19684,19676,19674,19669,19666,19659,19657,19651,19650,19649,19644,19643,19642,19637,19636,19635,19634,19633,19632,19619,19618,19617,19616,19615,19614,19613,19612,19608,19607,19600,19599,19363,19362,19361,19175,19174,19171,19163,19159,19158,19157,19156,19154,19153,19152,19151,19150,19149,19113,19112,19062,19061,19060,19059,19058,19057,19056,19055,19003,18996,18995,18988,18987,18986,18985,18948,18935,18932,18903,18898,18891,18890,18881,18880,18847,18846,18845,18840,18839,18812,18786,18784,18778,18774,18767,18763,18762,18761,18748,18747,18746,18745,18742,18724,18704,18693,18690,18689,18682,18681,18657,18647,18619,18618,18602,18601,18600,18588,18573,18546,18541,18538,18537,18533,18532,18516,18515,18514,18508,18507,18506,18503,18498,18488,18484,18483,18477,18469,18465,18441,18440,18439,18438,18429,18428,18412,18411,18345,18343,18340,18339,18335,18332,18261,18255,18254,18246,18220,18219,18218,18217,18214,18213,18212,18206,18205,18201,18197,18195,18182,18181,18180,18179,18178,18177,18173,18161,18156,18125,18095,18092,18090,18089,18088,18087,18086,18082,18079,18078,18077,18075,18074,18073,18066,18062,18061,18060,18059,18058,18057,18056,18055,18054,18053,18052,18051,18050,18049,18048,18047,18046,18015,18014,18012,18011,18010,18009,18008,18004,17989,17978,17977,17968,17958,17954,17951,17941,17940,17939,17933,17932,17930,17926,17841,17788,17787,17780,17777,17776,17772,17765,17755,17754,17741,17722,17721,17711,17710,17709,17708,17707,17706,17702,17662,17641,17637,17542,17537,17536,17363,17101,17100,17077,17068,17065,17019,17017,17016,17007,16997,16963,16962,16961,16952,16944,16928,16927,16905,16836,16835,16834,16814,16780,16730,16729,16712,16711,16710,16705,16676,16672,16629,16628,16612,16599,16590,16586,16583,16581,16536,16518,16517,16516,16509,16508,16507,16506,16505,16493,16461,16460,16459,16458,16449,16443,16441,16440,16439,16438,16437,16436,16435,16434,16433,16432,16431,16430,16429,16428,16427,16426,16425,16424,16423,16422,16421,16420,16419,16418,16417,16355,16345,16324,16311,16310,16290,16277,16236,16221,16154,16150,16104,16098,16097,16094,16093,16081,16079,16072,16068,16045,16043,16039,15996,15965,15924,15923,15921,15920,15913,15910,15901,15900,15899,15850,15835,15830,15815,15807,15804,15794,15793,15791,15781,15748,15730,15723,15708,15691,15673,15672,15671,15669,15657,15634,15610,15609,15606,15566,15565,15564,15563,15562,15561,15560,15559,15558,15557,15556,15537,15462,15443,15442,15441,15436,15435,15387,15384,15383,15382,15379,15349,15348,15319,15318,15314,15310,15272,15267,15266,15250,15243,15235,15225,15224,15135,15134,15129,15128,15127,15126,15125,15116,15113,15095,15074,15072,15067,15065,15064,15063,15062,15026,14903,14902,14901,14900,14895,14894,14893,14707,14667,14652,14648,14640,14639,14629,14628,14627,14626,14625,14624,14623,14622,14562,14520,14505,14498,14497,14476,14475,14456,14448,14447,14446,14445,14444,14443,14442,14441,14440,14438,14437,14436,14435,14434,14378,14267,14265,14264,14263,14262,14261,14260,14259,14258,14257,14256,14221,14207,14128,14127,14018,14011,14007,13973,13926,13919,13189,13161,13106,13096,13095,13094,13093,13092,13091,13090,13089,13088,13087,13086,13085,13084,13083,13082,13081,13080,13079,13078,13077,13076,13075,13074,13073,13072,13071,13070,13069,13068,13067,13066,13065,13064,13063,13062,13061,13060,13059,13058,13057,13056,13055,13054,13053,13052,13051,13050,13049,13048,13047,13046,13045,13044,13043,13042,13041,13040,13039,13038,13037,12963,12712,12206,10473,10422,10421,10412,10411,10410,10409,10408,10407,10406,10405,10404,10403,10402,10401,10400,10399,10398,10397,10396,10395,10394,10393,10392,10391,10390,10389,10388,10387,10386,10385,10384,10383,10382,10381,10380,10379,10378,10377,10376,10375,10374,10373,10372,10371,10370,10369,10368,10367,10366,10365,10364,10363,10362,10361,10360,10359,10358,10357,10356,10355,10354,10353,10352,10351,10350,10349,10348,10347,10346,10345,10344,10343,10342,10341,10340,10339,10338,10337,10336,10335,10334,10333,10332,10331,10330,10329,10328,10327,10326,10325,10324,10323,10322,10321,10320,10319,10318,10317,10316,10315,10314,10313,10312,10311,10310,10309,10308,10307,10306,10305,10304,10303,10302,10301,10300,10299,10298,10297,10296,10295,10294,10293,10292,10291,10290,10289,10288,10287,10286,10285,10284,10283,10282,10281,10280,10279,10278,10277,10276,10275,10274,10273,10272,10271,10270,10269,10268,10267,10266,10265,10264,10263,10262,10261,10260,10259,10258,10257,10256,10255,10254,10253,10252,10251,10250,10249,10248,10247,10246,10245,10244,10243,10242,10241,10240,10239,10238,10237,10236,10235,10234,10233,10232,10231,10230,10229,10228,10227,10226,10225,10224,10223,10222,10221,10220,10219,10218,10217,10216,10215,10214,10213,10212,10211,10210,10209,10208,10207,10206,10205,10204,10203,10202,10201,10200,10199,10198,10197,10196,10195,10194,10193,10192,10191,10190,10189,10188,10187,10186,10185,10184,10183,10182,10181,10180,10179,10178,10177,10176,10175,10174,10173,10172,10171,10170,10169,10168,10167,10166,10165,10164,10163,10162,10161,10160,10159,10158,10157,10156,10155,10154,10153,10152,10151,10150,10149,10148,10147,10146,10145,10144,10143,10142,10141,10140,10139,10138,10137,10136,10135,10134,10133,10132,10131,10130,10129,10128,10127,10126,10125,10124,10123,10122,10121,10120,10119,10118,10117,10116,10115,10114,10113,10112,10111,10110,10109,10108,10107,10106,10105,10104,10103,10102,10101,10100,10099,10098,10097,10096,10095,10094,10093,10092,10091,10090,10089,10088,10087,10086,10085,10084,10083,10082,10081,10080,10079,10078,10077,10076,10075,10074,10073,10072,10071,10070,10069,10068,10067,10066,10065,10064,10063,10062,10061,10060,10059,10058,10057,10056,10055,10054,10053,10052,10051,10050,10049,10048,10047,10046,10045,10044,10043,10042,10041,10040,10039,10038,10037,10036,10035,10034,10033,10032,10031,10030,10029,10028,10027,10026,10025,10024,10023,10022,10021,10020,10019,10018,10017,10016,10015,10014,10013,10012,10011,10010,10009,10008,10007,10006,10005,10004,10003,10002,10001,10000,9999,9998,9997,9996,9995,9994,9993,9992,9991,9990,9989,9988,9987,9986,9985,9984,9983,9982,9981,9980,9979,9978,9977,9976,9975,9974,9973,9972,9971,9970,9969,9968,9967,9966,9965,9964,9963,9962,9961,9960,9959,9958,9957,9956,9955,9954,9953,9952,9951,9950,9949,9948,9947,9946,9945,9944,9943,9942,9941,9940,9939,9938,9937,9936,9935,9934,9933,9932,9931,9930,9929,9928,9927,9926,9925,9924,9923,9922,9921,9920,9919,9918,9917,9916,9915,9914,9913,9912,9911,9910,9909,9908,9907,9906,9905,9904,9903,9902,9901,9900,9899,9898,9897,9896,9895,9894,9893,9892,9891,9890,9889,9888,9887,9886,9885,9884,9883,9882,9881,9880,9879,9878,9877,9876,9875,9874,9873,9872,9871,9870,9869,9868,9867,9866,9865,9864,9863,9862,9861,9860,9859,9858,9857,9856,9855,9854,9853,9852,9851,9850,9849,9848,9847,9846,9845,9844,9843,9842,9841,9840,9839,9838,9837,9836,9835,9834,9833,9832,9831,9830,9829,9828,9827,9826,9825,9824,9823,9822,9821,9820,9819,9818,9817,9816,9815,9814,9813,9812,9811,9810,9809,9808,9807,9806,9805,9804,9803,9802,9801,9800,9799,9798,9797,9796,9795,9794,9793,9792,9791,9790,9789,9788,9787,9786,9785,9784,9783,9782,9781,9780,9779,9778,9777,9776,9775,9774,9773,9772,9771,9770,9769,9768,9767,9766,9765,9764,9763,9762,9761,9760,9759,9758,9757,9756,9755,9754,9753,9752,9751,9750,9749,9748,9747,9746,9745,9744,9743,9742,9741,9740,9739,9738,9737,9736,9735,9734,9733,9732,9731,9730,9729,9728,9727,9726,9725,9724,9723,9722,9721,9720,9719,9718,9717,9716,9715,9714,9713,9712,9711,9710,9709,9708,9707,9706,9705,9704,9703,9702,9701,9700,9699,9698,9697,9696,9695,9694,9693,9692,9691,9690,9689,9688,9687,9686,9685,9684,9683,9682,9681,9680,9679,9678,9677,9676,9675,9674,9673,9672,9671,9670,9669,9668,9667,9666,9665,9664,9663,9662,9661,9660,9659,9658,9657,9656,9655,9654,9653,9652,9651,9650,9649,9648,9647,9646,9645,9644,9643,9642,9641,9640,9639,9638,9637,9636,9635,9634,9633,9632,9631,9630,9629,9628,9627,9626,9625,9624,9623,9622,9621,9620,9619,9618,9617,9616,9615,9614,9613,9612,9611,9610,9609,9608,9607,9606,9605,9604,9603,9602,9601,9600,9599,9598,9597,9596,9595,9594,9593,9592,9591,9590,9589,9588,9587,9586,9585,9584,9583,9582,9581,9580,9579,9578,9577,9576,9575,9574,9573,9572,9571,9570,9569,9568,9567,9566,9565,9564,9563,9562,9561,9560,9559,9558,9557,9556,9555,9554,9553,9552,9551,9550,9549,9548,9547,9546,9545,9544,9543,9542,9541,9540,9539,9538,9537,9536,9535,9534,9533,9532,9531,9530,9529,9528,9527,9526,9525,9524,9523,9522,9521,9520,9519,9518,9517,9516,9515,9514,9513,9512,9511,9510,9509,9508,9507,9506,9505,9504,9503,9502,9501,9500,9499,9498,9497,9496,9495,9494,9493,9492,9491,9490,9489,9488,9487,9486,9485,9484,9483,9482,9481,9480,9479,9478,9477,9476,9475,9474,9473,9472,9471,9470,9469,9468,9467,9466,9465,9464,9463,9462,9461,9460,9459,9458,9457,9456,9455,9454,9453,9452,9451,9450,9449,9448,9447,9446,9445,9444,9443,9442,9441,9440,9439,9438,9437,9436,9435,9434,9433,9432,9431,9430,9429,9428,9427,9426,9425,9424,9423,9422,9421,9420,9419,9418,9417,9416,9415,9414,9413,9412,9411,9410,9409,9408,9407,9406,9405,9404,9403,9402,9401,9400,9399,9398,9397,9396,9395,9394,9393,9392,9391,9390,9389,9388,9387,9386,9385,9384,9383,9382,9381,9380,9379,9378,9377,9376,9375,9374,9373,9372,9371,9370,9369,9368,9367,9366,9365,9364,9363,9362,9361,9360,9359,9358,9357,9356,9355,9354,9353,9352,9351,9350,9349,9348,9347,9346,9345,9344,9343,9342,9341,9340,9339,9338,9337,9336,9335,9334,9333,9332,9331,9330,9329,9328,9327,9326,9325,9324,9323,9322,9321,9320,9319,9318,9317,9316,9315,9314,9313,9312,9311,9310,9309,9308,9307,9306,9305,9304,9303,9302,9301,9300,9299,9298,9297,9296,9295,9294,9293,9292,9291,9290,9289,9288,9287,9286,9285,9284,9283,9282,9281,9280,9279,9278,9277,9276,9275,9274,9273,9272,9271,9270,9269,9268,9267,9266,9265,9264,9263,9262,9261,9260,9259,9258,9257,9256,9255,9254,9253,9252,9251,9250,9249,9248,9247,9246,9245,9244,9243,9242,9241,9240,9239,9238,9237,9236,9235,9234,9233,9232,9231,9230,9229,9228,9227,9226,9225,9224,9223,9222,9221,9220,9219,9218,9217,9216,9215,9214,9213,9212,9211,9210,9209,9208,9207,9206,9205,9204,9203,9202,9201,9200,9199,9198,9197,9196,9195,9194,9193,9192,9191,9190,9189,9188,9187,9186,9185,9184,9183,9182,9181,9180,9179,9178,9177,9176,9175,9174,9173,9172,9171,9170,9169,9168,9167,9166,9165,9164,9163,9162,9161,9160,9159,9158,9157,9156,9155,9154,9153,9152,9151,9150,9149,9148,9147,9146,9145,9144,9143,9142,9141,9140,9139,9138,9137,9136,9135,9134,9133,9132,9131,9130,9129,9128,9127,9126,9125,9124,9123,9122,9121,9120,9119,9118,9117,9116,9115,9114,9113,9112,9111,9110,9109,9108,9107,9106,9105,9104,9103,9102,9101,9100,9099,9098,9097,9096,9095,9094,9093,9092,9091,9090,9089,9088,9087,9086,9085,9084,9083,9082,9081,9080,9079,9078,9077,9076,9075,9074,9073,9072,9071,9070,9069,9068,9067,9066,9065,9064,9063,9062,9061,9060,9059,9058,9057,9056,9055,9054,9053,9052,9051,9050,9049,9048,9047,9046,9045,9044,9043,9042,9041,9040,9039,9038,9037,9036,9035,9034,9033,9032,9031,9030,9029,9028,9027,9026,9025,9024,9023,9022,9021,9020,9019,9018,9017,9016,9015,9014,9013,9012,9011,9010,9009,9008,9007,9006,9005,9004,9003,9002,9001,9000,8999,8998,8997,8996,8995,8994,8993,8992,8991,8990,8989,8988,8987,8986,8985,8984,8983,8982,8981,8980,8979,8978,8977,8976,8975,8974,8973,8972,8971,8970,8969,8968,8967,8966,8965,8964,8963,8962,8961,8960,8959,8958,8957,8956,8955,8954,8953,8952,8951,8950,8949,8948,8947,8946,8945,8944,8943,8942,8941,8940,8939,8938,8937,8936,8935,8934,8933,8932,8931,8930,8929,8928,8927,8926,8925,8924,8923,8922,8921,8920,8919,8918,8917,8916,8915,8914,8913,8912,8911,8910,8909,8908,8907,8906,8905,8904,8903,8902,8901,8900,8899,8898,8897,8896,8895,8894,8893,8892,8891,8890,8889,8888,8887,8886,8885,8884,8883,8882,8881,8880,8879,8878,8877,8876,8875,8874,8873,8872,8871,8870,8869,8868,8867,8866,8865,8864,8863,8862,8861,8860,8859,8858,8857,8856,8855,8854,8853,8852,8851,8850,8849,8848,8847,8846,8845,8844,8843,8842,8841,8840,8839,8838,8837,8836,8835,8834,8833,8832,8831,8830,8829,8828,8827,8826,8825,8824,8823,8822,8821,8820,8819,8818,8817,8816,8815,8814,8813,8812,8811,8810,8809,8808,8807,8806,8805,8804,8803,8802,8801,8800,8799,8798,8797,8796,8795,8794,8793,8792,8791,8790,8789,8788,8787,8786,8785,8784,8783,8782,8781,8780,8779,8778,8777,8776,8775,8774,8773,8772,8771,8770,8769,8768,8767,8766,8765,8764,8763,8762,8761,8760,8759,8758,8757,8756,8755,8754,8753,8752,8751,8750,8749,8748,8747,8746,8745,8744,8743,8742,8741,8740,8739,8738)) && t2.lft between t3.lft and t3.rgt order by t3.id_assoc, time_start; +------+-------------+-------+--------+---------------+---------+---------+---------------------+---------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+--------+---------------+---------+---------+---------------------+---------+-------------------------------------------------+ | 1 | SIMPLE | t3 | range | PRIMARY,lft | PRIMARY | 4 | NULL | 2585 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | t1 | ALL | PRIMARY | NULL | NULL | NULL | 6243855 | Using where; Using join buffer (flat, BNL join) | | 1 | SIMPLE | t2 | eq_ref | PRIMARY,lft | PRIMARY | 4 | slurm_acct_db.t1.id | 1 | Using where | +------+-------------+-------+--------+---------------+---------+---------+---------------------+---------+-------------------------------------------------+ 3 rows in set (0.00 sec) ``` David
(In reply to ARCTS Admins from comment #37) > We DO have EXPLAIN, though: > > ``` > MariaDB [slurm_acct_db]> EXPLAIN select ... No problem that you can't use analyze. I'm most interested in the total time of the query. Could you rerun it so we can get the timing of the long query like this (without the explain and discarding stdout so we just see the time)? >time mysql -D slurm_acct_db -e '<sql-query>' >/dev/null
(In reply to Chad Vizino from comment #38) > No problem that you can't use analyze. I'm most interested in the total time > of the query. Could you rerun it so we can get the timing of the long query > like this (without the explain and discarding stdout so we just see the > time)? > > >time mysql -D slurm_acct_db -e '<sql-query>' >/dev/null Sure thing! ``` [root@slurmdbd ~]# time mysql -D slurm_acct_db -e 'select t3.id_assoc, t1.id_tres, t1.time_start, t1.alloc_secs from greatlakes_assoc_usage_hour_table as t1, greatlakes_assoc_table as t2, greatlakes_assoc_table as t3 where (t1.time_start < 1625112000 && t1.time_start >= 1578330000) && t1.id=t2.id_assoc && (t3.id_assoc in (8737,23167,23135,23132,23113,23112,23110,23086,23085,23084,23083,23082,23059,22910,22882,22870,22869,22850,22849,22847,22846,22807,22796,22794,22793,22791,22790,22789,22788,22771,22755,22744,22713,22694,22690,22684,22672,22670,22668,22666,22659,22658,22629,22628,22627,22626,22625,22620,22607,22591,22573,22572,22570,22559,22555,22554,22553,22552,22551,22545,22539,22538,22534,22524,22505,22504,22499,22498,22497,22496,22483,22476,22459,22458,22456,22455,22454,22453,22452,22451,22450,22449,22442,22441,22438,22437,22433,22432,22431,22428,22427,22426,22425,22424,22423,22422,22413,22410,22409,22408,22405,22403,22402,22394,22393,22392,22391,22390,22383,22382,22381,22370,22366,22343,22341,22340,22330,22329,22324,22323,22322,22283,22282,22281,22273,22272,22164,22163,22162,22161,22153,22152,22151,22150,22149,22148,22139,22131,22130,21910,21901,21892,21890,21880,21870,21841,21834,21818,21786,21752,21751,21750,21709,21703,21698,21681,21680,21675,21674,21614,21606,21605,21604,21601,21570,21564,21561,21560,21551,21539,21505,21499,21498,21497,21496,21495,21494,21493,21491,21490,21467,21437,21432,21427,21423,21402,21388,21383,21365,21362,21302,21299,21297,21295,21294,21293,21288,21258,21230,21229,21228,21227,21225,21223,21220,21191,21190,21189,21188,21187,21186,21185,21174,21172,21171,21158,21157,21156,21155,21149,21148,21141,21137,21119,21118,21098,21097,21096,21091,21090,21073,21054,21053,21033,21029,21023,21019,21012,20978,20975,20930,20929,20928,20741,20701,20682,20664,20642,20562,20549,20548,20520,20359,20352,20338,20290,20278,20249,20245,20222,20221,20220,20217,20165,20159,20144,20130,20129,20128,20127,20126,20125,20085,20081,20072,20066,20065,20063,20062,20057,20056,20055,20054,20053,20047,20037,20014,20013,20012,20011,20010,19998,19997,19969,19921,19901,19890,19888,19885,19870,19869,19838,19837,19833,19819,19818,19811,19797,19796,19795,19794,19792,19788,19773,19771,19766,19765,19751,19743,19742,19715,19713,19705,19704,19703,19702,19698,19697,19696,19692,19691,19686,19685,19684,19676,19674,19669,19666,19659,19657,19651,19650,19649,19644,19643,19642,19637,19636,19635,19634,19633,19632,19619,19618,19617,19616,19615,19614,19613,19612,19608,19607,19600,19599,19363,19362,19361,19175,19174,19171,19163,19159,19158,19157,19156,19154,19153,19152,19151,19150,19149,19113,19112,19062,19061,19060,19059,19058,19057,19056,19055,19003,18996,18995,18988,18987,18986,18985,18948,18935,18932,18903,18898,18891,18890,18881,18880,18847,18846,18845,18840,18839,18812,18786,18784,18778,18774,18767,18763,18762,18761,18748,18747,18746,18745,18742,18724,18704,18693,18690,18689,18682,18681,18657,18647,18619,18618,18602,18601,18600,18588,18573,18546,18541,18538,18537,18533,18532,18516,18515,18514,18508,18507,18506,18503,18498,18488,18484,18483,18477,18469,18465,18441,18440,18439,18438,18429,18428,18412,18411,18345,18343,18340,18339,18335,18332,18261,18255,18254,18246,18220,18219,18218,18217,18214,18213,18212,18206,18205,18201,18197,18195,18182,18181,18180,18179,18178,18177,18173,18161,18156,18125,18095,18092,18090,18089,18088,18087,18086,18082,18079,18078,18077,18075,18074,18073,18066,18062,18061,18060,18059,18058,18057,18056,18055,18054,18053,18052,18051,18050,18049,18048,18047,18046,18015,18014,18012,18011,18010,18009,18008,18004,17989,17978,17977,17968,17958,17954,17951,17941,17940,17939,17933,17932,17930,17926,17841,17788,17787,17780,17777,17776,17772,17765,17755,17754,17741,17722,17721,17711,17710,17709,17708,17707,17706,17702,17662,17641,17637,17542,17537,17536,17363,17101,17100,17077,17068,17065,17019,17017,17016,17007,16997,16963,16962,16961,16952,16944,16928,16927,16905,16836,16835,16834,16814,16780,16730,16729,16712,16711,16710,16705,16676,16672,16629,16628,16612,16599,16590,16586,16583,16581,16536,16518,16517,16516,16509,16508,16507,16506,16505,16493,16461,16460,16459,16458,16449,16443,16441,16440,16439,16438,16437,16436,16435,16434,16433,16432,16431,16430,16429,16428,16427,16426,16425,16424,16423,16422,16421,16420,16419,16418,16417,16355,16345,16324,16311,16310,16290,16277,16236,16221,16154,16150,16104,16098,16097,16094,16093,16081,16079,16072,16068,16045,16043,16039,15996,15965,15924,15923,15921,15920,15913,15910,15901,15900,15899,15850,15835,15830,15815,15807,15804,15794,15793,15791,15781,15748,15730,15723,15708,15691,15673,15672,15671,15669,15657,15634,15610,15609,15606,15566,15565,15564,15563,15562,15561,15560,15559,15558,15557,15556,15537,15462,15443,15442,15441,15436,15435,15387,15384,15383,15382,15379,15349,15348,15319,15318,15314,15310,15272,15267,15266,15250,15243,15235,15225,15224,15135,15134,15129,15128,15127,15126,15125,15116,15113,15095,15074,15072,15067,15065,15064,15063,15062,15026,14903,14902,14901,14900,14895,14894,14893,14707,14667,14652,14648,14640,14639,14629,14628,14627,14626,14625,14624,14623,14622,14562,14520,14505,14498,14497,14476,14475,14456,14448,14447,14446,14445,14444,14443,14442,14441,14440,14438,14437,14436,14435,14434,14378,14267,14265,14264,14263,14262,14261,14260,14259,14258,14257,14256,14221,14207,14128,14127,14018,14011,14007,13973,13926,13919,13189,13161,13106,13096,13095,13094,13093,13092,13091,13090,13089,13088,13087,13086,13085,13084,13083,13082,13081,13080,13079,13078,13077,13076,13075,13074,13073,13072,13071,13070,13069,13068,13067,13066,13065,13064,13063,13062,13061,13060,13059,13058,13057,13056,13055,13054,13053,13052,13051,13050,13049,13048,13047,13046,13045,13044,13043,13042,13041,13040,13039,13038,13037,12963,12712,12206,10473,10422,10421,10412,10411,10410,10409,10408,10407,10406,10405,10404,10403,10402,10401,10400,10399,10398,10397,10396,10395,10394,10393,10392,10391,10390,10389,10388,10387,10386,10385,10384,10383,10382,10381,10380,10379,10378,10377,10376,10375,10374,10373,10372,10371,10370,10369,10368,10367,10366,10365,10364,10363,10362,10361,10360,10359,10358,10357,10356,10355,10354,10353,10352,10351,10350,10349,10348,10347,10346,10345,10344,10343,10342,10341,10340,10339,10338,10337,10336,10335,10334,10333,10332,10331,10330,10329,10328,10327,10326,10325,10324,10323,10322,10321,10320,10319,10318,10317,10316,10315,10314,10313,10312,10311,10310,10309,10308,10307,10306,10305,10304,10303,10302,10301,10300,10299,10298,10297,10296,10295,10294,10293,10292,10291,10290,10289,10288,10287,10286,10285,10284,10283,10282,10281,10280,10279,10278,10277,10276,10275,10274,10273,10272,10271,10270,10269,10268,10267,10266,10265,10264,10263,10262,10261,10260,10259,10258,10257,10256,10255,10254,10253,10252,10251,10250,10249,10248,10247,10246,10245,10244,10243,10242,10241,10240,10239,10238,10237,10236,10235,10234,10233,10232,10231,10230,10229,10228,10227,10226,10225,10224,10223,10222,10221,10220,10219,10218,10217,10216,10215,10214,10213,10212,10211,10210,10209,10208,10207,10206,10205,10204,10203,10202,10201,10200,10199,10198,10197,10196,10195,10194,10193,10192,10191,10190,10189,10188,10187,10186,10185,10184,10183,10182,10181,10180,10179,10178,10177,10176,10175,10174,10173,10172,10171,10170,10169,10168,10167,10166,10165,10164,10163,10162,10161,10160,10159,10158,10157,10156,10155,10154,10153,10152,10151,10150,10149,10148,10147,10146,10145,10144,10143,10142,10141,10140,10139,10138,10137,10136,10135,10134,10133,10132,10131,10130,10129,10128,10127,10126,10125,10124,10123,10122,10121,10120,10119,10118,10117,10116,10115,10114,10113,10112,10111,10110,10109,10108,10107,10106,10105,10104,10103,10102,10101,10100,10099,10098,10097,10096,10095,10094,10093,10092,10091,10090,10089,10088,10087,10086,10085,10084,10083,10082,10081,10080,10079,10078,10077,10076,10075,10074,10073,10072,10071,10070,10069,10068,10067,10066,10065,10064,10063,10062,10061,10060,10059,10058,10057,10056,10055,10054,10053,10052,10051,10050,10049,10048,10047,10046,10045,10044,10043,10042,10041,10040,10039,10038,10037,10036,10035,10034,10033,10032,10031,10030,10029,10028,10027,10026,10025,10024,10023,10022,10021,10020,10019,10018,10017,10016,10015,10014,10013,10012,10011,10010,10009,10008,10007,10006,10005,10004,10003,10002,10001,10000,9999,9998,9997,9996,9995,9994,9993,9992,9991,9990,9989,9988,9987,9986,9985,9984,9983,9982,9981,9980,9979,9978,9977,9976,9975,9974,9973,9972,9971,9970,9969,9968,9967,9966,9965,9964,9963,9962,9961,9960,9959,9958,9957,9956,9955,9954,9953,9952,9951,9950,9949,9948,9947,9946,9945,9944,9943,9942,9941,9940,9939,9938,9937,9936,9935,9934,9933,9932,9931,9930,9929,9928,9927,9926,9925,9924,9923,9922,9921,9920,9919,9918,9917,9916,9915,9914,9913,9912,9911,9910,9909,9908,9907,9906,9905,9904,9903,9902,9901,9900,9899,9898,9897,9896,9895,9894,9893,9892,9891,9890,9889,9888,9887,9886,9885,9884,9883,9882,9881,9880,9879,9878,9877,9876,9875,9874,9873,9872,9871,9870,9869,9868,9867,9866,9865,9864,9863,9862,9861,9860,9859,9858,9857,9856,9855,9854,9853,9852,9851,9850,9849,9848,9847,9846,9845,9844,9843,9842,9841,9840,9839,9838,9837,9836,9835,9834,9833,9832,9831,9830,9829,9828,9827,9826,9825,9824,9823,9822,9821,9820,9819,9818,9817,9816,9815,9814,9813,9812,9811,9810,9809,9808,9807,9806,9805,9804,9803,9802,9801,9800,9799,9798,9797,9796,9795,9794,9793,9792,9791,9790,9789,9788,9787,9786,9785,9784,9783,9782,9781,9780,9779,9778,9777,9776,9775,9774,9773,9772,9771,9770,9769,9768,9767,9766,9765,9764,9763,9762,9761,9760,9759,9758,9757,9756,9755,9754,9753,9752,9751,9750,9749,9748,9747,9746,9745,9744,9743,9742,9741,9740,9739,9738,9737,9736,9735,9734,9733,9732,9731,9730,9729,9728,9727,9726,9725,9724,9723,9722,9721,9720,9719,9718,9717,9716,9715,9714,9713,9712,9711,9710,9709,9708,9707,9706,9705,9704,9703,9702,9701,9700,9699,9698,9697,9696,9695,9694,9693,9692,9691,9690,9689,9688,9687,9686,9685,9684,9683,9682,9681,9680,9679,9678,9677,9676,9675,9674,9673,9672,9671,9670,9669,9668,9667,9666,9665,9664,9663,9662,9661,9660,9659,9658,9657,9656,9655,9654,9653,9652,9651,9650,9649,9648,9647,9646,9645,9644,9643,9642,9641,9640,9639,9638,9637,9636,9635,9634,9633,9632,9631,9630,9629,9628,9627,9626,9625,9624,9623,9622,9621,9620,9619,9618,9617,9616,9615,9614,9613,9612,9611,9610,9609,9608,9607,9606,9605,9604,9603,9602,9601,9600,9599,9598,9597,9596,9595,9594,9593,9592,9591,9590,9589,9588,9587,9586,9585,9584,9583,9582,9581,9580,9579,9578,9577,9576,9575,9574,9573,9572,9571,9570,9569,9568,9567,9566,9565,9564,9563,9562,9561,9560,9559,9558,9557,9556,9555,9554,9553,9552,9551,9550,9549,9548,9547,9546,9545,9544,9543,9542,9541,9540,9539,9538,9537,9536,9535,9534,9533,9532,9531,9530,9529,9528,9527,9526,9525,9524,9523,9522,9521,9520,9519,9518,9517,9516,9515,9514,9513,9512,9511,9510,9509,9508,9507,9506,9505,9504,9503,9502,9501,9500,9499,9498,9497,9496,9495,9494,9493,9492,9491,9490,9489,9488,9487,9486,9485,9484,9483,9482,9481,9480,9479,9478,9477,9476,9475,9474,9473,9472,9471,9470,9469,9468,9467,9466,9465,9464,9463,9462,9461,9460,9459,9458,9457,9456,9455,9454,9453,9452,9451,9450,9449,9448,9447,9446,9445,9444,9443,9442,9441,9440,9439,9438,9437,9436,9435,9434,9433,9432,9431,9430,9429,9428,9427,9426,9425,9424,9423,9422,9421,9420,9419,9418,9417,9416,9415,9414,9413,9412,9411,9410,9409,9408,9407,9406,9405,9404,9403,9402,9401,9400,9399,9398,9397,9396,9395,9394,9393,9392,9391,9390,9389,9388,9387,9386,9385,9384,9383,9382,9381,9380,9379,9378,9377,9376,9375,9374,9373,9372,9371,9370,9369,9368,9367,9366,9365,9364,9363,9362,9361,9360,9359,9358,9357,9356,9355,9354,9353,9352,9351,9350,9349,9348,9347,9346,9345,9344,9343,9342,9341,9340,9339,9338,9337,9336,9335,9334,9333,9332,9331,9330,9329,9328,9327,9326,9325,9324,9323,9322,9321,9320,9319,9318,9317,9316,9315,9314,9313,9312,9311,9310,9309,9308,9307,9306,9305,9304,9303,9302,9301,9300,9299,9298,9297,9296,9295,9294,9293,9292,9291,9290,9289,9288,9287,9286,9285,9284,9283,9282,9281,9280,9279,9278,9277,9276,9275,9274,9273,9272,9271,9270,9269,9268,9267,9266,9265,9264,9263,9262,9261,9260,9259,9258,9257,9256,9255,9254,9253,9252,9251,9250,9249,9248,9247,9246,9245,9244,9243,9242,9241,9240,9239,9238,9237,9236,9235,9234,9233,9232,9231,9230,9229,9228,9227,9226,9225,9224,9223,9222,9221,9220,9219,9218,9217,9216,9215,9214,9213,9212,9211,9210,9209,9208,9207,9206,9205,9204,9203,9202,9201,9200,9199,9198,9197,9196,9195,9194,9193,9192,9191,9190,9189,9188,9187,9186,9185,9184,9183,9182,9181,9180,9179,9178,9177,9176,9175,9174,9173,9172,9171,9170,9169,9168,9167,9166,9165,9164,9163,9162,9161,9160,9159,9158,9157,9156,9155,9154,9153,9152,9151,9150,9149,9148,9147,9146,9145,9144,9143,9142,9141,9140,9139,9138,9137,9136,9135,9134,9133,9132,9131,9130,9129,9128,9127,9126,9125,9124,9123,9122,9121,9120,9119,9118,9117,9116,9115,9114,9113,9112,9111,9110,9109,9108,9107,9106,9105,9104,9103,9102,9101,9100,9099,9098,9097,9096,9095,9094,9093,9092,9091,9090,9089,9088,9087,9086,9085,9084,9083,9082,9081,9080,9079,9078,9077,9076,9075,9074,9073,9072,9071,9070,9069,9068,9067,9066,9065,9064,9063,9062,9061,9060,9059,9058,9057,9056,9055,9054,9053,9052,9051,9050,9049,9048,9047,9046,9045,9044,9043,9042,9041,9040,9039,9038,9037,9036,9035,9034,9033,9032,9031,9030,9029,9028,9027,9026,9025,9024,9023,9022,9021,9020,9019,9018,9017,9016,9015,9014,9013,9012,9011,9010,9009,9008,9007,9006,9005,9004,9003,9002,9001,9000,8999,8998,8997,8996,8995,8994,8993,8992,8991,8990,8989,8988,8987,8986,8985,8984,8983,8982,8981,8980,8979,8978,8977,8976,8975,8974,8973,8972,8971,8970,8969,8968,8967,8966,8965,8964,8963,8962,8961,8960,8959,8958,8957,8956,8955,8954,8953,8952,8951,8950,8949,8948,8947,8946,8945,8944,8943,8942,8941,8940,8939,8938,8937,8936,8935,8934,8933,8932,8931,8930,8929,8928,8927,8926,8925,8924,8923,8922,8921,8920,8919,8918,8917,8916,8915,8914,8913,8912,8911,8910,8909,8908,8907,8906,8905,8904,8903,8902,8901,8900,8899,8898,8897,8896,8895,8894,8893,8892,8891,8890,8889,8888,8887,8886,8885,8884,8883,8882,8881,8880,8879,8878,8877,8876,8875,8874,8873,8872,8871,8870,8869,8868,8867,8866,8865,8864,8863,8862,8861,8860,8859,8858,8857,8856,8855,8854,8853,8852,8851,8850,8849,8848,8847,8846,8845,8844,8843,8842,8841,8840,8839,8838,8837,8836,8835,8834,8833,8832,8831,8830,8829,8828,8827,8826,8825,8824,8823,8822,8821,8820,8819,8818,8817,8816,8815,8814,8813,8812,8811,8810,8809,8808,8807,8806,8805,8804,8803,8802,8801,8800,8799,8798,8797,8796,8795,8794,8793,8792,8791,8790,8789,8788,8787,8786,8785,8784,8783,8782,8781,8780,8779,8778,8777,8776,8775,8774,8773,8772,8771,8770,8769,8768,8767,8766,8765,8764,8763,8762,8761,8760,8759,8758,8757,8756,8755,8754,8753,8752,8751,8750,8749,8748,8747,8746,8745,8744,8743,8742,8741,8740,8739,8738)) && t2.lft between t3.lft and t3.rgt order by t3.id_assoc, time_start; > ' > /dev/null real 28m55.206s user 0m0.262s sys 0m0.033s ```
(In reply to ARCTS Admins from comment #39) > (In reply to Chad Vizino from comment #38) > > >time mysql -D slurm_acct_db -e '<sql-query>' >/dev/null > ``` > [root@slurmdbd ~]# time mysql -D slurm_acct_db -e 'select t3.id_assoc, ... > real 28m55.206s > user 0m0.262s > sys 0m0.033s > ``` Ok. So we can see that the query took nearly 29 minutes! Slurm's definitely going to have trouble with that. It seems at this point that tuning your MariaDB config is probably what you'll need to do by changing parameters and/or moving the server to a beefier machine to get that time down. Concerning tuning this page has some helpful info and suggest you review it: https://slurm.schedmd.com/accounting.html Make sure InnoDB is enabled and that you have tuned values. I think older versions of MariaDB use XtraDB: https://mariadb.com/kb/en/innodb/ I don't know how that compares to InnoDB but you may want to look into it and do some tests to compare timings to see what works best. Take a look at your my.cnf for related values. This can help: https://bugs.schedmd.com/show_bug.cgi?id=2457#c1 With any changes you choose to make, you can use that 29min query as a benchmark to see if they are helping. If they don't then you may want to consider running the database server on a separate machine if it's not already. As a last resort, you may want to consider upgrading your hardware/filesystems for the db system (assuming you want to keep Slurm tables of the size you want; otherwise you can keep them trimmed (first part of ticket)).
(In reply to Chad Vizino from comment #40) > Ok. So we can see that the query took nearly 29 minutes! Slurm's definitely > going to have trouble with that. It seems at this point that tuning your > MariaDB config is probably what you'll need to do by changing parameters > and/or moving the server to a beefier machine to get that time down. This machine is _relatively_ chunky. It's got 187G mem and 12 Skylake processors. And it's a standalone physical machine; nothing else runs on this box. > Concerning tuning this page has some helpful info and suggest you review it: > > https://slurm.schedmd.com/accounting.html > > Make sure InnoDB is enabled and that you have tuned values. I think older > versions of MariaDB use XtraDB: > > https://mariadb.com/kb/en/innodb/ > > I don't know how that compares to InnoDB but you may want to look into it > and do some tests to compare timings to see what works best. > > Take a look at your my.cnf for related values. This can help: > > https://bugs.schedmd.com/show_bug.cgi?id=2457#c1 Thanks for these, especially the ticket. That remark from Tim: ``` innodb_buffer_pool_size can have a huge impact - we'd recommend setting this as high as half the RAM available on the slurmdbd server. You can check the current setting in MySQL like so ``` is gold alone! It looks like innodb_buffer_pool_size is only mentioned in that ticket. Any other values worth tweaking? I *think* we took the example on the aforementioned accounting.html page as a suggestion because our current values in `my.cnf` are: ``` [root@slurmdbd ~]# cat /etc/my.cnf [mysqld] # BEGIN ANSIBLE MANAGED BLOCK innodb_buffer_pool_size=1024M innodb_log_file_size=64M innodb_lock_wait_timeout=900 # END ANSIBLE MANAGED BLOCK ... ``` > With any changes you choose to make, you can use that 29min query as a > benchmark to see if they are helping. If they don't then you may want to > consider running the database server on a separate machine if it's not > already. As a last resort, you may want to consider upgrading your > hardware/filesystems for the db system (assuming you want to keep Slurm > tables of the size you want; otherwise you can keep them trimmed (first part > of ticket)). Should we decide to go to the latest (or newer) MariaDB, would SchedMD be able to support us in that? I did look at our ROW_FORMAT for the `slurm_acct_db` and we ARE using COMPACT, instead of DYNAMIC (which makes sense, given our version of MariaDB). However, the aforementioned accounting.html page mentions if we get an error due to that (which we would, in this case), that "the table can then be altered (may take some time) to set the row format to DYNAMIC in order to allow the conversion to proceed". So, it _seems_ like we could upgrade MariaDB and not have slurmdbd data impacted. Does that sound right? David
(In reply to ARCTS Admins from comment #41) > Should we decide to go to the latest (or newer) MariaDB, would SchedMD be > able to support us in that? I did look at our ROW_FORMAT for the > `slurm_acct_db` and we ARE using COMPACT, instead of DYNAMIC (which makes > sense, given our version of MariaDB). However, the aforementioned > accounting.html page mentions if we get an error due to that (which we > would, in this case), that "the table can then be altered (may take some > time) to set the row format to DYNAMIC in order to allow the conversion to > proceed". So, it _seems_ like we could upgrade MariaDB and not have slurmdbd > data impacted. Does that sound right? Yes, you can upgrade to the latest version being mindful of the row format info on the accounting page. Obviously, make a backup beforehand to be safe. :-)
If you're ok with this, I'd like to close the ticket since we've narrowed it down to a db issue. If you'd continue to have related issues with Slurm we can pursue via a new ticket.
Closing. Again, feel free to open a ticket if you continue to have problems with Slurm.