| Summary: | "Message too long" during COMMIT | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Kilian Cavalotti <kilian> |
| Component: | slurmdbd | Assignee: | Alejandro Sanchez <alex> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | akkornel, alex, brian |
| Version: | 18.08.0 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Stanford | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Hi,

We tried starting slurmdbd again, and it failed the same way, in the same place, after more than 2 hours (re-)adding records in the <cluster>_step_table. I'm not sure why the "CREATE TABLE" query stalls and fails when executed by slurmdbd, while it succeeds when run in the MariaDB CLI.

For background info, we use MariaDB in a Galera multi-node cluster. This info could certainly be of interest: https://www.percona.com/blog/2015/10/26/how-big-can-your-galera-transactions-be/ and seems to point towards excessively large transactions. Maybe there could be a way to split them up into smaller chunks?

Anyway, this is currently preventing our cluster from being upgraded to 18.08.0. We currently have a scheduled datacenter downtime until Monday afternoon, so we can't try anything right now, but it would be great to have a plan to be able to move forward with the upgrade when the power comes back on Monday. Any guidance will be much appreciated.

Thanks!
--
Kilian

Hi Kilian. I've still not looked at this much, but as a starting point could you please attach the full slurmdbd.log and check your current wsrep_max_ws_size option value?

Hi Alejandro,

(In reply to Alejandro Sanchez from comment #2)
> Hi Kilian. I've still not looked at this much, but as a starting point could
> you please attach full slurmdbd.log and check your current wsrep_max_ws_size
> option value?

I don't have the log handy right now since the cluster is down for a datacenter maintenance, but wsrep_max_ws_size is not set, which makes it take the default and maximum value of 2GB, I believe.

Cheers,
--
Kilian

(In reply to Alejandro Sanchez from comment #2)
> > Hi Kilian. I've still not looked at this much, but as a starting point could
> > you please attach full slurmdbd.log and check your current wsrep_max_ws_size
> > option value?
> > I don't have the log handy right now since the cluster is down for a
> > datacenter maintenance,

Here's the slurmdbd log, which covers the first start with 18.08.0:

Aug 31 13:55:38 sh-sl01 yum[170787]: Updated: slurm-slurmdbd-18.08.0-1.el7.x86_64
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: error: Database settings not recommended values: innodb_buffer_pool_size
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: pre-converting step table for sherlock
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_in_ave after tres_alloc in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_in_max after tres_usage_in_ave in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_in_max_taskid after tres_usage_in_max in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_in_max_nodeid after tres_usage_in_max_taskid in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_in_min after tres_usage_in_max_nodeid in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_in_min_taskid after tres_usage_in_min in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_in_min_nodeid after tres_usage_in_min_taskid in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_in_tot after tres_usage_in_min_nodeid in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_out_ave after tres_usage_in_tot in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_out_max after tres_usage_out_ave in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_out_max_taskid after tres_usage_out_max in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_out_max_nodeid after tres_usage_out_max_taskid in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_out_min after tres_usage_out_max_nodeid in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_out_min_taskid after tres_usage_out_min in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_out_min_nodeid after tres_usage_out_min_taskid in table "sherlock_step_table"
Aug 31 13:56:03 sh-sl01 slurmdbd[170926]: adding column tres_usage_out_tot after tres_usage_out_min_nodeid in table "sherlock_step_table"
Aug 31 16:56:46 sh-sl01 slurmdbd[170926]: pre-converting job table for sherlock
Aug 31 17:28:42 sh-sl01 slurmdbd[170926]: error: mysql_query failed: 1180 Got error 90 "Message too long" during COMMIT#012create table if not exists table_defs_table (creation_time int unsigned not null, mod_time int unsigned default 0 not null, table_name text not null, definition text not null, primary key (table_name(50))) engine='innodb'
Aug 31 17:28:42 sh-sl01 slurmdbd[170926]: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
Aug 31 17:28:42 sh-sl01 slurmdbd[170926]: error: cannot create accounting_storage context for accounting_storage/mysql
Aug 31 17:28:42 sh-sl01 slurmdbd[170926]: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin
Aug 31 17:28:42 sh-sl01 systemd: slurmdbd.service: main process exited, code=exited, status=1/FAILURE
Aug 31 17:28:42 sh-sl01 systemd: Unit slurmdbd.service entered failed state.
Aug 31 17:28:42 sh-sl01 systemd: slurmdbd.service failed.
Aug 31 17:47:29 sh-sl01 systemd: PID file /var/run/slurmdbd.pid not readable (yet?) after start.
Aug 31 17:47:29 sh-sl01 slurmdbd[171721]: error: Database settings not recommended values: innodb_buffer_pool_size
Aug 31 17:47:29 sh-sl01 slurmdbd[171721]: pre-converting step table for sherlock
Aug 31 17:52:10 sh-sl01 systemd: PID file /var/run/slurmdbd.pid not readable (yet?) after start.
Aug 31 17:52:10 sh-sl01 slurmdbd[171888]: error: Database settings not recommended values: innodb_buffer_pool_size
Aug 31 17:52:10 sh-sl01 slurmdbd[171888]: pre-converting step table for sherlock
Aug 31 18:01:30 sh-sl01 slurmdbd[171888]: error: Could not execute statement 1213 Deadlock found when trying to get lock; try restarting transaction
Aug 31 19:51:52 sh-sl01 slurmdbd[171888]: pre-converting job table for sherlock
Aug 31 20:07:01 sh-sl01 slurmdbd[171888]: error: mysql_query failed: 2013 Lost connection to MySQL server during query#012create table if not exists table_defs_table (creation_time int unsigned not null, mod_time int unsigned default 0 not null, table_name text not null, definition text not null, primary key (table_name(50))) engine='innodb'
Aug 31 20:07:01 sh-sl01 slurmdbd[171888]: Accounting storage MYSQL plugin failed

> but wsrep_max_ws_size is not set, which makes it
> take the default and maximum value of 2GB, I believe.
I confirm:

MariaDB [slurm_acct_db]> SHOW VARIABLES LIKE 'wsrep_max_ws_size';
+-------------------+------------+
| Variable_name     | Value      |
+-------------------+------------+
| wsrep_max_ws_size | 2147483647 |
+-------------------+------------+

Tried again, with a single node in the Galera cluster to avoid issues related to replication, and it still fails in the same place. mariadb logs this when creating the table:

Sep  3 17:45:54 sh-sl01 mysqld: 2018-09-03 17:45:54 139828804291328 [Warning] WSREP: transaction size limit (2147483647) exceeded: 2147483648
Sep  3 17:45:55 sh-sl01 mysqld: 2018-09-03 17:45:55 139828804291328 [ERROR] WSREP: rbr write fail, data_len: 0, 2

Any idea how to bypass this limit, or slice the transaction into smaller chunks? This is currently holding off our ability to restore our system to production after the maintenance.

Thanks,
--
Kilian

(In reply to Kilian Cavalotti from comment #5)
> This is currently holding off our ability to restore our system to
> production after the maintenance.

We're moving back to 17.11.8.

I'm tempted to suggest setting the following values, which seem to alleviate this error for other people:

mysql> set global wsrep_max_ws_size=4294901759;
mysql> set global binlog_row_image=minimal;

Although I'm still studying the conversion logic.

(In reply to Alejandro Sanchez from comment #7)
> I'm tempted to suggest setting the following values which seem to alleviate
> such error for other people:
>
> mysql> set global wsrep_max_ws_size=4294901759;
> mysql> set global binlog_row_image=minimal;
>
> Although I'm still studying the conversion logic.

Thanks. I tried to set binlog_row_image=minimal yesterday before our last conversion try, but it didn't help. And wsrep_max_ws_size's maximum value is 2GB, apparently: https://mariadb.com/kb/en/library/galera-cluster-system-variables/#wsrep_max_ws_size

Thanks,
--
Kilian

Hey Kilian, would it be possible to share a dump of the database so that we could try it locally?
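A quick aside on the numbers in the WSREP warning above: they line up with a signed 32-bit size field. The default (and effective maximum) wsrep_max_ws_size is 2^31 - 1 bytes, and the failing writeset overshot it by exactly one byte. A purely illustrative check:

```shell
# The MariaDB warning reports limit 2147483647 and transaction size 2147483648.
# 2147483647 is 2^31 - 1 (2 GiB minus one byte), so the writeset exceeded the
# limit by exactly 1 byte -- consistent with a signed 32-bit size field.
echo $(( (1 << 31) - 1 ))                  # the limit
echo $(( 2147483648 - ((1 << 31) - 1) ))   # the overshoot
```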
I can send you a link to a place that you can upload it to, if that's something that you can do.

Thanks,
Brian

Thanks Kilian. I'm able to download it. I've made the comment with the link private so that it's not public to the world. We'll let you know how it goes.

Your database copy is still being imported locally... Although unrelated and while here, you might consider increasing how often you purge jobs and/or steps. From the last sherlock slurmdbd.conf copy we have:

PurgeJobAfter=36month
PurgeStepAfter=36month

(In reply to Alejandro Sanchez from comment #12)
> Your database copy is still being imported locally... Although unrelated and
> while here, you might consider increasing how often you purge job and/or
> steps. From the last sherlock slurmdbd.conf copy we have
>
> PurgeJobAfter=36month
> PurgeStepAfter=36month

Yup, I was considering this and thought we had everything set to 12 months, but apparently not. I'll make the change, thanks for pointing it out.

Cheers,
--
Kilian

Kilian, will you show the value of these MariaDB options? Thanks.

MariaDB [(none)]> show variables like 'innodb_buffer_pool_size';
MariaDB [(none)]> show variables like 'innodb_log_file_size';
MariaDB [(none)]> show variables like 'innodb_lock_wait_timeout';

Hi Alejandro,

(In reply to Alejandro Sanchez from comment #18)
> Kilian, will you show the value of these MariaDB options? Thanks.
Sure, here they are:

MariaDB [slurm_acct_db]> show variables like 'innodb_buffer_pool_size';
+-------------------------+-------------+
| Variable_name           | Value       |
+-------------------------+-------------+
| innodb_buffer_pool_size | 34359738368 |
+-------------------------+-------------+
1 row in set (0.00 sec)

MariaDB [slurm_acct_db]> show variables like 'innodb_log_file_size';
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| innodb_log_file_size | 67108864 |
+----------------------+----------+
1 row in set (0.00 sec)

MariaDB [slurm_acct_db]> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 900   |
+--------------------------+-------+
1 row in set (0.00 sec)

Cheers,
--
Kilian

Hi Kilian. While we're finally performing the upgrade with your database on a machine with enough memory, we're guessing that if you change the purge settings to an hourly rate:

PurgeStepAfter=8760hours
PurgeJobAfter=8760hours

you'll hopefully be able to successfully finish the process. In any case, this afternoon we should have results from the upgrade on the copy of your database.

Hi Alejandro,

(In reply to Alejandro Sanchez from comment #22)
> Hi Kilian. While we're finally performing the upgrade with your database on
> a machine with enough memory, we're guessing that if you change the purge
> settings to an hourly rate:
>
> PurgeStepAfter=8760hours
> PurgeJobAfter=8760hours

Uh, interesting. I guess I'm not familiar enough with the purging process to understand the difference between setting those values to 8760 hours and 12 months: they're the same duration, so what does it change? The frequency at which those purges are done?
Am I right to assume that having PurgeStepAfter=8760hours will make slurmdbd purge 1 hour's worth of jobs older than 12 months every hour, while having it at 12month will make it purge 1 month of old jobs every month?

> you'll hopefully be able to successfully finish the process. In any case this
> afternoon we should have results from the upgrade on the copy of your
> database.

Great, thanks!

Cheers,
--
Kilian

(In reply to Kilian Cavalotti from comment #24)
> Uh, interesting. I guess I'm not familiar enough with the purging process to
> understand the difference between setting those values to 8760 hours and 12
> months: they're the same duration so what does it change? The frequency at
> which those purges are done?

Correct.

> Am I right to assume that having PurgeStepAfter=8760hours will make slurmdbd
> purge 1-hour worth of jobs older than 12 months every hour, while having it
> at 12month will make it purge 1-month of old jobs every month?

PurgeStepAfter=8760hours will make the purge process be attempted at the beginning of every hour, and all steps older than 8760 hours (equivalent to 12 months) will be purged. PurgeStepAfter=12month will make the purge process be attempted at the beginning of every month, and all steps older than 12 months will be purged.

As stated in the slurmdbd.conf man page: "The purge takes place at the start of each purge interval. For example, if the purge time is 2 months, the purge would happen at the beginning of each month."

(In reply to Alejandro Sanchez from comment #25)
> As stated in the slurmdbd.conf man page:
>
> The purge takes place at the start of each purge interval. For example, if
> the purge time is 2 months, the purge would happen at the beginning of each
> month.

Got it, thanks for the clarification. Change applied!

Cheers,
--
Kilian

Hi there,

I was wondering if you had any chance to reproduce the problem with the dump of our database?
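For reference, the hourly purge variant discussed above corresponds to a slurmdbd.conf fragment along these lines (a sketch; PurgeJobAfter and PurgeStepAfter are real slurmdbd.conf parameters, and 8760 hours equals the 12-month retention in question):

```
# slurmdbd.conf (excerpt) -- same 12-month retention either way, but the
# hours-based spelling makes the purge run at the start of every hour,
# so each pass deletes a much smaller batch of rows than a monthly purge.
PurgeJobAfter=8760hours
PurgeStepAfter=8760hours
```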
Thanks,
--
Kilian

I got some unrelated errors:

alex@knl2:~/slurm/upgrade/knl2/log$ grep error slurmdbd.log
[2018-09-07T06:10:10.559] error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout
[2018-09-07T07:25:39.347] error: Could not execute statement 1206 The total number of locks exceeds the lock table size
[2018-09-07T08:31:31.572] error: Could not execute statement 1206 The total number of locks exceeds the lock table size
[2018-09-07T09:41:30.887] error: Could not execute statement 1206 The total number of locks exceeds the lock table size
[2018-09-07T10:48:48.888] error: Could not execute statement 1206 The total number of locks exceeds the lock table size
alex@knl2:~/slurm/upgrade/knl2/log$ tail -n2 slurmdbd.log
[2018-09-10T04:00:00.408] debug2: Got 1 of 1 rolled up
[2018-09-10T04:00:00.408] debug2: Everything rolled up
alex@knl2:~/slurm/upgrade/knl2/log$

But the upgrade finished:

alex@knl2:~/slurm/upgrade/knl2/log$ grep -ri conversion slurmdbd.log
[2018-09-07T11:35:08.681] Conversion done: success!

We think that when we manage to purge some of your jobs/steps properly you will be able to upgrade. Unfortunately, we also have problems with your purge process, as tracked in bug 5689.

(In reply to Alejandro Sanchez from comment #30)
> I got some unrelated errors:
> But the upgrade finished:
>
> We think that when we manage to purge some of your jobs/steps properly you
> will be able to upgrade. Unfortunately, we also have problems with your
> purge process as tracked in bug 5689.

Mmmh, so I finally managed to import a dump of our thinned-out database (down to 6 months) in a test environment and tried the upgrade again.
I unfortunately got the exact same error after 4h30 of DB conversion:

slurmdbd: pre-converting job table for sherlock
slurmdbd: error: mysql_query failed: 1180 Got error 90 "Message too long" during COMMIT create table if not exists table_defs_table (creation_time int unsigned not null, mod_time int unsigned default 0 not null, table_name text not null, definition text not null, primary key (table_name(50))) engine='innodb'
slurmdbd: Accounting storage MYSQL plugin failed
slurmdbd: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
slurmdbd: error: cannot create accounting_storage context for accounting_storage/mysql
slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin

Not sure how to proceed here...

Cheers,
--
Kilian

Is the test environment also backed by a Galera cluster? I'm starting to wonder whether using Galera or not makes any difference, since we managed to upgrade your database copy locally...

(In reply to Alejandro Sanchez from comment #32)
> Is the test environment also backed by a Galera cluster? I'm starting to
> wonder whether using Galera or not makes any difference, since we managed to
> upgrade your database copy locally...

Yes it is, I wanted to make the test environment as similar to the production system as possible. And yes, I also think that the Galera layer is significant in this case.

I will try to start the DB conversion without the Galera component, and if it works, we can envision a workaround procedure where we stop the slurmdbds, restart MariaDB without Galera, start slurmdbd to convert the DB, and then, when it's done, shut down slurmdbd again, re-enable Galera and restart everything.

But we would obviously prefer to understand why the DB conversion is failing with Galera, and possibly fix the problem, because I think that having a Galera cluster as an underlying DB layer for slurmdbd would not be an uncommon case.

Thanks,
--
Kilian

Hey Kilian.
Did you manage to proceed with the conversion without Galera? I'm lowering the severity a bit, to 3, if that's not an issue.

Hi Alejandro,

(In reply to Alejandro Sanchez from comment #34)
> Hey Kilian. Did you manage to proceed with the conversion without Galera?

Not yet, unfortunately, I've been a bit distracted by a lot of other things, but I'll give it another go today. And I'll keep you posted.

> I'm lowering the severity a bit, to 3, if that's not an issue.

Absolutely, since we're staying with 17.11 for the time being, this is perfectly fine.

Thanks!
--
Kilian

(In reply to Kilian Cavalotti from comment #35)
> > Hey Kilian. Did you manage to proceed with the conversion without Galera?
>
> Not yet, unfortunately, I've been a bit distracted by a lot of other things,
> but I'll give it another go today. And I'll keep you posted.

I finally got to it, and I can confirm that running MariaDB without the Galera components allowed the update to proceed to completion:

slurmdbd: debug2: running rollup at Thu Oct 04 17:00:00 2018
slurmdbd: debug2: No need to roll cluster sherlock this day 1538636400 <= 1538636400
slurmdbd: debug2: No need to roll cluster sherlock this month 1538377200 <= 1538377200
slurmdbd: debug2: Got 1 of 1 rolled up
slurmdbd: debug2: Everything rolled up

So I guess we now have an upgrade path, but it would still be great to understand why the transactions fail with Galera.

Cheers,
--
Kilian

(In reply to Kilian Cavalotti from comment #36)
> I finally got to it, and I can confirm that running MariaDB without the
> Galera components allowed the update to proceed to completion:
>
> slurmdbd: debug2: running rollup at Thu Oct 04 17:00:00 2018
> slurmdbd: debug2: No need to roll cluster sherlock this day 1538636400 <=
> 1538636400
> slurmdbd: debug2: No need to roll cluster sherlock this month 1538377200 <=
> 1538377200
> slurmdbd: debug2: Got 1 of 1 rolled up
> slurmdbd: debug2: Everything rolled up

Sounds promising.
I guess you're going to try the upgrade on the production cluster using the strategy you mentioned.

> So I guess we now have an upgrade path, but it would still be great to
> understand why the transactions fail with Galera.

I'm going to try to install this Galera cluster, retry the upgrade with your db copy again on top of it, and see if I can reproduce and find the potential offending parameter or issue. Lowering the severity to 4 now, since I expect your production upgrade to hopefully succeed.

Actually, may I know which MariaDB/Galera versions you are using? Also the my.cnf, with sensitive options redacted, might be useful, since there are a bunch of potential Galera-specific params that can be decisive in triggering the error. At least these:

MariaDB [(none)]> show status like 'wsrep%';

plus the value of:

binlog_format
default_storage_engine
innodb_autoinc_lock_mode
innodb_doublewrite
wsrep_on
innodb_flush_log_at_trx_commit

Also, not sure if you can retrieve the MariaDB logs (regular and/or error ones) to see the specific message when it failed, like:

[ERROR] WSREP: Maximum writeset size exceeded by XXXX: 90 (Message too long)

or similar, that could point to anything relevant. In any case, this looks more like a DB Admin / Galera issue than a Slurm one.

Hi Alejandro,

We're using MariaDB 10.1.36, and Galera 25.3.23.

(In reply to Alejandro Sanchez from comment #39)
> Also the my.cnf with sensitive options redacted might be useful since there
> are a bunch of potential Galera-specific params that can be decisive in
> triggering the error.
Here are the options we have in our config file:

-- 8< -------------------------------------------------
# cat /etc/my.cnf.d/galera.cnf
[mysqld]
bind-address=0.0.0.0
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_buffer_pool_size=2GB
innodb_buffer_pool_size=128M
innodb_lock_wait_timeout=900
tmpdir=/var/lib/mysql/.tmp

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name="test_slurm_cluster"
wsrep_cluster_address="gcomm://sherlock-slurm01,sherlock-slurm02"
wsrep_sst_method=rsync
wsrep_node_address="10.210.31.20"
wsrep_node_name="sherlock-slurm01"
wsrep_max_ws_size=4294901759
-- 8< -------------------------------------------------

The actual config values:

-- 8< -------------------------------------------------
MariaDB [(none)]> show status like 'wsrep%';
+------------------------------+--------------------------------------+
| Variable_name                | Value                                |
+------------------------------+--------------------------------------+
| wsrep_apply_oooe             | 0.000000                             |
| wsrep_apply_oool             | 0.000000                             |
| wsrep_apply_window           | 0.000000                             |
| wsrep_causal_reads           | 0                                    |
| wsrep_cert_deps_distance     | 0.000000                             |
| wsrep_cert_index_size        | 0                                    |
| wsrep_cert_interval          | 0.000000                             |
| wsrep_cluster_conf_id        | 12                                   |
| wsrep_cluster_size           | 2                                    |
| wsrep_cluster_state_uuid     | 9fd9b841-c299-11e8-bb45-6a5bd809fa6b |
| wsrep_cluster_status         | Primary                              |
| wsrep_commit_oooe            | 0.000000                             |
| wsrep_commit_oool            | 0.000000                             |
| wsrep_commit_window          | 0.000000                             |
| wsrep_connected              | ON                                   |
| wsrep_desync_count           | 0                                    |
| wsrep_evs_delayed            |                                      |
| wsrep_evs_evict_list         |                                      |
| wsrep_evs_repl_latency       | 0/0/0/0/0                            |
| wsrep_evs_state              | OPERATIONAL                          |
| wsrep_flow_control_paused    | 0.000000                             |
| wsrep_flow_control_paused_ns | 0                                    |
| wsrep_flow_control_recv      | 0                                    |
| wsrep_flow_control_sent      | 0                                    |
| wsrep_gcomm_uuid             | 7e891008-c8ba-11e8-b09e-da972ea198ae |
| wsrep_incoming_addresses     | 10.210.31.20:3306,10.210.31.48:3306  |
| wsrep_last_committed         | 24189                                |
| wsrep_local_bf_aborts        | 0                                    |
| wsrep_local_cached_downto    | 18446744073709551615                 |
| wsrep_local_cert_failures    | 0                                    |
| wsrep_local_commits          | 0                                    |
| wsrep_local_index            | 0                                    |
| wsrep_local_recv_queue       | 0                                    |
| wsrep_local_recv_queue_avg   | 0.000000                             |
| wsrep_local_recv_queue_max   | 1                                    |
| wsrep_local_recv_queue_min   | 0                                    |
| wsrep_local_replays          | 0                                    |
| wsrep_local_send_queue       | 0                                    |
| wsrep_local_send_queue_avg   | 0.500000                             |
| wsrep_local_send_queue_max   | 2                                    |
| wsrep_local_send_queue_min   | 0                                    |
| wsrep_local_state            | 4                                    |
| wsrep_local_state_comment    | Synced                               |
| wsrep_local_state_uuid       | 9fd9b841-c299-11e8-bb45-6a5bd809fa6b |
| wsrep_protocol_version       | 8                                    |
| wsrep_provider_name          | Galera                               |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>    |
| wsrep_provider_version       | 25.3.23(r3789)                       |
| wsrep_ready                  | ON                                   |
| wsrep_received               | 2                                    |
| wsrep_received_bytes         | 232                                  |
| wsrep_repl_data_bytes        | 0                                    |
| wsrep_repl_keys              | 0                                    |
| wsrep_repl_keys_bytes        | 0                                    |
| wsrep_repl_other_bytes       | 0                                    |
| wsrep_replicated             | 0                                    |
| wsrep_replicated_bytes       | 0                                    |
| wsrep_thread_count           | 2                                    |
+------------------------------+--------------------------------------+

> The value of:
> binlog_format
> default_storage_engine
> innodb_autoinc_lock_mode
> innodb_doublewrite
> wsrep_on
> innodb_flush_log_at_trx_commit

+--------------------------------+--------+
| Variable_name                  | Value  |
+--------------------------------+--------+
| binlog_format                  | ROW    |
| default_storage_engine         | InnoDB |
| innodb_autoinc_lock_mode       | 2      |
| innodb_doublewrite             | ON     |
| innodb_flush_log_at_trx_commit | 1      |
| wsrep_on                       | ON     |
+--------------------------------+--------+

> Also not sure if you can retrieve the MariaDB logs (regular and/or error
> ones) to see the specific message when it failed, like:
>
> [ERROR] WSREP: Maximum writeset size exceeded by XXXX: 90 (Message too long)

The only thing it logged was this:

2018-09-28 12:17:46 140549635750656 [Warning] WSREP: transaction size limit (2147483647) exceeded: 2147483648
2018-09-28 12:17:46 140549635750656 [ERROR] WSREP: rbr write fail, data_len: 0, 2

> or similar that could point to anything relevant. In any case, this looks
> more like a DB Admin / Galera issue than a Slurm one.

Agreed. Although maybe there is a way to reduce the size of the transactions during the DB upgrade to avoid triggering this limit?

Cheers,
--
Kilian

(In reply to Kilian Cavalotti from comment #40)
> Agreed. Although maybe there is a way to reduce the size of the transactions
> during the DB upgrade to avoid triggering this limit?

I've been able to reproduce with a Galera setup and options similar to yours:

alex@knl2:~/slurm/upgrade/knl2/log$ tail -f slurmdbd.log
...
[2018-10-08T09:01:41.677] error: mysql_query failed: 1180 Got error 90 "Message too long" during COMMIT create table if not exists table_defs_table (creation_time int unsigned not null, mod_time int unsigned default 0 not null, table_name text not null, definition text not null, primary key (table_name(50))) engine='innodb'
[2018-10-08T09:01:41.677] Accounting storage MYSQL plugin failed
...

But I don't see these logged in my mariadb logs:

2018-09-28 12:17:46 140549635750656 [Warning] WSREP: transaction size limit (2147483647) exceeded: 2147483648
2018-09-28 12:17:46 140549635750656 [ERROR] WSREP: rbr write fail, data_len: 0, 2

as you reported.

I'm now trying to upgrade again, for the Nth time, after increasing wsrep_max_ws_size to 8GB. Even if the documentation states that the default and maximum value is 2GB, according to https://www.percona.com/blog/2015/10/26/how-big-can-your-galera-transactions-be/ "You can actually set this higher than 2GB, but only the 2GB is being enforced in my tests." So I'm trying to see if that makes a difference.

With regards to your question about reducing the size of the transactions... I wouldn't mind exploring that option in the Slurm code, but I find it strange that the error seems to be logged after executing a create table query:

create table if not exists table_defs_table ...
MariaDB [slurm_acct_db_5662]> describe table_defs_table;
+---------------+------------------+------+-----+---------+-------+
| Field         | Type             | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+-------+
| creation_time | int(10) unsigned | NO   |     | NULL    |       |
| mod_time      | int(10) unsigned | NO   |     | 0       |       |
| table_name    | text             | NO   | PRI | NULL    |       |
| definition    | text             | NO   |     | NULL    |       |
+---------------+------------------+------+-----+---------+-------+
4 rows in set (0.01 sec)

I'm wondering if the 4th field, 'definition' of type text, which contains the definition of the tables, is what is causing a limit to be hit. So I'm not sure which transactions we should be attempting to reduce in size... In any case, I think if you manage to upgrade without Galera we'll go ahead and close the bug.

After attempting the upgrade with wsrep_max_ws_size set to 8GB, I got the error again. I'd propose using the workaround of upgrading with Galera disabled. Please let us know if you encounter problems with that.

Hi Kilian, did you manage to upgrade to 18.08 on production? Thanks.

Hi Alejandro,

(In reply to Alejandro Sanchez from comment #43)
> Hi Kilian, did you manage to upgrade to 18.08 on production? Thanks.

Thanks for checking. We haven't done the production upgrade yet; we'll need to schedule a downtime and align it with other maintenance operations, so that requires some coordination and will take some time. But we'll definitely let you know when it's done!

Cheers,
--
Kilian

Some potentially useful notes: another site upgraded yesterday with a db containing 73M steps. It looks like the my.cnf tmpdir temporarily used 46G of space for altering tables. Once that settled, the server actively worked on the datadir copy. The overall conversion took 5.5 hours, thanks to a disk in a software RAID 1. Without such a setup, a parallel conversion with the same database took 11.5 hours (2x slower).
As you may guess, upgrade time is highly dependent on the number of job/step records and the underlying hardware/setup.

Hi,

Just a quick note to let you know that we successfully converted our SlurmDBD databases to 18.08 format (finally!), after having disabled Galera. We're now running 18.08.3, so we can close this bug.

Cheers,
--
Kilian

(In reply to Kilian Cavalotti from comment #46)
> Hi,
>
> Just a quick note to let you know that we successfully converted our
> SlurmDBD databases to 18.08 format (finally!), after having disabled Galera.
>
> We're now running 18.08.3, so we can close this bug.
>
> Cheers,
> --
> Kilian

Ahh, glad to hear. That matches what I was able to corroborate with your database copy. I'm currently at a training this week; as soon as I get back to my regular schedule I'll put the focus on the purge timeout issue. Thanks for your feedback.
Hi there,

Still in the process of converting our 17.11.8 Slurm DB to the 18.08.0 format. It's been happily chugging along for about 2 hours, adding columns to the _step_table table and then populating them, but then it stopped with this message:

Aug 31 17:28:42 sh-sl01.int slurmdbd[170926]: error: mysql_query failed: 1180 Got error 90 "Message too long" during COMMIT create table if not exists table_defs_table (creation_time int unsigned not null, mod_time int unsigned default 0 not null, table_name text not null, definition text not null, primary key (table_name(50))) engine='innodb'

I tried to run that query by hand, and it worked fine and created the table correctly. Any idea what could have caused that issue? The frustrating thing is that restarting slurmdbd now re-updates the _step_table, which in our case holds close to 25 million jobs, so that takes some time...

Thanks,
--
Kilian
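For readers landing on this bug: the workaround that eventually succeeded in this thread (running the conversion with Galera disabled) could be sketched roughly as follows. This is a hypothetical outline, not a tested procedure; service names and the exact way you start MariaDB standalone vary by distribution and setup.

```
# Sketch only: adapt service names and paths to your environment.
# 1. Stop all slurmdbd instances and the clustered MariaDB.
systemctl stop slurmdbd
systemctl stop mariadb

# 2. Start MariaDB standalone with Galera replication disabled
#    (wsrep_provider=none keeps writesets out of the picture entirely).
mysqld_safe --wsrep-provider=none &

# 3. Start slurmdbd so it runs the 18.08 schema conversion; watch
#    slurmdbd.log for "Conversion done: success!".
systemctl start slurmdbd

# 4. Once converted, stop slurmdbd, shut MariaDB down cleanly,
#    then bring the Galera cluster and slurmdbd back up.
systemctl stop slurmdbd
mysqladmin shutdown
systemctl start mariadb
systemctl start slurmdbd
```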