Ticket 4877 - Long database upgrade when moving from 16.05.9 to 17.11.4
Summary: Long database upgrade when moving from 16.05.9 to 17.11.4
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Database
Version: 17.11.4
Hardware: Linux
Severity: 2 - High Impact
Assignee: Brian Christiansen
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2018-03-06 08:27 MST by Jared D. Baker
Modified: 2018-03-09 10:55 MST

See Also:
Site: University of Wyoming
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
slurmdbd.log (4.28 KB, text/plain)
2018-03-06 08:27 MST, Jared D. Baker
innodb status (14.70 KB, text/plain)
2018-03-08 13:31 MST, Jared D. Baker

Description Jared D. Baker 2018-03-06 08:27:46 MST
Created attachment 6303 [details]
slurmdbd.log

We'd been apprehensive about upgrading the database given the conversations on the slurm-users mailing list, but we believed our database was fairly small, and slurmdbd runs on a physical machine with plenty of spare hardware cycles, plenty of memory, etc. However, we started the upgrade at 1:45 PM MST yesterday, and as of 8:00 AM MST this morning it is still running. Output of the log is attached. The SQL query that appears to be stuck is as follows:

mysql> show full processlist;
+--------+-----------+-----------------------+---------------+---------+-------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Id     | User      | Host                  | db            | Command | Time  | State        | Info                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+--------+-----------+-----------------------+---------------+---------+-------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 443756 | slurmacct | mmmmgt1.cluster:39050 | slurm_acct_db | Query   | 65729 | Sending data | update "mtmoran_job_table" as job left outer join ( select job_db_inx, SUM(consumed_energy) 'sum_energy' from "mtmoran_step_table" where id_step >= 0 and consumed_energy != 18446744073709551614 group by job_db_inx ) step on job.job_db_inx=step.job_db_inx set job.tres_alloc=concat(job.tres_alloc, concat(',3=', case when step.sum_energy then step.sum_energy else 18446744073709551614 END)) where job.tres_alloc != '' && job.tres_alloc not like '%,3=%' |
| 444263 | root      | localhost             | NULL          | Query   |     0 | NULL         | show full processlist                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+--------+-----------+-----------------------+---------------+---------+-------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Comment 1 Brian Christiansen 2018-03-06 09:20:23 MST
Hey Jared,

Our recommendation is to let it keep going for now. We've seen that the rollback will take a long time if you stop it. That sql statement is expected.

A few other questions:

1. How many job records are in the database?
2. How many step records are in the database?
3. What type of hard drive are you using?
4. What type of CPU do you have?
5. Does top show that mysqld is working?

Thanks,
Brian
Comment 2 Jared D. Baker 2018-03-06 10:06:48 MST
We have 1.4M job records and 2.2M step records.

Dedicated HDD mirror, dual socket 2.6GHz Xeon Sandy Bridge processors, and 64 GB memory.

Top shows mysqld running at about 99% of a single CPU.

The goal here is to pre-migrate (upgrade) the database before moving it to new hardware.
Comment 3 Brian Christiansen 2018-03-06 10:33:27 MST
Just FYI:
"As a reference point, one of our databases with 3M jobs on an nvme took 12 minutes to finish the upgrade (Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz). The exact same database on a different hardware took around 45 minutes, so hardware matters a lot. Another reference, a customer with 5M records took about 2 hours."(Bug 4846)

Also, what version of mysql/mariadb are you using?

It's good to see that the mysqld is busy.
Comment 4 Jared D. Baker 2018-03-06 11:01:25 MST
We are using mysql-5.1.73-8.el6_8.x86_64

We're in the process of moving to MariaDB on RHEL 7.4 in the next week or so.

I'm assuming that the preferred version is the one delivered by Red Hat?
Comment 5 Brian Christiansen 2018-03-06 11:40:32 MST
brian@lappy:~$ docker run centos:7 yum -q info mariadb | grep Version
Version     : 5.5.56

is fine. For reference, the system that converted 3M records in 12 minutes was using MariaDB 10.2. It's unconfirmed, but we wonder whether that contributed to the speed of the upgrade as well.

The best thing to do is to wait for the dbd to finish the upgrade. Let us know when it finishes. 

The one concern with the upgrade taking a long time is that records intended for the dbd build up in the slurmctld. Do you see any of these messages in the slurmctld logs?

error("slurmdbd: agent queue filling (%d), RESTART SLURMDBD NOW",
		      cnt);
error("slurmdbd: agent queue is full (%u), discarding %s:%u request",
		      cnt,

The slurmctld will hold on to the records until the slurmdbd is back up. It'll hold up to:

MAX(10000, ((MaxJobCount * 2) + (node_record_count * 4)));

before it starts discarding records. When the queue is more than half full, the "RESTART SLURMDBD NOW" message will appear. If needed, MaxJobCount could be increased to give it more buffer.
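The limit above works out to a quick calculation; here is a sketch in Python (the MaxJobCount and node-count values are illustrative, not this site's actual configuration):

```python
# Sketch of the slurmctld agent-queue limit described above.
# Inputs are illustrative; use your site's MaxJobCount and node count.

def agent_queue_limit(max_job_count, node_record_count):
    """Records the slurmctld buffers for the dbd before discarding begins."""
    return max(10000, (max_job_count * 2) + (node_record_count * 4))

limit = agent_queue_limit(10000, 300)   # default MaxJobCount, 300 nodes
print(limit)        # 21200 records buffered
print(limit // 2)   # threshold for the "RESTART SLURMDBD NOW" warning
```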
Comment 6 Jared D. Baker 2018-03-07 09:16:10 MST
I don't see any of those messages in the logs, but the upgrade is still running on the same query.
Comment 7 Brian Christiansen 2018-03-07 09:28:35 MST
Good to hear that you aren't seeing those messages.

Do you happen to have a backup of the database? Would you be willing to send it to me so I can try it locally? I can make the bug private or you can email it to me directly if you like. 

I created a database yesterday with 1 million job records with 2 steps each and the conversion took 96 seconds. I'm attempting the conversion on a couple of other machines too.
Comment 8 Brian Christiansen 2018-03-07 12:50:25 MST
Here's some results from using your database.

mysql 5.7.21

CPU:  i7-7700HQ CPU @ 2.80GHz
Disk: nvme
Time: 195518083 usecs

CPU:  i7-2600K CPU @ 3.40GHz
Disk: ssd
Time: 402331241 usecs

CPU:  i7-2600K CPU @ 3.40GHz
Disk: hdd (database symlinked to filesystem on hdd)
Time: 402331241 usecs

I uploaded the converted database (to Drive) if you want to look at it.

Maybe the mirror is causing it to be slow?
Comment 9 Jared D. Baker 2018-03-08 08:09:21 MST
This is still going. Is there a parameter in slurm.conf that is adversely affecting this? The mailing list pointed to AccountingStorageTRES, which in our environment is set to cpu,mem,energy,node (the default). We don't really care about energy consumption yet, but if that's the element in question, should it be removed from the default when an upgrade takes this long?
Comment 10 Brian Christiansen 2018-03-08 10:56:24 MST
You actually can't remove the energy TRES -- it's on no matter what (Billing, CPU, Energy, Memory, and Node are always tracked).

Some background: 
Prior to 17.11, the slurmdbd rollups and sacct would get the energy from the steps with a join on the job and step tables, but we found this to be pretty expensive on large tables. In 17.11, the energy is summed at the end of the job and recorded on the job record, so the database no longer needs to look at the steps for energy. It's a big gain in performance after the upgrade, but the upgrade itself is proving troublesome for a few systems. I'm still investigating what's causing the slowness. My next thing to try is MySQL 5.1.
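The conversion's per-job logic (the UPDATE shown in the description) can be sketched roughly like this. This is an illustrative Python model, not Slurm's actual code; TRES id 3 is energy, and 18446744073709551614 (2^64 - 2) appears to be the "no value" sentinel:

```python
# Illustrative model of the upgrade's per-job energy rollup (not Slurm code).
# 18446744073709551614 marks steps/jobs with no usable energy reading.

NO_ENERGY = 18446744073709551614

def job_energy(step_energies):
    """Sum usable step energies; zero or no data falls back to the sentinel."""
    total = sum(e for e in step_energies if e != NO_ENERGY)
    return total if total else NO_ENERGY

def add_energy_tres(tres_alloc, step_energies):
    """Append ',3=<energy>' once, as the upgrade UPDATE does."""
    if tres_alloc == '' or ',3=' in tres_alloc:
        return tres_alloc
    return f"{tres_alloc},3={job_energy(step_energies)}"

print(add_energy_tres('1=0', [NO_ENERGY]))        # '1=0,3=18446744073709551614'
print(add_energy_tres('1=16,2=64000', [120, 80])) # '1=16,2=64000,3=200'
```

The first example mirrors the before/after tres_alloc values shown later in this comment.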


If you're willing, you can use the upgraded database that I uploaded -- I just uploaded a new one (wyoming2.sql.gz), as the first one might have done some purging.

Steps:
1. Stop the slurmdbd.
2. Kill the SQL query (ID = 443756?).
3. Stop mysqld -- to stop the rollback.
4. Drop the current database (or create a new one next to it).
5. Create the database.
6. Import the database.
7. Start the slurmdbd.

Let me know if you do this or not.

Thanks,
Brian



Here's some output from my import of your database:
Before upgrade:
mysql> select count(*) from mtmoran_job_table;
+----------+
| count(*) |
+----------+
|  1417646 |
+----------+
1 row in set (0.21 sec)

mysql> select count(*) from mtmoran_step_table;
+----------+
| count(*) |
+----------+
|  2142859 |
+----------+
1 row in set (0.62 sec)


mysql> select tres_alloc from mtmoran_job_table order by job_db_inx limit 1;
+------------+
| tres_alloc |
+------------+
| 1=0        |
+------------+
1 row in set (0.00 sec)


After upgrade:
mysql> select count(*) from mtmoran_job_table;
+----------+
| count(*) |
+----------+
|  1417646 |
+----------+
1 row in set (0.19 sec)

mysql> select count(*) from mtmoran_step_table;
+----------+
| count(*) |
+----------+
|  2142859 |
+----------+
1 row in set (0.66 sec)

mysql> select tres_alloc from mtmoran_job_table order by job_db_inx limit 1;
+----------------------------+
| tres_alloc                 |
+----------------------------+
| 1=0,3=18446744073709551614 |
+----------------------------+
1 row in set (0.00 sec)
Comment 11 Brian Christiansen 2018-03-08 11:51:22 MST
I'm currently upgrading your database on a MySQL 5.1 server; it's been running for 25+ minutes and is still not done. This is on my system with the nvme. I also started an upgrade using an hdd. I'll let you know when they finish, but I'm thinking that 5.1 is the culprit.
Comment 12 Brian Christiansen 2018-03-08 12:02:44 MST
If that query is still running, will you grab the output of the following?

show engine innodb status;
Comment 13 Jared D. Baker 2018-03-08 13:31:41 MST
Created attachment 6329 [details]
innodb status
Comment 14 Brian Christiansen 2018-03-08 16:57:18 MST
I was curious what your buffer pool size looked like. It looks pretty small. My 5.1 version was set to the same as well.

Buffer pool size   512
Free buffers       0
Database pages     200

The 5.1 upgrade was still running (after ~5 hours), so I stopped it and increased the buffer pool size to 8G in hopes that the upgrade would go a lot quicker. While I'm guessing it will help, the upgrade is still running after 40 minutes.

e.g.
Buffer pool size   524288
Free buffers       370491
Database pages     152798

mysql> show variables like 'innodb_buffer_pool_%';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 8589934592 |
+-------------------------+------------+
1 row in set (0.01 sec)


My 5.7 MySQL buffer_pool_size is actually a lot smaller and still outperforms 5.1:
mysql> show variables like 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name           | Value     |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+
1 row in set (0.02 sec)
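The figures above are consistent with InnoDB's default 16 KiB page size (an assumption here; check innodb_page_size on your server):

```python
# Cross-check the buffer pool figures above, assuming InnoDB's
# default 16 KiB page size.

PAGE = 16 * 1024

def pool_bytes(pages):
    return pages * PAGE

print(pool_bytes(512))      # 8388608 bytes -> the old ~8 MiB 5.1 default
print(pool_bytes(524288))   # 8589934592 bytes -> the 8G setting above
print(134217728 // PAGE)    # 8192 pages -> the 128 MiB 5.7 pool
```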
Comment 15 Brian Christiansen 2018-03-08 16:57:31 MST
Have you considered using the database I sent you? The main concern is losing records in the slurmctld while the upgrade is still happening. Just keep watching for the agent queue messages.
Comment 16 Jared D. Baker 2018-03-08 19:01:58 MST
We had thought about it. Since you mentioned the MySQL 5.1 issues, we upgraded to MySQL 5.7 this afternoon, dropped the Slurm accounting database, restored from our earlier sqldump, and then started the upgrade again. It took ~16 minutes to complete, and we're back to a fully functional state.
Comment 17 Brian Christiansen 2018-03-08 19:38:20 MST
Awesome! Glad you were able to get it up and going. Thanks for your help and patience. Let us know if you need anything else.
Comment 18 Brian Christiansen 2018-03-09 10:55:46 MST
Thanks again for your help. I'm going to mark this as resolved. Let us know if anything else comes up.

FYI. My 5.1 databases are still upgrading.