Ticket 16575

Summary: Slurmdbd + mariadb upgrade assistance
Product: Slurm Reporter: Bjørn-Helge Mevik <b.h.mevik>
Component: Database Assignee: Oscar Hernández <oscar.hernandez>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: mcmullan, oscar.hernandez
Version: 22.05.8   
Hardware: Linux   
OS: Linux   
Site: Sigma2 Norway Slinky Site: ---

Description Bjørn-Helge Mevik 2023-04-23 11:44:14 MDT
Hello,

We are in the process of upgrading Slurm from 21.08.7 to 22.05.8, *and* upgrading our OS from CentOS 7.9 to Rocky 9.1, which means going from MariaDB 5.5 to 10.5.

We have actually upgraded the OS already, and installed and started Slurm 22.05.8, but we haven't gone into production yet (i.e., not started running jobs).

I have two questions:


A)

When slurmdbd did its update of the db tables on the initial start of 22.05.8, we noticed that the size of the db (/var/lib/mysql) grew from 19 GiB to 33 GiB.  Is that to be expected, and/or is there anything we can do to reduce the size, especially since the following suggests less space should be needed?

MariaDB [(none)]> SELECT table_schema "DB", sum( data_length + index_length ) / 1024 / 1024 "Size (MB)" FROM information_schema.TABLES GROUP BY table_schema;
 
+--------------------+----------------+
| DB                 | Size (MB)      |
+--------------------+----------------+
| information_schema |     0.20312500 |
| slurm_saga         | 13547.37500000 |
+--------------------+----------------+
2 rows in set (0.005 sec)


B)

While preparing this ticket, I discovered the warning in the docs about upgrading MariaDB from < 10.2.1 to >= 10.2.1.  Unfortunately, I didn't see (or rather: remember) that before we did the upgrade.

The procedure we used for upgrading is:

1) Stop slurm 21.08
2) Take a mysqldump of the slurm_saga db (just in case using old /var/lib/mysql directly didn't work)
3) Stop mariadb 5.5
4) Make a backup of /var/lib/mysql
5) Reinstall machine with Rocky 9.1
6) Stop the newly installed mariadb 10.5
7) Restore /var/lib/mysql from backup
8) Start mariadb 10.5 and run "mariadb-upgrade"
9) Build and install slurm 22.05
10) Start slurmdbd manually and wait for it to finish the table structure upgrade
11) Restart slurmdbd (and start slurmctld) with systemctl

What do you recommend we do to fix the problems alluded to in the warning (and other tickets I've seen about this mariadb update)?  Do we need to go back to mariadb 5.5 and fix things there before going to 10.5, or is it possible to handle it with 10.5?  We have not started running jobs yet.

(We still have the backed-up /var/lib/mysql and the mysqldump, and we could install a VM with CentOS 7.9 and MariaDB 5.5 for doing fixes, but that would take a bit more time.)

Regards,
Bjørn-Helge Mevik

Comment 5 Oscar Hernández 2023-04-25 03:38:24 MDT
Dear Bjørn-Helge,

Let me first answer question B. 

Fortunately, the warning you are referring to was addressed in the release of 22.05.7 (commit 8c2ead9). Since you upgraded to 22.05.8, you are covered here, and I would not expect any problem. You did well to ask about it.

In any case, if you are curious, the issues triggered by this MariaDB problem are discussed in bug 13562.

Then, about your first question (A).

We suspect it is due to fragmentation that occurred during the transition from MariaDB 5.5 to 10.5. Since you restored /var/lib/mysql directly into the new MariaDB version, you might be carrying over some drawbacks from the older MariaDB.
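
One way to look into this suspicion is to compare what MariaDB reports with what is on disk. The query in the description sums only data_length and index_length, so space that is allocated but unused does not show up in it. The sketch below is a minimal example, not a verified diagnostic: it assumes the default data directory and the slurm_saga database from the ticket, and that the mysql client can connect without extra options (for example as root). It only prints what it would run unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: compare the logical database size with the on-disk footprint.
DATADIR="${DATADIR:-/var/lib/mysql}"   # assumption: default datadir, as in the ticket
DB="${DB:-slurm_saga}"                 # database name taken from the ticket

# Emit a per-table query. DATA_FREE is the space InnoDB reports as
# allocated but currently unused.
size_query() {
    cat <<EOF
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS used_mb,
       ROUND(data_free / 1024 / 1024) AS free_mb
FROM information_schema.TABLES
WHERE table_schema = '$1'
ORDER BY data_free DESC
LIMIT 10;
EOF
}

if [ "${DRY_RUN:-1}" = 1 ]; then
    # Default: only show what would be run.
    echo "du -sh $DATADIR"
    echo "ls -lh $DATADIR/ibdata1"
    echo "mysql -e \"SHOW VARIABLES LIKE 'innodb_file_per_table'\""
    size_query "$DB"
else
    du -sh "$DATADIR"                                       # total on-disk size
    ls -lh "$DATADIR/ibdata1"                               # shared system tablespace
    mysql -e "SHOW VARIABLES LIKE 'innodb_file_per_table'"  # per-table files or not
    size_query "$DB" | mysql --table                        # ten tables with most free space
fi
```

A large ibdata1, or large free_mb values, would be consistent with the fragmentation guess, but how to read DATA_FREE depends on whether the tables live in ibdata1 or in their own files.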

To reduce space usage, we suggest re-creating the databases from scratch in version 10.5. To do so:

1 - Stop slurmdbd.
2 - Dump the Slurm database with mysqldump.
(You could back up the database first if you consider it necessary, but I guess you still have the previous backup.)
3 - Drop the database.
4 - Import the database from the mysqldump output of step 2.
5 - Start slurmdbd again.

Creating the database from scratch this way should take only the space needed to hold the data, and it also benefits from the new defaults in MariaDB 10.5.
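
The five steps above can be sketched as a small shell script. This is an illustration under assumptions, not a tested procedure: the database name slurm_saga comes from the ticket, the dump path is hypothetical, and the mysql and mysqldump clients are assumed to connect without extra options. The extra `test -s` check is an addition, so that the database is not dropped when the dump file is empty. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of steps 1-5 above. It only prints the commands unless DRY_RUN=0.
DB="${DB:-slurm_saga}"            # database name taken from the ticket
DUMP="${DUMP:-/root/$DB.sql}"     # hypothetical dump location

# Print a command line; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || sh -c "$*"
}

# Each step runs only if the previous one succeeded.
recreate_db() {
    run "systemctl stop slurmdbd" &&               # step 1
    run "mysqldump --databases $DB > $DUMP" &&     # step 2
    run "test -s $DUMP" &&                         # added check: dump must not be empty
    run "mysql -e 'DROP DATABASE $DB'" &&          # step 3
    run "mysql < $DUMP" &&                         # step 4
    run "systemctl start slurmdbd"                 # step 5
}

recreate_db
```

With `--databases`, mysqldump writes the CREATE DATABASE and USE statements into the dump, so step 4 re-creates the database without a separate command.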

Also, when searching around this topic, a colleague suggested taking a look at:

https://stackoverflow.com/questions/3456159/how-to-shrink-purge-ibdata1-file-in-mysql

It is possible that the ibdata1 file is taking up a lot of space. Since you are on a fresh install, you might also be interested in doing what is suggested in the link above: delete ibdata1 and ib_logfile0 before creating the database again.

If you follow the steps I suggested above, step 2 should dump all the databases you are interested in (in case you have others besides the Slurm one), and step 3 should drop all databases except "mysql" and "performance_schema", and also include deleting the /var/lib/mysql/ibdata1 and /var/lib/mysql/ib_logfile* files.
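
The file removal that extends step 3 could look roughly like the sketch below, to be run after the databases have been dumped and dropped and before the import. It is an illustration under assumptions, not a tested procedure: the service name mariadb and the holding directory are assumptions, and the files are moved aside rather than deleted, which is a more cautious variant of what is described above. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: move the InnoDB system tablespace and redo logs out of the way.
# It only prints the commands unless DRY_RUN=0.
DATADIR="${DATADIR:-/var/lib/mysql}"   # data directory from the ticket
ASIDE="${ASIDE:-/root/innodb-old}"     # hypothetical holding directory

# Print a command line; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 1 ] || sh -c "$*"
}

# Each step runs only if the previous one succeeded.
reset_system_tablespace() {
    run "systemctl stop mariadb" &&      # assumption: the service is named mariadb
    run "mkdir -p $ASIDE" &&
    run "mv $DATADIR/ibdata1 $DATADIR/ib_logfile* $ASIDE/" &&
    run "systemctl start mariadb"        # the server creates fresh files on start
}

reset_system_tablespace
```

The holding directory needs enough free space for ibdata1. Once the import has been verified, the moved files can be removed.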

Hope that helps. Let us know how things go if you decide to apply the suggested changes.

Kind regards,
Oscar

Comment 6 Bjørn-Helge Mevik 2023-04-25 13:02:13 MDT
(In reply to Oscar Hernández from comment #5)

Dear Oscar,

> Fortunately, the warning you are referring to was addressed in the release
> of 22.05.7 (commit 8c2ead9). Since you upgraded to 22.05.8, you are covered
> here, and I would not expect any problem.

Very good! :) Thanks!

> Then, about your first question (B).
[...]
> Hope that helps, let us know how things go if you decide to apply the
> suggested changes,

I followed the procedure - dumped the db, removed the files and loaded the db.  It worked fine, and now the size is just 17 GiB.  :)

Again, thanks!

Bjørn-Helge

Comment 7 Oscar Hernández 2023-04-27 09:08:02 MDT
You are welcome!
Glad that worked out. Closing as INFOGIVEN.