Ticket 14039

Summary: Upgrade MariaDB 5.5 to 10.3
Product: Slurm Reporter: ARC Admins <arc-slurm-admins>
Component: DocumentationAssignee: Chad Vizino <chad>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 3 - Medium Impact    
Priority: ---    
Version: 21.08.7   
Hardware: Linux   
OS: Linux   
Site: University of Michigan Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: RHEL Machine Name:
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---

Description ARC Admins 2022-05-10 11:30:29 MDT
Hello,

Part of an OS upgrade project I am working on has the MariaDB instance upgrade from 5.5.68 to 10.3.28.
  
In the documentation you have this node: (https://slurm.schedmd.com/accounting.html)

> NOTE: If you have an existing Slurm accounting database and plan to upgrade your database server to MariaDB 10.2.1 (or newer) from a pre-10.2.1 version or from any version of MySQL, please contact SchedMD for assistance.


I had already stood up an instance of the new slurmdb and did the following steps that seemed to work but from the msg in the docs it seems that there might be more to it.
The steps taken:
1. Install new slurmdbd instance 21.08.7 with MariaDB 10.3.28
2. Once service is up and running on a clean install stop slurmdbd
3. Drop the slurm_acct_db db from MariaDB
4. Recreate the DB slurm_acct_db empty
5. Restore from backup with `mysql slurm_acct_db < slurmdbd_backup`

Please advise if the steps taken are correct or what needs to be considered since the NOTE is a bit ominous.

Thanks
Vasile
Comment 2 Chad Vizino 2022-05-10 12:21:37 MDT
(In reply to ARC Admins from comment #0)
> Part of an OS upgrade project I am working on has the MariaDB instance
> upgrade from 5.5.68 to 10.3.28.
>   
> In the documentation you have this node:
> (https://slurm.schedmd.com/accounting.html)
> 
> > NOTE: If you have an existing Slurm accounting database and plan to upgrade your database server to MariaDB 10.2.1 (or newer) from a pre-10.2.1 version or from any version of MySQL, please contact SchedMD for assistance.
> 
> 
> I had already stood up an instance of the new slurmdb and did the following
> steps that seemed to work but from the msg in the docs it seems that there
> might be more to it.
> The steps taken:
> 1. Install new slurmdbd instance 21.08.7 with MariaDB 10.3.28
> 2. Once service is up and running on a clean install stop slurmdbd
> 3. Drop the slurm_acct_db db from MariaDB
> 4. Recreate the DB slurm_acct_db empty
> 5. Restore from backup with `mysql slurm_acct_db < slurmdbd_backup`
> 
> Please advise if the steps taken are correct or what needs to be considered
> since the NOTE is a bit ominous.
Hi. The background for this is:

>As of MariaDB 10.2.1 BLOB and TEXT fields can now have a DEFAULT value.
>When an existing Slurm database exists, this change can lead to problems
>when upgrading from an older version or from any version of MySQL.
One of the "problems" alluded to is when sacctmgr is used to delete a user and a text field is assigned 'DEFAULT' via an underlying sql query to a >= 10.2.1 MariaDB db server using a slurm_acct_db from a < 10.2.1 version MariaDB dump (or any version of MySQL). In this case the saccmgr delete user will fail. Altering affected fields in slurm_acct_db under MariaDB >= 10.2.1 will prevent this and can be accomplished by the steps below.

Follow your 5-step plan above with these additions:

6. Drop the table 'table_defs_table' from slurm_acct_db
7. Restart slurmdbd (and then the other slurm daemons)

This will force slurmdbd to alter the tables to conform to the newer MariaDB change per above. We're working on code to do this automatically but it's not ready yet, thus the warning in the docs.

Feel free to ask any questions if you have them.
Comment 3 ARC Admins 2022-05-10 14:43:29 MDT
> Follow your 5-step plan above with these additions:
> 
> 6. Drop the table 'table_defs_table' from slurm_acct_db
> 7. Restart slurmdbd (and then the other slurm daemons)
> 
> This will force slurmdbd to alter the tables to conform to the newer MariaDB
> change per above. We're working on code to do this automatically but it's
> not ready yet, thus the warning in the docs.
> 
> Feel free to ask any questions if you have them.

Thanks for the fast response.

I applied the 2 steps you provided to an instanced I had already had running. Should I have seen something in logs? (output posted below) or does this need to be done before slurmdbd is started for the first time after the restore is done?

Slurmdbd logs after dropping the `table_defs_table` from slurm_accountdb and starting slurmdbd:

[2022-05-10T15:53:40.265] accounting_storage/as_mysql: _check_mysql_concat_is_sane: MySQL server version is: 10.3.28-MariaDB
[2022-05-10T15:53:40.339] accounting_storage/as_mysql: init: Accounting storage MYSQL plugin loaded
[2022-05-10T15:53:54.808] slurmdbd version 21.08.7 started
[2022-05-10T15:53:54.810] accounting_storage/as_mysql: as_mysql_hourly_rollup: DB_USAGE: 0(as_mysql_rollup.c:1301) query
select dimensions from cluster_table where name='lighthouse'
[2022-05-10T15:53:54.810] accounting_storage/as_mysql: as_mysql_hourly_rollup: DB_USAGE: 0(as_mysql_rollup.c:1334) lighthouse curr hour is now 1652205600-1652209200
[2022-05-10T15:53:54.810] accounting_storage/as_mysql: as_mysql_hourly_rollup: DB_USAGE: 0(as_mysql_rollup.c:1301) query
select dimensions from cluster_table where name='greatlakes'

There are more lines but it looked like regular business.  What should I be seeing in logs to know it did the right thing? 

Thanks
Vasile
Comment 4 Chad Vizino 2022-05-10 15:00:37 MDT
(In reply to ARC Admins from comment #3)
> I applied the 2 steps you provided to an instanced I had already had
> running. Should I have seen something in logs? (output posted below) or does
> this need to be done before slurmdbd is started for the first time after the
> restore is done?
The "drop table" needs to be done before you start running jobs. But you will need to restart slurmdbd after dropping the table. I address the log question below.

> Slurmdbd logs after dropping the `table_defs_table` from slurm_accountdb and
> starting slurmdbd:
> 
> [2022-05-10T15:53:40.265] accounting_storage/as_mysql:
> _check_mysql_concat_is_sane: MySQL server version is: 10.3.28-MariaDB
> [2022-05-10T15:53:40.339] accounting_storage/as_mysql: init: Accounting
> storage MYSQL plugin loaded
> [2022-05-10T15:53:54.808] slurmdbd version 21.08.7 started
> [2022-05-10T15:53:54.810] accounting_storage/as_mysql:
> as_mysql_hourly_rollup: DB_USAGE: 0(as_mysql_rollup.c:1301) query
> select dimensions from cluster_table where name='lighthouse'
> [2022-05-10T15:53:54.810] accounting_storage/as_mysql:
> as_mysql_hourly_rollup: DB_USAGE: 0(as_mysql_rollup.c:1334) lighthouse curr
> hour is now 1652205600-1652209200
> [2022-05-10T15:53:54.810] accounting_storage/as_mysql:
> as_mysql_hourly_rollup: DB_USAGE: 0(as_mysql_rollup.c:1301) query
> select dimensions from cluster_table where name='greatlakes'
> 
> There are more lines but it looked like regular business.  What should I be
> seeing in logs to know it did the right thing? 
There's nothing you'll see logged unless you modify the source (or view the query logs from your db server). However, you can confirm that the right thing has been done by running this query:

>mysql -D slurm_acct_db -e "desc txn_table"
You should see '' (empty string) in the Default column for the field cluster:

>+-----------+---------------------+------+-----+---------+----------------+
>| Field     | Type                | Null | Key | Default | Extra          |
>+-----------+---------------------+------+-----+---------+----------------+
> ...
>| cluster   | tinytext            | NO   |     | ''      |                |
If you have not dropped table_defs_table and restarted slurmdbd, you will see NULL listed under the Default column for this field.

There are a number of other similar fields of type *text or blob in other tables that will have a similar transformation but this is a good way to spot check the operation.

One final note: This operation (dropping the table and restarting slurmdbd) only needs to be done once for the MariaDB upgrade.
Comment 5 ARC Admins 2022-05-11 09:27:14 MDT
Thanks, I just checked and cluster is set to ''

+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| cluster   | tinytext            | NO   |     | ''      |                |


That resolves this for me, thanks again.
Vasile