Hey we have the same problem as described in issue: * https://bugs.schedmd.com/show_bug.cgi?id=12558 Is there a fix for this problem? ``` [2021-12-02T11:38:17.066] error: CONN:8 No error [2021-12-02T11:40:13.961] error: mysql_query failed: 1364 Field 'max_tres_pj' doesn't have a default value update "lisa_assoc_table" as t1 set mod_time=1638441610, deleted=1, def_qos_id=DEFAULT, shares=DEFAULT, max_jobs=DEFAULT, max_jobs_accrue=DEFAULT, min_prio_thresh=DEFAULT, max_submit_jobs=DEFAULT, max_wall_pj=DEFAULT, max_tres_pj=DEFAULT, max_tres_pn=DEFAULT, max_tres_mins_pj=DEFAULT, max_tres_run_mins=DEFAULT, grp_jobs=DEFAULT, grp_submit_jobs=DEFAULT, grp_jobs_accrue=DEFAULT, grp_wall=DEFAULT, grp_tres=DEFAULT, grp_tres_mins=DEFAULT, grp_tres_run_mins=DEFAULT, qos=DEFAULT, delta_qos=DEFAULT, priority=DEFAULT where (id_assoc=46778 || id_assoc=46779 || id_assoc=46780 || id_assoc=46781 || id_assoc=46782 || id_assoc=46783 || id_assoc=46784); [2021-12-02T11:40:15.467] error: CONN:8 No error ``` The deleting of the accounts/users is piling up!
And can you confirm that this is the fix / workaound: * https://bugs.schedmd.com/show_bug.cgi?id=12558#c3
Would you be able to provide a database dump without jobs and job steps and attach that information to this bug for us to review? For example: > mysqldump slurm_acct_db --ignore-table=slurm_acct_db.clustername_job_table --ignore-table=slurm_acct_db.clustername_step_table | xz - > db_dump.xz Please modify "slurm_acct_db.clustername_step_table" to reflect your database.ClusterName_ ... What we will be looking at are if the left and right values mentioned here. https://slurm.schedmd.com/slurmdbd.html#OPT_-R[comma-separated-cluster-name-list]
I failed to reply to your second commend. Please do still provide the database dump for us to review. > And can you confirm that this is the fix / workaound: > * https://bugs.schedmd.com/show_bug.cgi?id=12558#c3 It is unclear at this point, however we have seen a few issues like this in the past. We have not been able to duplicate this (NULL/DEFAULT). We think this has something to do with the database/version exposed from upgrading OS components, specifically the database.
Created attachment 22496 [details] lisa cluster dump
(In reply to Bas van der Vlies from comment #0) > Hey we have the same problem as described in issue: > * https://bugs.schedmd.com/show_bug.cgi?id=12558 > > Is there a fix for this problem? > > ``` > [2021-12-02T11:38:17.066] error: CONN:8 No error > [2021-12-02T11:40:13.961] error: mysql_query failed: 1364 Field > 'max_tres_pj' doesn't have a default value > update "lisa_assoc_table" as t1 set mod_time=1638441610, deleted=1, > def_qos_id=DEFAULT, shares=DEFAULT, max_jobs=DEFAULT, > max_jobs_accrue=DEFAULT, min_prio_thresh=DEFAULT, max_submit_jobs=DEFAULT, > max_wall_pj=DEFAULT, max_tres_pj=DEFAULT, max_tres_pn=DEFAULT, > max_tres_mins_pj=DEFAULT, max_tres_run_mins=DEFAULT, grp_jobs=DEFAULT, > grp_submit_jobs=DEFAULT, grp_jobs_accrue=DEFAULT, grp_wall=DEFAULT, > grp_tres=DEFAULT, grp_tres_mins=DEFAULT, grp_tres_run_mins=DEFAULT, > qos=DEFAULT, delta_qos=DEFAULT, priority=DEFAULT where (id_assoc=46778 || > id_assoc=46779 || id_assoc=46780 || id_assoc=46781 || id_assoc=46782 || > id_assoc=46783 || id_assoc=46784); > [2021-12-02T11:40:15.467] error: CONN:8 No error > ``` > > The deleting of the accounts/users is piling up! This has to do with the db engine you are using. From the dump it looks like you are using MariaDB 10.3.31. Was it the same one you used when you first installed slurm (when the db tables were created)? For MariaDB the related doc at https://mariadb.com/kb/en/text/ says: >Before MariaDB 10.2.1, BLOB and TEXT columns could not be assigned a DEFAULT value. This restriction was lifted in MariaDB 10.2.1. For another customer who had this issue (before the bug you reference) we suggested adding defaults to the cluster assoc table to resolve this. For your cluster you can try this after after stopping the slurm daemons, running the commands and restarting things: >ALTER TABLE lisa_assoc_table MODIFY max_tres_pj text DEFAULT ''; >ALTER TABLE lisa_assoc_table MODIFY max_tres_pn text DEFAULT ''; >ALTER TABLE lisa_assoc_table MODIFY max_tres_mins_pj text DEFAULT ''; >ALTER TABLE lisa_assoc_table MODIFY max_tres_run_mins text DEFAULT ''; >ALTER TABLE lisa_assoc_table MODIFY grp_tres text DEFAULT ''; >ALTER TABLE lisa_assoc_table MODIFY grp_tres_mins text DEFAULT ''; >ALTER TABLE lisa_assoc_table MODIFY grp_tres_run_mins text DEFAULT ''; >ALTER TABLE lisa_assoc_table MODIFY qos blob DEFAULT ''; >ALTER TABLE lisa_assoc_table MODIFY delta_qos blob DEFAULT '';
Thanks we will try it. I had to search: * We started with slurm 17 and debian 9 * This upgrade started with debian 10.10 --> mariadb 10.3.29 * upgraded from slurm 20.02 --> 20.11.8 * upgraded debian 10.11 --> mariadb 10.3.31 So the mariadb version does not differ that much. That is why we are suprised.
Thanks Jason/Chad the alter commands works as expected. Just one question if we upgrade to the next major (21.08.4) and the table `cluster_assoc_table` has been changed in that release a field is added or removed do we have the same problem and must we alter the table again?
Another question to this is the 'update' sql command of the assoc done within slurm changed with the upgrade of slurm?
(In reply to Bas van der Vlies from comment #8) > Thanks Jason/Chad the alter commands works as expected. > > Just one question if we upgrade to the next major (21.08.4) and the table > `cluster_assoc_table` has been changed in that release a field is added or > removed do we have the same problem and must we alter the table again? Great! You should not need to alter the table again if you're keeping the db (you do not drop/create the table). If you run: >mysqldump slurm_acct_db lisa_assoc_table You should now see the the text/blob fields with default values similar to this: >CREATE TABLE `cluster_assoc_table` ( > ... > `user` tinytext NOT NULL DEFAULT '', > ...
(In reply to jaap.dijkshoorn@surf.nl from comment #9) > Another question to this is the 'update' sql command of the assoc done > within slurm changed with the upgrade of slurm? Sorry. I don't understand what you are asking. would you elaborate?
(In reply to Chad Vizino from comment #12) > (In reply to jaap.dijkshoorn@surf.nl from comment #9) > > Another question to this is the 'update' sql command of the assoc done > > within slurm changed with the upgrade of slurm? > > Sorry. I don't understand what you are asking. would you elaborate? We want to understand how we end up hitting this 'bug' at this point. Was it the way the tables were created in the past by SLURM in mysql (mariadb) and that deleting stuff in this table is now handled slightly different then before? or something else? Cause the update of Mariadb at this stage was minor. (from 10.3.29 to 10.3.31)
(In reply to jaap.dijkshoorn@surf.nl from comment #14) > We want to understand how we end up hitting this 'bug' at this point. Was it > the way the tables were created in the past by SLURM in mysql (mariadb) and > that deleting stuff in this table is now handled slightly different then > before? or something else? Cause the update of Mariadb at this stage was > minor. (from 10.3.29 to 10.3.31) About the failing query you list in comment 0: >[2021-12-02T11:40:13.961] error: mysql_query failed: 1364 Field 'max_tres_pj' doesn't have a default value >update "lisa_assoc_table" as t1 set mod_time=1638441610, deleted=1, def_qos_id=DEFAULT, shares=DEFAULT, max_jobs=DEFAULT, max_jobs_accrue=DEFAULT, min_prio_thresh=DEFAULT, max_submit_jobs=DEFAULT, max_wall_pj=DEFAULT, max_tres_pj=DEFAULT,... The Slurm code generating this query has not changed from 20.02 to 20.11 (see it in remove_common() in src/plugins/accounting_storage/mysql/accounting_storage_mysql.c). The minor (.29 to .31) MariaDB upgrade should have had nothing to do with it. From comment 7 you list these changes: > * We started with slurm 17 and debian 9 > * This upgrade started with debian 10.10 --> mariadb 10.3.29 > * upgraded from slurm 20.02 --> 20.11.8 > * upgraded debian 10.11 --> mariadb 10.3.31 Debian 9 shipped with MariaDB 10.1: >https://mariadb.com/resources/blog/mariadb-server-default-in-debian-9/ And, as mentioned in comment 6: >Before MariaDB 10.2.1, BLOB and TEXT columns could not be assigned a DEFAULT value. This restriction was lifted in MariaDB 10.2.1. Your cluster assoc table would have been created without defaults (no "default ''") for the text and blob fields with MariaDB 10.1 when you first installed Slurm and we can see that from your dump from comment 5: >CREATE TABLE `lisa_assoc_table` ( > ... > `user` tinytext NOT NULL, > `acct` tinytext NOT NULL, > `partition` tinytext NOT NULL, > `parent_acct` tinytext NOT NULL, > `max_tres_pj` text NOT NULL, > ... > PRIMARY KEY (`id_assoc`), > UNIQUE KEY `udex` (`user`(42),`acct`(42),`partition`(42)), > KEY `lft` (`lft`), > KEY `account` (`acct`(42)) >) ENGINE=InnoDB AUTO_INCREMENT=46995 DEFAULT CHARSET=utf8mb4; The problem appears to be a MariaDB issue but I can't pinpoint it. Was your slurm accounting db dropped and recreated at any point when doing Slurm/MariaDB/Distro upgrades?
Since the message slurm is reporting is from MariaDB (comment 0), you way want to check with your db admin on the issue. Also, we've downgraded the severity of the ticket (comment 8).
Any update on this from your end? Would like to close if things are working now.
(In reply to Chad Vizino from comment #16) > (In reply to jaap.dijkshoorn@surf.nl from comment #14) > > We want to understand how we end up hitting this 'bug' at this point. Was it > > the way the tables were created in the past by SLURM in mysql (mariadb) and > > that deleting stuff in this table is now handled slightly different then > > before? or something else? Cause the update of Mariadb at this stage was > > minor. (from 10.3.29 to 10.3.31) > About the failing query you list in comment 0: > > >[2021-12-02T11:40:13.961] error: mysql_query failed: 1364 Field 'max_tres_pj' doesn't have a default value > >update "lisa_assoc_table" as t1 set mod_time=1638441610, deleted=1, def_qos_id=DEFAULT, shares=DEFAULT, max_jobs=DEFAULT, max_jobs_accrue=DEFAULT, min_prio_thresh=DEFAULT, max_submit_jobs=DEFAULT, max_wall_pj=DEFAULT, max_tres_pj=DEFAULT,... > The Slurm code generating this query has not changed from 20.02 to 20.11 > (see it in remove_common() in > src/plugins/accounting_storage/mysql/accounting_storage_mysql.c). > > The minor (.29 to .31) MariaDB upgrade should have had nothing to do with > it. From comment 7 you list these changes: > > > * We started with slurm 17 and debian 9 > > * This upgrade started with debian 10.10 --> mariadb 10.3.29 > > * upgraded from slurm 20.02 --> 20.11.8 > > * upgraded debian 10.11 --> mariadb 10.3.31 > Debian 9 shipped with MariaDB 10.1: > > >https://mariadb.com/resources/blog/mariadb-server-default-in-debian-9/ > And, as mentioned in comment 6: > > >Before MariaDB 10.2.1, BLOB and TEXT columns could not be assigned a DEFAULT value. This restriction was lifted in MariaDB 10.2.1. > Your cluster assoc table would have been created without defaults (no > "default ''") for the text and blob fields with MariaDB 10.1 when you first > installed Slurm and we can see that from your dump from comment 5: > > >CREATE TABLE `lisa_assoc_table` ( > > ... > > `user` tinytext NOT NULL, > > `acct` tinytext NOT NULL, > > `partition` tinytext NOT NULL, > > `parent_acct` tinytext NOT NULL, > > `max_tres_pj` text NOT NULL, > > ... > > PRIMARY KEY (`id_assoc`), > > UNIQUE KEY `udex` (`user`(42),`acct`(42),`partition`(42)), > > KEY `lft` (`lft`), > > KEY `account` (`acct`(42)) > >) ENGINE=InnoDB AUTO_INCREMENT=46995 DEFAULT CHARSET=utf8mb4; > The problem appears to be a MariaDB issue but I can't pinpoint it. Was your > slurm accounting db dropped and recreated at any point when doing > Slurm/MariaDB/Distro upgrades? Yes it was. We did this during the upgrade of SLURM from 19 to 20.02
Closing since it seems like things are working for you now.
Please see bug 13562 comment 21 for a more complete analysis of this issue and some additional steps you may want to consider.