Dear support,
last night our slurmdbd crashed with the following message:
    slurmdbd[1572]: fatal: _mysql_query_internal: unable to resolve deadlock with attempts 10/10: 1213 Deadlock found when trying to get lock; try restarting transaction#012 Please call 'show engine innodb status;' in MySQL/MariaDB and open a bug report with SchedMD.
We have a configuration with 2 nodes, one active and the other as backup, but both went down.
Attached you will find the output of the command suggested in the log.
I've restarted both slurmdbd processes and the situation is back to normal, but we will need your guidance to avoid this problem in the future.
Thank you.
Kind regards,
Marco Induni
Here are the versions running:
    # rpm -qa | grep -iE "maria|galera" | sort
    galera-4-26.4.8-1.el7.centos.x86_64
    MariaDB-client-10.4.19-1.el7.centos.x86_64
    MariaDB-common-10.4.19-1.el7.centos.x86_64
    MariaDB-compat-10.4.19-1.el7.centos.x86_64
    MariaDB-devel-10.4.19-1.el7.centos.x86_64
    MariaDB-server-10.4.19-1.el7.centos.x86_64
    MariaDB-shared-10.4.19-1.el7.centos.x86_64
Created attachment 23612 [details] Output of command: show engine innodb status
Would you also attach the slurmdbd log from around the time of the fatal (assuming there is anything there) as well as your slurmdbd.conf (but please redact the database password)? Thanks! --Tim
Created attachment 23627 [details] Active node slurmdbd log
Created attachment 23629 [details] Active node messages log
Created attachment 23630 [details] Backup standby messages log
Hi Tim, as requested, attached you will find the logs and the configuration. Kind regards, Marco
Created attachment 23631 [details] slurmdbd configuration file
Thank you for the additional logging. Unfortunately, what we want to look for in the show engine innodb status output isn't present, so it's less clear what is going on here. Would you also mind attaching the output of "SHOW VARIABLES;" from the database? Thanks! --Tim
Created attachment 23635 [details] show variables output
Hi Tim, attached is the output of:
    mysql --table -e "show variables" > mysql-show-variables.log
Best regards, Marco
Thanks for this output, Marco. I've been looking through the logs and it's still not conclusive what caused the hang, but around that time archive/purge was running, and it seems to be fairly slow to run. Those operations can hold locks for a while and *might* be related. There are some improvements coming in 22.05 that can help speed these up.
The first thing I would do here is try increasing the deadlock detection timer; it's set at the default of 50 seconds. Would you be able to change it to 100 seconds? The value is in microseconds, so the config line would be something like:
    deadlock_timeout_long=100000000
Thanks! --Tim
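For reference, the seconds-to-microseconds conversion behind that value can be sketched as follows. The shell variable names are illustrative only, and placing the resulting line under the [mysqld] section of the MariaDB server configuration is an assumption, since the thread does not say where the setting was applied.

```shell
#!/bin/sh
# Illustrative variable names; they are not part of Slurm or MariaDB.
timeout_seconds=100

# deadlock_timeout_long is given in microseconds (1 s = 1,000,000 us).
timeout_us=$((timeout_seconds * 1000000))

# Build the line for the server configuration
# (assumed here to belong under [mysqld]).
config_line="deadlock_timeout_long=${timeout_us}"
echo "$config_line"
```

After the database has been restarted with the new setting, the active value can be checked with mysql -e "SHOW VARIABLES LIKE 'deadlock_timeout_long'".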
Dear Tim,
as agreed I've updated the deadlock timeout to
    deadlock_timeout_long=100000000
Since the event happened just once, I think we can close this ticket for the moment and I will reopen it or create a new one in case the same problem will hit the system another time.
Thank you for the support and all the best.
Marco Induni
(In reply to Marco Induni from comment #12)
> Dear Tim,
>
> as agreed I've updated the deadlock timeout
> to deadlock_timeout_long=100000000
>
> Since the event happened just once, I think we can close this ticket for the
> moment and I will reopen it or create a new one in case the same problem
> will hit the system another time.
>
> Thank you for the support and all the best.
>
> Marco Induni
Thanks for the update Marco, please let us know if it happens again! --Tim