MySQL PXC 集群死锁分析案例

2024年7月24日 213点热度 0人点赞 0条评论

前不久一个系统死锁导致部分业务受到影响,今次补上详细的节点日志分析过程。
这个PXC集群有三个节点,分别是 108、109、110,日志信息的ip6地址、节点编号等信息均已做脱敏处理。
以下日志里面,3个节点对应的配置信息是:

108 99999999-9908 9999:9999:9999:9999::6c
109 99999999-9909 9999:9999:9999:9999::6d
110 99999999-9910 9999:9999:9999:9999::6e

一、节点日志分析109节点

1.1 之前数据库节点一直正常,且无较大的事务,直到 11:06 ~ 11:07 成功完成了1次较大的事务,全局缓存页达到了 500MB,如下:

2024-07-15T03:06:29.228915Z 0 [Note] [MY-000000] [Galera] Created page /mysqldata/mysql/gcache.page.000768 of size 515922640 bytes
...
2024-07-15T03:07:48.922772Z 0 [Note] [MY-000000] [Galera] Deleted page /mysqldata/mysql/gcache.page.000768
...

1.2 9分钟以后即 11:16 再次发生大事务,全局缓存页达到了 570MB,这次报错了,报错如下:

2024-07-15T03:16:48.358335Z 0 [Note] [MY-000000] [Galera] Created page /mysqldata/mysql/gcache.page.000769 of size 586372048 bytes

1.3 以下报错信息显示,表 tm_xj.rms_device_info_zw_month_bak 的操作,在本节点同步执行 replication 时失败,原因在于该多语句大事务过大,超过了系统设置 max_binlog_cache_size,经查该设置是 512M:max_binlog_cache_size = 512M

2024-07-15T03:17:47.506531Z 2 [ERROR] [MY-010584] [Repl] Replica SQL: Could not execute Write_rows event on table tm_xj.rms_device_info_zw_month_bak; Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again, Error_code: 1197; Writing one row to the row-based binary log failed, Error_code: 1534; handler error HA_ERR_RBR_LOGGING_FAILED; the event's source log FIRST, end_log_pos 0, Error_code: MY-001197
2024-07-15T03:17:47.506611Z 2 [Warning] [MY-000000] [WSREP] Event 66115 Write_rows apply failed: 161, seqno 9338777

1.4 以下信息显示,节点仲裁时 Received bogus VOTE message,进而检测到了冲突 Inconsistency detected,然后导致本节点于 11:18 停止服务。

2024-07-15T03:18:34.561764Z 0 [Note] [MY-000000] [Galera] Member 1(pxc-cluster-node-2) initiates vote on 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777,892e7c82cf62c56b:  Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again, Error_code: 1197; Writing one row to the row-based binary log failed, Error_code: 1534;
2024-07-15T03:18:34.562059Z 0 [Note] [MY-000000] [Galera] Recomputed vote based on error codes: 1197, 1534. New vote d659be586fd355cc will be used for further steps. Old Vote: 892e7c82cf62c56b
2024-07-15T03:18:34.562104Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777:
   0000000000000000:   1/3
   d659be586fd355cc:   1/3
Waiting for more votes.
2024-07-15T03:18:34.563801Z 0 [Note] [MY-000000] [Galera] Member 0(pxc-cluster-node-3) responds to vote on 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777,0000000000000000: Success
2024-07-15T03:18:34.563836Z 0 [Warning] [MY-000000] [Galera] Received bogus VOTE message: 9338777.0, from node 2a36a321-f834-11ee-bc44-bb66609ee19b, expected > 9338860. Ignoring.
2024-07-15T03:18:34.563857Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777:
   0000000000000000:   1/3
   d659be586fd355cc:   1/3
Waiting for more votes.
2024-07-15T03:18:34.563873Z 0 [Note] [MY-000000] [Galera] Member 2(pxc-cluster-node-1) responds to vote on 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777,0000000000000000: Success
2024-07-15T03:18:34.563888Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777:
   0000000000000000:   2/3
   d659be586fd355cc:   1/3
Winner: 0000000000000000
2024-07-15T03:18:34.564007Z 2 [ERROR] [MY-000000] [Galera] Inconsistency detected: Inconsistent by consensus on 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777
     at /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/rpmbuild/BUILD/Percona-XtraDB-Cluster-8.0.36/percona-xtradb-cluster-galera/galera/src/replicator_smm.cpp:process_apply_error():1462
2024-07-15T03:18:34.565436Z 2 [Note] [MY-000000] [Galera] Closing send monitor...
2024-07-15T03:18:34.565477Z 2 [Note] [MY-000000] [Galera] Closed send monitor.
2024-07-15T03:18:34.565497Z 2 [Note] [MY-000000] [Galera] gcomm: terminating thread
2024-07-15T03:18:34.565530Z 2 [Note] [MY-000000] [Galera] gcomm: joining thread
2024-07-15T03:18:34.567791Z 2 [Note] [MY-000000] [Galera] gcomm: closing backend
2024-07-15T03:18:34.569470Z 2 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(NON_PRIM,99999999-9910,9)
memb {
    601c1fce-8a87,0
    }
joined {
    }
left {
    }
partitioned {
    99999999-9910,0
    99999999-9908,0
    }
)
2024-07-15T03:18:34.569567Z 2 [Note] [MY-000000] [Galera] PC protocol downgrade 1 -> 0
2024-07-15T03:18:34.569592Z 2 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view ((empty))
2024-07-15T03:18:34.570032Z 2 [Note] [MY-000000] [Galera] gcomm: closed
2024-07-15T03:18:34.570113Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2024-07-15T03:18:34.570222Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [100, 100]
2024-07-15T03:18:34.570258Z 0 [Note] [MY-000000] [Galera] Received NON-PRIMARY.
2024-07-15T03:18:34.570275Z 0 [Note] [MY-000000] [Galera] Shifting SYNCED -> OPEN (TO: 9338911)
2024-07-15T03:18:34.570294Z 0 [Note] [MY-000000] [Galera] New SELF-LEAVE.
2024-07-15T03:18:34.570322Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [0, 0]
2024-07-15T03:18:34.570334Z 0 [Note] [MY-000000] [Galera] Received SELF-LEAVE. Closing connection.
2024-07-15T03:18:34.570343Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> CLOSED (TO: 9338911)
2024-07-15T03:18:34.570356Z 0 [Note] [MY-000000] [Galera] RECV thread exiting 0: Success
2024-07-15T03:18:34.571771Z 2 [Note] [MY-000000] [Galera] recv_thread() joined.
2024-07-15T03:18:34.571808Z 2 [Note] [MY-000000] [Galera] Closing replication queue.
2024-07-15T03:18:34.571823Z 2 [Note] [MY-000000] [Galera] Closing slave action queue.
2024-07-15T03:18:34.575876Z 14 [Note] [MY-000000] [Galera] ================================================
View:
  id: 2a37cad4-f834-11ee-916e-da042bca7bc3:9338911
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(1):
    0: 601c1fce-f834-11ee-8a87-33f7192b1c5c, pxc-cluster-node-2
=================================================
2024-07-15T03:18:34.575947Z 14 [Note] [MY-000000] [Galera] Non-primary view
2024-07-15T03:18:34.575967Z 14 [Note] [MY-000000] [WSREP] Server status change synced -> connected
2024-07-15T03:18:34.575983Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2024-07-15T03:18:34.576009Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2024-07-15T03:18:34.576066Z 14 [Note] [MY-000000] [Galera] ================================================
View:
  id: 2a37cad4-f834-11ee-916e-da042bca7bc3:9338911
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: yes
  own_index: -1
  members(0):
=================================================
2024-07-15T03:18:34.576088Z 14 [Note] [MY-000000] [Galera] Non-primary view
2024-07-15T03:18:34.576099Z 14 [Note] [MY-000000] [WSREP] Server status change connected -> disconnected
2024-07-15T03:18:34.576108Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2024-07-15T03:18:34.576121Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2024-07-15T03:18:34.576141Z 14 [Note] [MY-000000] [Galera] Waiting 600 seconds for 8 receivers to finish
2024-07-15T03:18:34.577081Z 13 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0
2024-07-15T03:18:34.577082Z 11 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0
2024-07-15T03:18:34.577186Z 11 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 11
2024-07-15T03:18:34.577148Z 13 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 13
2024-07-15T03:18:34.577235Z 12 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0
2024-07-15T03:18:34.577210Z 16 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0
2024-07-15T03:18:34.577088Z 15 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0
2024-07-15T03:18:34.577335Z 16 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 16
2024-07-15T03:18:34.577356Z 15 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 15
2024-07-15T03:18:34.577365Z 10 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0
2024-07-15T03:18:34.577389Z 10 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 10
2024-07-15T03:18:34.577284Z 12 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 12
2024-07-15T03:18:34.593783Z 2 [ERROR] [MY-000000] [Galera] Failed to apply write set: gtid: 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777 server_id: 2a36a321-f834-11ee-bc44-bb66609ee19b client_id: 2946137 trx_id: 710800782 flags: 3 (start_transaction | commit)
2024-07-15T03:18:34.593950Z 2 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0
2024-07-15T03:18:34.593996Z 2 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 2
2024-07-15T03:18:35.268179Z 2865366 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is deprecated and will be removed in a future release. Please use caching_sha2_password instead'
2024-07-15T03:18:35.887809Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2024-07-15T03:18:35.888089Z 14 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 6
2024-07-15T03:18:35.888185Z 14 [Note] [MY-000000] [Galera] Slave thread exit. Return code: 0
2024-07-15T03:18:35.888213Z 14 [Note] [MY-000000] [WSREP] Applier thread exiting ret: 0 thd: 14
2024-07-15T03:18:35.901142Z 2865367 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is deprecated and will be removed in a future release. Please use caching_sha2_password instead'
2024-07-15T03:18:36.023349Z 2865368 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is deprecated and will be removed in a future release. Please use caching_sha2_password instead'
2024-07-15T03:18:36.118825Z 0 [Note] [MY-000000] [Galera] Deleted page /mysqldata/mysql/gcache.page.000769

1.5 之后直到 11:59 有人关闭该节点:

2024-07-15T03:59:05.397213Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.36-28.1).
2024-07-15T03:59:05.397292Z 0 [Note] [MY-000000] [WSREP] Received shutdown signal. Will sleep for 10 secs before initiating shutdown. pxc_maint_mode switched to SHUTDOWN
## 1.6 睡眠 10 秒后节点关闭:
2024-07-15T03:59:15.401674Z 0 [Note] [MY-000000] [WSREP] Shutdown replication
2024-07-15T03:59:15.401759Z 0 [Note] [MY-000000] [WSREP] Waiting for active wsrep applier to exit
2024-07-15T03:59:15.401863Z 0 [Note] [MY-000000] [WSREP] All applier thread terminated. Will now terminate rollback thread
2024-07-15T03:59:15.401919Z 0 [Note] [MY-000000] [WSREP] Waiting for rollback thread to terminate
2024-07-15T03:59:15.401971Z 1 [Note] [MY-000000] [WSREP] rollbacker thread exiting 1
2024-07-15T03:59:16.402226Z 0 [Note] [MY-000000] [WSREP] Rollback thread terminated
2024-07-15T03:59:18.406295Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875920  user: 'tm_xj'.
2024-07-15T03:59:18.406692Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875496  user: 'tm_xj'.
2024-07-15T03:59:18.406865Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875376  user: 'root'.
2024-07-15T03:59:18.407021Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875632  user: 'root'.
2024-07-15T03:59:18.407164Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875352  user: 'root'.
2024-07-15T03:59:18.407303Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875648  user: 'root'.
2024-07-15T03:59:18.408120Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875616  user: 'tm_xj'.
2024-07-15T03:59:18.408298Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875848  user: 'root'.
2024-07-15T03:59:18.408436Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2874352  user: 'tm_xj'.
2024-07-15T03:59:18.408550Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875912  user: 'root'.
2024-07-15T03:59:18.408652Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875560  user: 'tm_xj'.
2024-07-15T03:59:18.408749Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2874336  user: 'tm_xj'.
2024-07-15T03:59:18.408850Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875360  user: 'tm_xj'.
2024-07-15T03:59:18.408951Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875744  user: 'root'.
2024-07-15T03:59:18.409067Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2876000  user: 'tm_xj'.
2024-07-15T03:59:18.409169Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875368  user: 'root'.
2024-07-15T03:59:18.409244Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875928  user: 'root'.
2024-07-15T03:59:18.409344Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872568  user: 'tm_xj'.
2024-07-15T03:59:18.409439Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875720  user: 'tm_xj'.
2024-07-15T03:59:18.409571Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875440  user: 'tm_xj'.
2024-07-15T03:59:18.409674Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875464  user: 'root'.
2024-07-15T03:59:18.409778Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875480  user: 'tm_xj'.
2024-07-15T03:59:18.409889Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872560  user: 'tm_xj'.
2024-07-15T03:59:18.409984Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875737  user: 'root'.
2024-07-15T03:59:18.410252Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875521  user: 'tm_xj'.
2024-07-15T03:59:18.411058Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875353  user: 'root'.
2024-07-15T03:59:18.411180Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875625  user: 'root'.
2024-07-15T03:59:18.411344Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875649  user: 'root'.
2024-07-15T03:59:18.411446Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875569  user: 'root'.
2024-07-15T03:59:18.411550Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872569  user: 'tm_xj'.
2024-07-15T03:59:18.411660Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875441  user: 'tm_xj'.
2024-07-15T03:59:18.411860Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875921  user: 'tm_xj'.
2024-07-15T03:59:18.411963Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875721  user: 'tm_xj'.
2024-07-15T03:59:18.412091Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875929  user: 'root'.
2024-07-15T03:59:18.412289Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875417  user: 'tm_xj'.
2024-07-15T03:59:18.412422Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872561  user: 'tm_xj'.
2024-07-15T03:59:18.412569Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875457  user: 'tm_xj'.
2024-07-15T03:59:18.412723Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875465  user: 'root'.
2024-07-15T03:59:18.412841Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875497  user: 'tm_xj'.
2024-07-15T03:59:18.412946Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875906  user: 'root'.
2024-07-15T03:59:18.413037Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875434  user: 'tm_xj'.
2024-07-15T03:59:18.413203Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875418  user: 'tm_xj'.
2024-07-15T03:59:18.413343Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2874354  user: 'tm_xj'.
2024-07-15T03:59:18.413444Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875522  user: 'tm_xj'.
2024-07-15T03:59:18.413583Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875530  user: 'tm_xj'.
2024-07-15T03:59:18.413739Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875850  user: 'root'.
2024-07-15T03:59:18.413871Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875650  user: 'root'.
2024-07-15T03:59:18.414264Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875930  user: 'root'.
2024-07-15T03:59:18.414372Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875634  user: 'root'.
2024-07-15T03:59:18.414464Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875354  user: 'root'.
2024-07-15T03:59:18.414548Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875562  user: 'tm_xj'.
2024-07-15T03:59:18.414702Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875722  user: 'tm_xj'.
2024-07-15T03:59:18.414809Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875370  user: 'root'.
2024-07-15T03:59:18.414906Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875986  user: 'tm_xj'.
2024-07-15T03:59:18.414994Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875730  user: 'tm_xj'.
2024-07-15T03:59:18.415106Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875458  user: 'tm_xj'.
2024-07-15T03:59:18.415233Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875922  user: 'root'.
2024-07-15T03:59:18.415332Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875738  user: 'tm_xj'.
2024-07-15T03:59:18.415430Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872562  user: 'tm_xj'.
2024-07-15T03:59:18.415645Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875466  user: 'root'.
2024-07-15T03:59:18.415737Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875482  user: 'tm_xj'.
2024-07-15T03:59:18.415836Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875490  user: 'tm_xj'.
2024-07-15T03:59:18.415932Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875739  user: 'root'.
2024-07-15T03:59:18.416016Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875435  user: 'tm_xj'.
2024-07-15T03:59:18.416274Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875907  user: 'root'.
2024-07-15T03:59:18.416394Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875731  user: 'tm_xj'.
2024-07-15T03:59:18.416500Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875283  user: 'tm_xj'.
2024-07-15T03:59:18.416620Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872563  user: 'tm_xj'.
2024-07-15T03:59:18.416717Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875555  user: 'tm_xj'.
2024-07-15T03:59:18.416839Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875923  user: 'root'.
2024-07-15T03:59:18.416926Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875563  user: 'root'.
2024-07-15T03:59:18.417035Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875627  user: 'root'.
2024-07-15T03:59:18.417266Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875851  user: 'root'.
2024-07-15T03:59:18.417373Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875355  user: 'root'.
2024-07-15T03:59:18.417483Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875635  user: 'root'.
2024-07-15T03:59:18.417572Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875443  user: 'tm_xj'.
2024-07-15T03:59:18.417706Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875467  user: 'root'.
2024-07-15T03:59:18.417863Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875419  user: 'tm_xj'.
2024-07-15T03:59:18.417966Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875371  user: 'root'.
2024-07-15T03:59:18.418063Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875483  user: 'tm_xj'.
2024-07-15T03:59:18.418155Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875491  user: 'tm_xj'.
2024-07-15T03:59:18.418251Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875651  user: 'root'.
2024-07-15T03:59:18.418393Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875660  user: 'root'.
2024-07-15T03:59:18.418477Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875732  user: 'tm_xj'.
2024-07-15T03:59:18.418641Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875908  user: 'root'.
2024-07-15T03:59:18.418792Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875724  user: 'tm_xj'.
2024-07-15T03:59:18.418901Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875628  user: 'root'.
2024-07-15T03:59:18.418977Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875420  user: 'tm_xj'.
2024-07-15T03:59:18.419081Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875348  user: 'root'.
2024-07-15T03:59:18.419183Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872564  user: 'tm_xj'.
2024-07-15T03:59:18.419273Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875564  user: 'tm_xj'.
2024-07-15T03:59:18.420210Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875372  user: 'root'.
2024-07-15T03:59:18.420323Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875652  user: 'root'.
2024-07-15T03:59:18.420427Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875428  user: 'root'.
2024-07-15T03:59:18.420522Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875740  user: 'root'.
2024-07-15T03:59:18.420608Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875436  user: 'tm_xj'.
2024-07-15T03:59:18.420749Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875556  user: 'tm_xj'.
2024-07-15T03:59:18.420840Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875284  user: 'tm_xj'.
2024-07-15T03:59:18.420947Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875460  user: 'root'.
2024-07-15T03:59:18.421068Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875476  user: 'tm_xj'.
2024-07-15T03:59:18.421161Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875484  user: 'tm_xj'.
2024-07-15T03:59:18.421248Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875492  user: 'tm_xj'.
2024-07-15T03:59:18.421354Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875852  user: 'root'.
2024-07-15T03:59:18.421461Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875725  user: 'tm_xj'.
2024-07-15T03:59:18.421607Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875325  user: 'tm_xj'.
2024-07-15T03:59:18.421737Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875373  user: 'root'.
2024-07-15T03:59:18.421869Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875285  user: 'tm_xj'.
2024-07-15T03:59:18.421971Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875885  user: 'tm_xj'.
2024-07-15T03:59:18.422128Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875629  user: 'root'.
2024-07-15T03:59:18.422275Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875293  user: 'ingp_auth'.
2024-07-15T03:59:18.422411Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875909  user: 'root'.
2024-07-15T03:59:18.422554Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875477  user: 'tm_xj'.
2024-07-15T03:59:18.422668Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875557  user: 'tm_xj'.
2024-07-15T03:59:18.422772Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875421  user: 'tm_xj'.
2024-07-15T03:59:18.422857Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872565  user: 'tm_xj'.
2024-07-15T03:59:18.422942Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875981  user: 'tm_xj'.
2024-07-15T03:59:18.423032Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875741  user: 'root'.
2024-07-15T03:59:18.423167Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875733  user: 'tm_xj'.
2024-07-15T03:59:18.423277Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875349  user: 'root'.
2024-07-15T03:59:18.423371Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875925  user: 'root'.
2024-07-15T03:59:18.423463Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875461  user: 'tm_xj'.
2024-07-15T03:59:18.423567Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875493  user: 'tm_xj'.
2024-07-15T03:59:18.423695Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875653  user: 'root'.
2024-07-15T03:59:18.423801Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875790  user: 'tm_xj'.
2024-07-15T03:59:18.423921Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875854  user: 'root'.
2024-07-15T03:59:18.424036Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875526  user: 'tm_xj'.
2024-07-15T03:59:18.424166Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875734  user: 'root'.
2024-07-15T03:59:18.424257Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875654  user: 'root'.
2024-07-15T03:59:18.424339Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875350  user: 'root'.
2024-07-15T03:59:18.424426Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875438  user: 'tm_xj'.
2024-07-15T03:59:18.424736Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875662  user: 'root'.
2024-07-15T03:59:18.424827Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875918  user: 'tm_xj'.
2024-07-15T03:59:18.425109Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875286  user: 'tm_xj'.
2024-07-15T03:59:18.425408Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875558  user: 'tm_xj'.
2024-07-15T03:59:18.425490Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875566  user: 'tm_xj'.
2024-07-15T03:59:18.425587Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875742  user: 'root'.
2024-07-15T03:59:18.425779Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875726  user: 'tm_xj'.
2024-07-15T03:59:18.425889Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875910  user: 'root'.
2024-07-15T03:59:18.426107Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875462  user: 'root'.
2024-07-15T03:59:18.426193Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875926  user: 'root'.
2024-07-15T03:59:18.426260Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875414  user: 'tm_xj'.
2024-07-15T03:59:18.426359Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875494  user: 'tm_xj'.
2024-07-15T03:59:18.426451Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875735  user: 'tm_xj'.
2024-07-15T03:59:18.426805Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875911  user: 'root'.
2024-07-15T03:59:18.426919Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875719  user: 'tm_xj'.
2024-07-15T03:59:18.427008Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875415  user: 'tm_xj'.
2024-07-15T03:59:18.427700Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875655  user: 'root'.
2024-07-15T03:59:18.427805Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875535  user: 'tm_xj'.
2024-07-15T03:59:18.427934Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875351  user: 'root'.
2024-07-15T03:59:18.428104Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875631  user: 'root'.
2024-07-15T03:59:18.428205Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875439  user: 'tm_xj'.
2024-07-15T03:59:18.428303Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875367  user: 'root'.
2024-07-15T03:59:18.428431Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875455  user: 'tm_xj'.
2024-07-15T03:59:18.428571Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875743  user: 'root'.
2024-07-15T03:59:18.428693Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875855  user: 'root'.
2024-07-15T03:59:18.429126Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875375  user: 'root'.
2024-07-15T03:59:18.429254Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872567  user: 'tm_xj'.
2024-07-15T03:59:18.429359Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875919  user: 'tm_xj'.
2024-07-15T03:59:18.429483Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2874343  user: 'tm_xj'.
2024-07-15T03:59:18.429589Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875927  user: 'root'.
2024-07-15T03:59:18.429679Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2874335  user: 'tm_xj'.
2024-07-15T03:59:18.429781Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2872559  user: 'tm_xj'.
2024-07-15T03:59:18.429873Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875463  user: 'root'.
2024-07-15T03:59:18.429977Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875479  user: 'tm_xj'.
2024-07-15T03:59:18.430072Z 0 [Warning] [MY-010909] [Server] /usr/sbin/mysqld: Forcing close of thread 2875495  user: 'tm_xj'.
2024-07-15T03:59:18.432939Z 0 [Note] [MY-000000] [Galera] dtor state: CLOSED
2024-07-15T03:59:18.433005Z 0 [Note] [MY-000000] [Galera] MemPool(TrxHandleSlave): hit ratio: 0.999859, misses: 1319, in use: 0, in pool: 1319
2024-07-15T03:59:18.437579Z 0 [Note] [MY-000000] [Galera] mon: entered 9338757 oooe fraction 0 oool fraction 0
2024-07-15T03:59:18.441999Z 0 [Note] [MY-000000] [Galera] mon: entered 9338757 oooe fraction 0.0111082 oool fraction 0.000346834
2024-07-15T03:59:18.446209Z 0 [Note] [MY-000000] [Galera] mon: entered 9565290 oooe fraction 0 oool fraction 2.09089e-07
2024-07-15T03:59:18.446249Z 0 [Note] [MY-000000] [Galera] cert index usage at exit 0
2024-07-15T03:59:18.446274Z 0 [Note] [MY-000000] [Galera] cert trx map usage at exit 0
2024-07-15T03:59:18.446285Z 0 [Note] [MY-000000] [Galera] deps set usage at exit 0
2024-07-15T03:59:18.446298Z 0 [Note] [MY-000000] [Galera] avg deps dist 1.13495
2024-07-15T03:59:18.446310Z 0 [Note] [MY-000000] [Galera] avg cert interval 77.2288
2024-07-15T03:59:18.446321Z 0 [Note] [MY-000000] [Galera] cert index size 3110460
2024-07-15T03:59:18.446399Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2024-07-15T03:59:18.494811Z 0 [Note] [MY-000000] [Galera] wsdb trx map usage 0 conn query map usage 0
2024-07-15T03:59:18.494926Z 0 [Note] [MY-000000] [Galera] MemPool(LocalTrxHandle): hit ratio: 0.999251, misses: 152, in use: 0, in pool: 152
2024-07-15T03:59:18.500371Z 0 [Note] [MY-000000] [Galera] Flushing memory map to disk...
2024-07-15T03:59:25.158921Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.36-28.1)  Percona XtraDB Cluster (GPL), Release rel28, Revision bfb687f, WSREP version 26.1.4.3.

1.7 试图启动失败,看日志是集群状态问题导致的:

【77 (File descriptor in bad state). Most likely it is due to inability to communicate with the cluster primary component. Restart required.】

2024-07-15T04:01:50.357538Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR ********************** 
2024-07-15T04:01:50.357629Z 0 [ERROR] [MY-000000] [WSREP-SST] Possible timeout in receving first data from donor in gtid/keyring stage.
2024-07-15T04:01:50.357642Z 0 [ERROR] [MY-000000] [WSREP-SST] After waiting for the 100 seconds
2024-07-15T04:01:50.357753Z 0 [ERROR] [MY-000000] [WSREP-SST] (defined by sst-initial-timeout/joiner-timeout variable),
2024-07-15T04:01:50.357787Z 0 [ERROR] [MY-000000] [WSREP-SST] the SST process has not started.
2024-07-15T04:01:50.357869Z 0 [ERROR] [MY-000000] [WSREP-SST] This error could be caused by broken network connectivity between
2024-07-15T04:01:50.357934Z 0 [ERROR] [MY-000000] [WSREP-SST] the donor and the joiner (this node).
2024-07-15T04:01:50.358004Z 0 [ERROR] [MY-000000] [WSREP-SST] Check the network connection and restart the joiner node.
2024-07-15T04:01:50.358085Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1395
2024-07-15T04:01:50.358155Z 0 [ERROR] [MY-000000] [WSREP-SST] ****************************************************** 
2024-07-15T04:01:50.358379Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2024-07-15T04:01:50.384925Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '[9999:9999:9999:9999::6d]' --datadir '/mysqldata/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '2581362' --mysqld-version '8.0.36-28.1'   '' : 32 (Broken pipe)
2024-07-15T04:01:50.385073Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2024-07-15T04:01:50.385104Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2024-07-15T04:01:50.385295Z 3 [Note] [MY-000000] [Galera] Processing SST received
2024-07-15T04:01:50.385333Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2024-07-15T04:01:50.385355Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2024-07-15T04:01:50.385367Z 3 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2024-07-15T04:01:50.385388Z 3 [Note] [MY-000000] [Galera] Closing send monitor...
2024-07-15T04:01:50.385509Z 3 [Note] [MY-000000] [Galera] Closed send monitor.
2024-07-15T04:01:50.385514Z 1 [ERROR] [MY-000000] [Galera] Requesting state transfer failed: -77(File descriptor in bad state)
2024-07-15T04:01:50.385543Z 3 [Note] [MY-000000] [Galera] gcomm: terminating thread
2024-07-15T04:01:50.385560Z 1 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 77 (File descriptor in bad state). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2024-07-15T04:01:50.385575Z 1 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2024-07-15T04:01:50.385585Z 3 [Note] [MY-000000] [Galera] gcomm: joining thread
2024-07-15T04:01:50.385612Z 1 [Note] [MY-000000] [Galera] /usr/sbin/mysqld: Terminated.
2024-07-15T04:01:50.385639Z 1 [Note] [MY-000000] [WSREP] Initiating SST cancellation
2024-07-15T04:01:50.385656Z 1 [Note] [MY-000000] [WSREP] Terminating SST process

二、节点日志分析108节点

■■ 以下日志显示,3:18分109节点出现问题(问题详见"109节点日志分析"),继而109节点被驱逐出去集群。

2024-07-15T03:18:34.562042Z 0 [Note] [MY-000000] [Galera] Member 1(pxc-cluster-node-2) initiates vote on 2a37cad4-f834-11ee-916e-da042b
ca7bc3:9338777,892e7c82cf62c56b:  Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase thi
s mysqld variable and try again, Error_code: 1197; Writing one row to the row-based binary log failed, Error_code: 1534;
2024-07-15T03:18:34.562391Z 0 [Note] [MY-000000] [Galera] Recomputed vote based on error codes: 1197, 1534. New vote d659be586fd355cc w
ill be used for further steps. Old Vote: 892e7c82cf62c56b
2024-07-15T03:18:34.562448Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777:
   0000000000000000:   1/3
   d659be586fd355cc:   1/3
Waiting for more votes.
2024-07-15T03:18:34.562568Z 15 [Note] [MY-000000] [Galera] Got vote request for seqno 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777
2024-07-15T03:18:34.564109Z 0 [Note] [MY-000000] [Galera] Member 0(pxc-cluster-node-3) responds to vote on 2a37cad4-f834-11ee-916e-da04
2bca7bc3:9338777,0000000000000000: Success
2024-07-15T03:18:34.564159Z 0 [Warning] [MY-000000] [Galera] Received bogus VOTE message: 9338777.0, from node 2a36a321-f834-11ee-bc44-
bb66609ee19b, expected > 9338860. Ignoring.
2024-07-15T03:18:34.564188Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777:
   0000000000000000:   1/3
   d659be586fd355cc:   1/3
Waiting for more votes.
2024-07-15T03:18:34.564212Z 0 [Note] [MY-000000] [Galera] Member 2(pxc-cluster-node-1) responds to vote on 2a37cad4-f834-11ee-916e-da04
2bca7bc3:9338777,0000000000000000: Success
2024-07-15T03:18:34.564229Z 0 [Note] [MY-000000] [Galera] Votes over 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777:
   0000000000000000:   2/3
   d659be586fd355cc:   1/3
Winner: 0000000000000000
Winner: 0000000000000000
2024-07-15T03:18:34.564279Z 15 [Note] [MY-000000] [Galera] Vote 0 (success) on 2a37cad4-f834-11ee-916e-da042bca7bc3:9338777 is consiste
nt with group. Continue.
2024-07-15T03:18:34.571056Z 0 [Note] [MY-000000] [Galera] declaring 99999999-9910 at tcp://[9999:9999:9999:9999::6e]:4567 stable
2024-07-15T03:18:34.571138Z 0 [Note] [MY-000000] [Galera] forgetting 601c1fce-8a87 (tcp://[9999:9999:9999:9999::6d]:4567)
2024-07-15T03:18:34.572400Z 0 [Note] [MY-000000] [Galera] Node 99999999-9910 state primary
2024-07-15T03:18:34.578145Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,99999999-9910,10)
memb {
        99999999-9910,0
        99999999-9908,0
        }
joined {
        }
left {
        }
partitioned {
        601c1fce-8a87,0
        }
)
2024-07-15T03:18:34.578202Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2024-07-15T03:18:34.579951Z 0 [Note] [MY-000000] [Galera] forgetting 601c1fce-8a87 (tcp://[9999:9999:9999:9999::6d]:4567)
2024-07-15T03:18:34.579977Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2024-07-15T03:18:34.580032Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID.
2024-07-15T03:18:34.581317Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: eb67b0a0-4258-11ef-a082-2e388f64aaa3
2024-07-15T03:18:34.582510Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: eb67b0a0-4258-11ef-a082-2e388f64aaa3 from 0 (p
xc-cluster-node-3)
2024-07-15T03:18:34.582561Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: eb67b0a0-4258-11ef-a082-2e388f64aaa3 from 1 (p
xc-cluster-node-1)
2024-07-15T03:18:34.582581Z 0 [Note] [MY-000000] [Galera] Quorum results:
        version    = 6,
        component  = PRIMARY,
        conf_id    = 9,
        members    = 2/2 (primary/total),
        act_id     = 9338911,
        last_appl. = 9338766,
        protocols  = 2/11/4 (gcs/repl/appl),
        vote policy= 0,
        group UUID = 2a37cad4-f834-11ee-916e-da042bca7bc3
2024-07-15T03:18:34.582642Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141]
2024-07-15T03:18:34.582782Z 13 [Note] [MY-000000] [Galera] ####### processing CC 9338912, local, ordered
2024-07-15T03:18:34.582833Z 13 [Note] [MY-000000] [Galera] Maybe drain monitors from 9338911 upto current CC event 9338912 upto:9338911
2024-07-15T03:18:34.582853Z 13 [Note] [MY-000000] [Galera] Drain monitors from 9338911 up to 9338911
2024-07-15T03:18:34.582875Z 13 [Note] [MY-000000] [Galera] ####### My UUID: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f
2024-07-15T03:18:34.582890Z 13 [Note] [MY-000000] [Galera] Skipping cert index reset
2024-07-15T03:18:34.582925Z 13 [Note] [MY-000000] [Galera] REPL Protocols: 11 (6)
2024-07-15T03:18:34.582951Z 13 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 9338911 -> 9338912
2024-07-15T03:18:34.583030Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2024-07-15T03:18:34.586904Z 13 [Note] [MY-000000] [Galera] ================================================
View:
  id: 2a37cad4-f834-11ee-916e-da042bca7bc3:9338912
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREOR
DERED, STREAMING, NBO
  final: no
  own_index: 1
  members(2):
        0: 2a36a321-f834-11ee-bc44-bb66609ee19b, pxc-cluster-node-3
        1: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f, pxc-cluster-node-1
=================================================
2024-07-15T03:18:34.587017Z 13 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2024-07-15T03:18:34.595121Z 13 [Note] [MY-000000] [Galera] Recording CC from group: 9338912
2024-07-15T03:18:34.595181Z 13 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 9338767
2024-07-15T03:18:34.595204Z 13 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 9337022

■■ 但随后的日志显示,108节点处于正常服务状态。

■■ 随后的日志显示,04:00有节点重启加入集群:

2024-07-15T04:00:07.309528Z 0 [Note] [MY-000000] [Galera] (99999999-9908, 'tcp://[9999:9999:9999:9999::6c]:4567') connection established
 to 99999999-9909 tcp://[9999:9999:9999:9999::6d]:4567
2024-07-15T04:00:07.573846Z 116353 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is dep
recated and will be removed in a future release. Please use caching_sha2_password instead'
2024-07-15T04:00:07.812276Z 0 [Note] [MY-000000] [Galera] declaring 99999999-9910 at tcp://[9999:9999:9999:9999::6e]:4567 stable
2024-07-15T04:00:07.812392Z 0 [Note] [MY-000000] [Galera] declaring 99999999-9909 at tcp://[9999:9999:9999:9999::6d]:4567 stable
2024-07-15T04:00:07.813733Z 0 [Note] [MY-000000] [Galera] Node 99999999-9910 state primary
2024-07-15T04:00:07.815269Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,99999999-9910,11)
memb {
        99999999-9910,0
        99999999-9908,0
        99999999-9909,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2024-07-15T04:00:07.815311Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2024-07-15T04:00:07.817183Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2024-07-15T04:00:07.817262Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID.
2024-07-15T04:00:07.821664Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: b97e1f85-425e-11ef-af8a-cac3db9396f3
2024-07-15T04:00:07.822896Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: b97e1f85-425e-11ef-af8a-cac3db9396f3 from 0 (px
c-cluster-node-3)
2024-07-15T04:00:07.822956Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: b97e1f85-425e-11ef-af8a-cac3db9396f3 from 1 (px
c-cluster-node-1)
2024-07-15T04:00:08.310999Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: b97e1f85-425e-11ef-af8a-cac3db9396f3 from 2 (px
c-cluster-node-2)
2024-07-15T04:00:08.311090Z 0 [Note] [MY-000000] [Galera] Quorum results:
        version    = 6,
        component  = PRIMARY,
        conf_id    = 10,
        members    = 2/3 (primary/total),
        act_id     = 9339054,
        last_appl. = 9338766,
        protocols  = 2/11/4 (gcs/repl/appl),
        vote policy= 0,
        group UUID = 2a37cad4-f834-11ee-916e-da042bca7bc3
2024-07-15T04:00:08.311280Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [173, 173]
2024-07-15T04:00:08.311566Z 13 [Note] [MY-000000] [Galera] ####### processing CC 9339055, local, ordered
2024-07-15T04:00:08.311730Z 13 [Note] [MY-000000] [Galera] Maybe drain monitors from 9339054 upto current CC event 9339055 upto:9339054
2024-07-15T04:00:08.311758Z 13 [Note] [MY-000000] [Galera] Drain monitors from 9339054 up to 9339054
2024-07-15T04:00:08.311777Z 13 [Note] [MY-000000] [Galera] ####### My UUID: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f
2024-07-15T04:00:08.311791Z 13 [Note] [MY-000000] [Galera] Skipping cert index reset
2024-07-15T04:00:08.311804Z 13 [Note] [MY-000000] [Galera] REPL Protocols: 11 (6)
2024-07-15T04:00:08.311820Z 13 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 9339054 -> 9339055
2024-07-15T04:00:08.311910Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2024-07-15T04:00:08.315209Z 13 [Note] [MY-000000] [Galera] ================================================
View:
  id: 2a37cad4-f834-11ee-916e-da042bca7bc3:9339055
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORD
ERED, STREAMING, NBO
  final: no
  own_index: 1
  members(3):
        0: 2a36a321-f834-11ee-bc44-bb66609ee19b, pxc-cluster-node-3
        1: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f, pxc-cluster-node-1
        2: b93005c2-425e-11ef-8b79-73a96136cc10, pxc-cluster-node-2
=================================================

■■ 但紧接着是较多的锁表等待,根据以往测试情况,每次冲突也是总有线程获胜,而同时必然有线程受害:

TRANSACTION 114209457, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT , undo log entries 3
MySQL thread id 16, OS thread handle 139734347618048, query id 6745542 Applying batch of row changes (update)
TRANSACTION 114208702, ACTIVE 1338 sec
, undo log entries 6
MySQL thread id 114856, OS thread handle 139691406636800, query id 6676148 2409:807c:5a06:1::104:1705 tm_xj wsrep: replicating and certi
fying write set
commit
2024-07-15T04:00:08.333081Z 16 [Note] [MY-000000] [WSREP] --------- CONFLICT DETECTED --------
2024-07-15T04:00:08.333098Z 16 [Note] [MY-000000] [WSREP] cluster conflict due to high priority abort for threads:

2024-07-15T04:00:08.333111Z 16 [Note] [MY-000000] [WSREP] Winning thread:
   THD: 16, mode: high priority, state: exec, conflict: executing, seqno: 9339058
   SQL: (null)

2024-07-15T04:00:08.333122Z 16 [Note] [MY-000000] [WSREP] Victim thread:
   THD: 114856, mode: local, state: exec, conflict: certifying, seqno: -1
   SQL: commit

TRANSACTION 114209458, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT , undo log entries 3
MySQL thread id 12, OS thread handle 139734349731584, query id 6745554 Applying batch of row changes (update)
TRANSACTION 114208878, ACTIVE 994 sec
, undo log entries 10
MySQL thread id 115190, OS thread handle 139671641110272, query id 6694341 2409:807c:5a06:1::104:1702 tm_xj wsrep: replicating and certi
fying write set
commit
2024-07-15T04:00:08.334316Z 12 [Note] [MY-000000] [WSREP] --------- CONFLICT DETECTED --------
2024-07-15T04:00:08.334343Z 12 [Note] [MY-000000] [WSREP] cluster conflict due to high priority abort for threads:

2024-07-15T04:00:08.334363Z 12 [Note] [MY-000000] [WSREP] Winning thread:
   THD: 12, mode: high priority, state: exec, conflict: executing, seqno: 9339060
   SQL: (null)

2024-07-15T04:00:08.334379Z 12 [Note] [MY-000000] [WSREP] Victim thread:
   THD: 115190, mode: local, state: exec, conflict: certifying, seqno: -1
   SQL: commit

连续发生了188次冲突

[root@node1:0 /u01/mysqld.log]# grep "CONFLICT DETECTED" mysqld.log.108|grep "2024-07-15T04:00"|wc -l
188

最后一次冲突如下:

2024-07-15T04:00:08.729167Z 15 [Note] [MY-000000] [WSREP] --------- CONFLICT DETECTED --------
2024-07-15T04:00:08.729202Z 15 [Note] [MY-000000] [WSREP] cluster conflict due to high priority abort for threads:

2024-07-15T04:00:08.729221Z 15 [Note] [MY-000000] [WSREP] Winning thread:
   THD: 15, mode: high priority, state: exec, conflict: executing, seqno: 9339273
   SQL: (null)

2024-07-15T04:00:08.729236Z 15 [Note] [MY-000000] [WSREP] Victim thread:
   THD: 115577, mode: local, state: exec, conflict: must_replay, seqno: 9339387
   SQL: commit

■■ 如下日志显示,在这些冲突发生110秒后,108节点试图连接109节点:

2024-07-15T04:01:50.417379Z 0 [Note] [MY-000000] [Galera] (99999999-9908, 'tcp://[9999:9999:9999:9999::6c]:4567') turning message relay
requesting on, nonlive peers: tcp://[9999:9999:9999:9999::6d]:4567
2024-07-15T04:01:51.894449Z 0 [Note] [MY-000000] [Galera] (99999999-9908, 'tcp://[9999:9999:9999:9999::6c]:4567') reconnecting to b93005
c2-8b79 (tcp://[9999:9999:9999:9999::6d]:4567), attempt 0
2024-07-15T04:01:52.574837Z 116695 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is dep
recated and will be removed in a future release. Please use caching_sha2_password instead'
2024-07-15T04:01:53.277070Z 116696 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is dep
recated and will be removed in a future release. Please use caching_sha2_password instead'
2024-07-15T04:01:54.435874Z 116697 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is dep
recated and will be removed in a future release. Please use caching_sha2_password instead'

+++++++++++ 以下日志显示,3秒钟没能连接成功,怀疑超时,断定109节点非活动,该节点被驱逐

2024-07-15T04:01:54.847941Z 0 [Note] [MY-000000] [Galera] declaring node with index 2 suspected, timeout PT5S (evs.suspect_timeout)
2024-07-15T04:01:54.848077Z 0 [Note] [MY-000000] [Galera] evs::proto(99999999-9908, OPERATIONAL, view_id(REG,99999999-9910,11)) suspecti
ng node: 99999999-9909
2024-07-15T04:01:54.848102Z 0 [Note] [MY-000000] [Galera] evs::proto(99999999-9908, OPERATIONAL, view_id(REG,99999999-9910,11)) suspecte
d node without join message, declaring inactive
2024-07-15T04:01:55.348302Z 0 [Note] [MY-000000] [Galera] declaring node with index 2 inactive (evs.inactive_timeout)

2024-07-15T04:01:55.851247Z 0 [Note] [MY-000000] [Galera] declaring 99999999-9910 at tcp://[9999:9999:9999:9999::6e]:4567 stable
2024-07-15T04:01:55.852703Z 0 [Note] [MY-000000] [Galera] Node 99999999-9910 state primary
2024-07-15T04:01:55.858696Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,99999999-9910,12)
memb {
        99999999-9910,0
        99999999-9908,0
        }
joined {
        }
left {
        }
partitioned {
        99999999-9909,0
        }
)
2024-07-15T04:01:55.858739Z 0 [Note] [MY-000000] [Galera] Save the discovered primary-component to disk
2024-07-15T04:01:55.860637Z 0 [Note] [MY-000000] [Galera] forgetting 99999999-9909 (tcp://[9999:9999:9999:9999::6d]:4567)
2024-07-15T04:01:55.860717Z 0 [Note] [MY-000000] [Galera] New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2024-07-15T04:01:55.860857Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: Waiting for state UUID.
2024-07-15T04:01:55.860748Z 0 [Note] [MY-000000] [Galera] (99999999-9908, 'tcp://[9999:9999:9999:9999::6c]:4567') turning message relay 
requesting off

■■ 之后交换状态信息,获取到仲裁结果,确认了2个活动节点:108、110,并清理了109节点

2024-07-15T04:01:55.862455Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: sent state msg: f9e37a10-425e-11ef-86d1-67533f4bae37
2024-07-15T04:01:55.863749Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: f9e37a10-425e-11ef-86d1-67533f4bae37 from 0 (px
c-cluster-node-3)
2024-07-15T04:01:55.863809Z 0 [Note] [MY-000000] [Galera] STATE EXCHANGE: got state msg: f9e37a10-425e-11ef-86d1-67533f4bae37 from 1 (px
c-cluster-node-1)
2024-07-15T04:01:55.863837Z 0 [Note] [MY-000000] [Galera] Quorum results:
        version    = 6,
        component  = PRIMARY,
        conf_id    = 11,
        members    = 2/2 (primary/total),
        act_id     = 9339389,
        last_appl. = 9338766,
        protocols  = 2/11/4 (gcs/repl/appl),
        vote policy= 0,
        group UUID = 2a37cad4-f834-11ee-916e-da042bca7bc3
2024-07-15T04:01:55.864077Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [141, 141]
2024-07-15T04:01:55.864321Z 14 [Note] [MY-000000] [Galera] ####### processing CC 9339390, local, ordered
2024-07-15T04:01:55.864384Z 14 [Note] [MY-000000] [Galera] Maybe drain monitors from 9339389 upto current CC event 9339390 upto:9339389
2024-07-15T04:01:55.864411Z 14 [Note] [MY-000000] [Galera] Drain monitors from 9339389 up to 9339389
2024-07-15T04:01:55.864436Z 14 [Note] [MY-000000] [Galera] ####### My UUID: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f
2024-07-15T04:01:55.864456Z 14 [Note] [MY-000000] [Galera] Skipping cert index reset
2024-07-15T04:01:55.864475Z 14 [Note] [MY-000000] [Galera] REPL Protocols: 11 (6)
2024-07-15T04:01:55.864521Z 14 [Note] [MY-000000] [Galera] ####### Adjusting cert position: 9339389 -> 9339390
2024-07-15T04:01:55.864632Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2024-07-15T04:01:55.867920Z 14 [Note] [MY-000000] [Galera] ================================================
View:
  id: 2a37cad4-f834-11ee-916e-da042bca7bc3:9339390
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORD
ERED, STREAMING, NBO
  final: no
  own_index: 1
  members(2):
        0: 2a36a321-f834-11ee-bc44-bb66609ee19b, pxc-cluster-node-3
        1: 6663bf31-40da-11ef-b7ff-dfd7e45ce66f, pxc-cluster-node-1
=================================================
2024-07-15T04:01:55.868021Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2024-07-15T04:01:55.874762Z 14 [Note] [MY-000000] [Galera] Recording CC from group: 9339390
2024-07-15T04:01:55.874837Z 14 [Note] [MY-000000] [Galera] Lowest cert index boundary for CC from group: 9338767
2024-07-15T04:01:55.874866Z 14 [Note] [MY-000000] [Galera] Min available from gcache for CC from group: 9337022
2024-07-15T04:01:58.396276Z 0 [Note] [MY-000000] [Galera]  cleaning up 99999999-9909 (tcp://[9999:9999:9999:9999::6d]:4567)

■■ 之后的日志显示,109节点再次加入未成,继续被驱逐出集群。

三、节点日志分析110节点

■■ 前面的日志同108节点

■■ 4:03的日志显示,109节点再次加入未成,继续被驱逐出集群。

四、总结

根据现场反馈的信息,109节点于11:18分被驱逐后,系统缓慢,业务阻断,此时数据库任务积压,SQL运行慢,大量的锁表,数据库连接数1100左右,大量的wrsp同步线程,杀掉变成killed状态后怀疑资源并未释放。此时即使停掉2个节点,仅保留1个节点,仍然无法提交事务,进而继续大量锁表,此时只能重启数据库解决问题。

总结以上分析过程,以下按照时间顺序客观描述一些症状、现象,以利于找到问题的根本原因。

1、11:16

节点109 第二次发生大事务,全局缓存页达到了 570MB,报错,导致节点109 11:18 停止服务,被驱逐出集群。

2、11:23

一线人员反馈登录时报服务异常,然后自动跳转回登陆页面,导致登录不上系统。项目经理收到问题反馈后开始排查系统问题。登录ELK系统,查看系统监控发现家客异步导出微服务延迟异常。进一步查看详情,发现数据库出现积压,SQL执行变慢。

3、11:47

其他系统侧人员电话联系反馈该系统接口调用失败,导致相关工单无法办理。

4、12时

系统恢复正常登陆后,其他侧反馈接口正常,但是仍有小部分接口出现异常情况,需多次提交,才能成功。

5、12:03

运维人员后台监控发现出入库异常,"Error updating database. Caus"
运维人员根据异常信息定位到有锁表进程并对锁表进程进行释放

6、12:35左右

出入库接口依然存在无法正常回单问题,经核查发现数据库进程状态虽然变成“kill”,但是并没有释放。然后立即将现场最新情况反馈给公司专家,寻求远程协助。

7、13:30

停掉了两个备库节点,只保留主数据库节点。然后滚动重启相关服务及接口。重启完后,验证系统登陆正常,观察相关接口也正常。

8、14:00

支撑群里面一线装维又反馈回单出现报错如下:
Could not commit JDBC transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTra Deadlock found when trying to getlock; try restarting transaction
经与公司专家紧急开会讨论,初步估计是PXC集群异常,虽然已经下线了两个备库节点,但是主节点还在做强一致性同步,导致事务无法提交,从而出现大量锁表的情况。

9、16:15

电话联系系统管理员XXX,申请重启数据库。于16:25完成数据库重启,然后滚动重启相关服务及接口。重启完后,验证系统登陆正常,系统接口也正常。

10、16:44

在支撑群中通知一线人员系统恢复正常,重新进行回单。

一线分析参考:

本次故障是一线异步导出全网数据时,后台会同时更新XX平台割接过来的XXX关系状态,导出数据量过大时,出现了大事务提交,导致备库节点异常,从而整个PXC集群各节点数据同步卡死,最终导致事务提交不了,出现锁表的情况。

liking

这个人很懒,什么都没留下

文章评论