高可用mongodb集群(分片+副本):shard2副本重建

2022年9月15日 236点热度 1人点赞 0条评论

■■ shard2副本无法启动

■ 项目人员反馈集群的shard2副本无法启动

看了下,shard2副本能正常起来,但是过会儿以后就down掉,shard2的主节点日志显示shard2副本启动时马上进入 ROLLBACK 回滚状态:
2022-09-14T15:58:08.833+0800 I REPL [replexec-1] Member node3:27002 is now in state ROLLBACK
3分钟之后,shard2的主节点日志显示shard2副本 DOWN:
2022-09-14T16:01:09.246+0800 I REPL [replexec-2] Member node3:27002 is now in state RS_DOWN

■ 认真阅读了副本的日志,【我确认】它没啥有用的报错信息,就莫名其妙的 DOWN 了!是的,莫名其妙!怎么也得给一个相对明确的理由吧???我对它的日志设计合理性很表示怀疑!这个古老的版本是3.6.23,请为它树立墓志铭!

为此,我把它从启动开始到灭亡的所有日志都贴出来,作为证据:

2022-09-14T15:54:08.443+0800 I CONTROL  [main] ***** SERVER RESTARTED *****
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten] MongoDB starting : pid=417227 port=27002 dbpath=/data/mongodb/shard2/data 64-bit host=mongodb3
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten] db version v3.6.23
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten] git version: d352e6a4764659e0d0350ce77279de3c1f243e5c
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.1.1k  FIPS 25 Mar 2021
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten] allocator: tcmalloc
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten] modules: none
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten] build environment:
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten]     distmod: rhel80
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten]     distarch: x86_64
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten]     target_arch: x86_64
2022-09-14T15:54:08.494+0800 I CONTROL  [initandlisten] options: { config: "/data/mongodb/conf/shard2.conf", net: { bindIp: "0.0.0.0,::", ipv6: true, maxIncomingConnections: 20000, port: 27002 }, processManagement: { fork: true, pidFilePath: "/data/mongodb/shard2/log/shard2.pid" }, replication: { replSet: "shard2" }, security: { keyFile: "/data/mongodb/conf/mongo.keyfile" }, sharding: { clusterRole: "shardsvr" }, storage: { dbPath: "/data/mongodb/shard2/data" }, systemLog: { destination: "file", logAppend: true, path: "/data/mongodb/shard2/log/shard2.log" } }
2022-09-14T15:54:08.494+0800 W -        [initandlisten] Detected unclean shutdown - /data/mongodb/shard2/data/mongod.lock is not empty.
2022-09-14T15:54:08.517+0800 I -        [initandlisten] Detected data files in /data/mongodb/shard2/data created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2022-09-14T15:54:08.517+0800 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2022-09-14T15:54:08.517+0800 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=31547M,cache_overflow=(file_max=0M),session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2022-09-14T15:54:09.476+0800 I STORAGE  [initandlisten] WiredTiger message [1663142049:476530][417227:0x7f2826c01a40], txn-recover: Main recovery loop: starting at 6440/5376
2022-09-14T15:54:09.477+0800 I STORAGE  [initandlisten] WiredTiger message [1663142049:477008][417227:0x7f2826c01a40], txn-recover: Recovering log 6440 through 6441
2022-09-14T15:54:09.541+0800 I STORAGE  [initandlisten] WiredTiger message [1663142049:541921][417227:0x7f2826c01a40], txn-recover: Recovering log 6441 through 6441
2022-09-14T15:54:09.597+0800 I STORAGE  [initandlisten] WiredTiger message [1663142049:597459][417227:0x7f2826c01a40], txn-recover: Set global recovery timestamp: 0
2022-09-14T15:54:09.645+0800 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2022-09-14T15:54:09.645+0800 I STORAGE  [initandlisten] The size storer reports that the oplog contains 9616 records totaling to 53636431979 bytes
2022-09-14T15:54:09.645+0800 I STORAGE  [initandlisten] Scanning the oplog to determine where to place markers for truncation
2022-09-14T15:58:06.081+0800 I STORAGE  [initandlisten] WiredTiger record store oplog processing took 236435ms
2022-09-14T15:58:06.090+0800 I CONTROL  [initandlisten]
2022-09-14T15:58:06.090+0800 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2022-09-14T15:58:06.090+0800 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2022-09-14T15:58:06.090+0800 I CONTROL  [initandlisten]
2022-09-14T15:58:06.163+0800 I STORAGE  [initandlisten] Found drop-pending namespace flowtest.system.drop.1662704001i5t8.usertable with drop optime { ts: Timestamp(1662704001, 5), t: 8 }
2022-09-14T15:58:06.163+0800 I SHARDING [initandlisten] initializing sharding state with: { _id: "shardIdentity", configsvrConnectionString: "configs/node1:21000,node2:21000,node3:21000", shardName: "shard2", clusterId: ObjectId('62e894ed3d73f307c53b4461') }
2022-09-14T15:58:06.163+0800 I SHARDING [initandlisten] first cluster operation detected, adding sharding hook to enable versioning and authentication to remote servers
2022-09-14T15:58:06.164+0800 I NETWORK  [initandlisten] Starting new replica set monitor for configs/node1:21000,node2:21000,node3:21000
2022-09-14T15:58:06.164+0800 I SHARDING [thread1] creating distributed lock ping thread for process mongodb3:27002:1663142286:-365603539022679667 (sleeping for 30000ms)
2022-09-14T15:58:06.164+0800 I SHARDING [initandlisten] initialized sharding components for secondary node.
2022-09-14T15:58:06.179+0800 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to node3:21000 (1 connections now open to node3:21000 with a 5 second timeout)
2022-09-14T15:58:06.180+0800 I NETWORK  [initandlisten] Successfully connected to node2:21000 (1 connections now open to node2:21000 with a 5 second timeout)
2022-09-14T15:58:06.180+0800 I NETWORK  [shard registry reload] Successfully connected to node1:21000 (1 connections now open to node1:21000 with a 5 second timeout)
2022-09-14T15:58:06.180+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Connecting to node2:21000
2022-09-14T15:58:06.180+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Connecting to node3:21000
2022-09-14T15:58:06.180+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Connecting to node2:21000
2022-09-14T15:58:06.181+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Connecting to node3:21000
2022-09-14T15:58:06.182+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to node3:21000, took 2ms (2 connections now open to node3:21000)
2022-09-14T15:58:06.182+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to node3:21000, took 2ms (2 connections now open to node3:21000)
2022-09-14T15:58:06.182+0800 I SHARDING [NetworkInterfaceASIO-ShardRegistry-0] Received reply from config server node (unknown) indicating config server optime term has increased, previous optime { ts: Timestamp(0, 0), t: -1 }, now { ts: Timestamp(1663142285, 1), t: 15 }
2022-09-14T15:58:06.182+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to node2:21000, took 2ms (2 connections now open to node2:21000)
2022-09-14T15:58:06.182+0800 I NETWORK  [shard registry reload] Starting new replica set monitor for shard1/node1:27001,node2:27001
2022-09-14T15:58:06.182+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to node2:21000, took 2ms (2 connections now open to node2:21000)
2022-09-14T15:58:06.182+0800 I NETWORK  [shard registry reload] Starting new replica set monitor for shard2/node2:27002,node3:27002
2022-09-14T15:58:06.182+0800 I NETWORK  [shard registry reload] Starting new replica set monitor for shard3/node1:27003,node3:27003
2022-09-14T15:58:06.188+0800 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document
2022-09-14T15:58:06.198+0800 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to node1:27001 (1 connections now open to node1:27001 with a 5 second timeout)
2022-09-14T15:58:06.213+0800 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to node2:27001 (1 connections now open to node2:27001 with a 5 second timeout)
2022-09-14T15:58:06.222+0800 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/data/mongodb/shard2/data/diagnostic.data'
2022-09-14T15:58:06.229+0800 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to node2:27002 (1 connections now open to node2:27002 with a 5 second timeout)
2022-09-14T15:58:06.230+0800 W NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Failed to connect to 10.105.2.9:27002, in(checking socket for error after poll), reason: Connection refused
2022-09-14T15:58:06.245+0800 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to node1:27003 (1 connections now open to node1:27003 with a 5 second timeout)
2022-09-14T15:58:06.249+0800 I REPL     [initandlisten] Rollback ID is 1
2022-09-14T15:58:06.258+0800 I REPL     [initandlisten] Starting recovery oplog application at the appliedThrough: { ts: Timestamp(1662705355, 24), t: 8 }
2022-09-14T15:58:06.258+0800 I REPL     [initandlisten] No oplog entries to apply for recovery. appliedThrough is at the top of the oplog.
2022-09-14T15:58:06.259+0800 I NETWORK  [initandlisten] listening via socket bound to 0.0.0.0
2022-09-14T15:58:06.259+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Connecting to node1:21000
2022-09-14T15:58:06.259+0800 I NETWORK  [initandlisten] listening via socket bound to ::
2022-09-14T15:58:06.259+0800 I NETWORK  [initandlisten] listening via socket bound to /tmp/mongodb-27002.sock
2022-09-14T15:58:06.259+0800 I NETWORK  [initandlisten] waiting for connections on port 27002
2022-09-14T15:58:06.260+0800 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to node3:27003 (1 connections now open to node3:27003 with a 5 second timeout)
2022-09-14T15:58:06.261+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to node1:21000, took 2ms (1 connections now open to node1:21000)
2022-09-14T15:58:06.261+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Connecting to node2:27002
2022-09-14T15:58:06.263+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to node2:27002, took 2ms (1 connections now open to node2:27002)
2022-09-14T15:58:06.276+0800 I REPL     [replexec-0]
2022-09-14T15:58:06.276+0800 I REPL     [replexec-0] ** WARNING: This replica set has a Primary-Secondary-Arbiter architecture, but readConcern:majority is enabled
2022-09-14T15:58:06.276+0800 I REPL     [replexec-0] **          for this node. This is not a recommended configuration. Please see
2022-09-14T15:58:06.276+0800 I REPL     [replexec-0] **          https://dochub.mongodb.org/core/psa-disable-rc-majority-3.6
2022-09-14T15:58:06.276+0800 I REPL     [replexec-0]
2022-09-14T15:58:06.276+0800 I REPL     [replexec-0] New replica set config in use: { _id: "shard2", version: 1, protocolVersion: 1, members: [ { _id: 0, host: "node1:27002", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 0.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "node2:27002", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "node3:27002", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('62e895986d02eb140daedcb8') } }
2022-09-14T15:58:06.276+0800 I REPL     [replexec-0] This node is node3:27002 in the config
2022-09-14T15:58:06.276+0800 I REPL     [replexec-0] transition to STARTUP2 from STARTUP
2022-09-14T15:58:06.276+0800 I REPL     [replexec-0] Starting replication storage threads
2022-09-14T15:58:06.276+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to node1:27002
2022-09-14T15:58:06.277+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to node2:27002
2022-09-14T15:58:06.277+0800 I REPL     [replexec-0] transition to RECOVERING from STARTUP2
2022-09-14T15:58:06.277+0800 I REPL     [replexec-0] Starting replication fetcher thread
2022-09-14T15:58:06.277+0800 I REPL     [replexec-0] Starting replication applier thread
2022-09-14T15:58:06.277+0800 I REPL     [replexec-0] Starting replication reporter thread
2022-09-14T15:58:06.278+0800 I REPL     [rsSync] transition to SECONDARY from RECOVERING
2022-09-14T15:58:06.278+0800 I REPL     [rsSync] Resetting sync source to empty, which was :27017
2022-09-14T15:58:06.287+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to node1:27002, took 11ms (1 connections now open to node1:27002)
2022-09-14T15:58:06.287+0800 I REPL     [replexec-1] Member node1:27002 is now in state ARBITER
2022-09-14T15:58:06.288+0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to node2:27002, took 11ms (1 connections now open to node2:27002)
2022-09-14T15:58:06.288+0800 I REPL     [replexec-0] Member node2:27002 is now in state PRIMARY
2022-09-14T15:58:06.819+0800 I NETWORK  [listener] connection accepted from 10.105.2.8:44144 #1 (1 connection now open)
2022-09-14T15:58:06.819+0800 I NETWORK  [conn1] received client metadata from 10.105.2.8:44144 conn1: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.6.23" }, os: { type: "Linux", name: "Red Hat Enterprise Linux release 8.6 (Ootpa)", architecture: "x86_64", version: "Kernel 4.18.0-372.9.1.el8.x86_64" } }
2022-09-14T15:58:06.833+0800 I ACCESS   [conn1] Successfully authenticated as principal __system on local from client 10.105.2.8:44144
2022-09-14T15:58:07.278+0800 I REPL     [rsBackgroundSync] sync source candidate: node2:27002 【此处确认使用node2:27002作为同步源】
2022-09-14T15:58:07.285+0800 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to node2:27002
2022-09-14T15:58:07.286+0800 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to node2:27002, took 1ms (1 connections now open to node2:27002)
2022-09-14T15:58:07.287+0800 I REPL     [rsBackgroundSync] Changed sync source from empty to node2:27002
2022-09-14T15:58:07.287+0800 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to node2:27002
2022-09-14T15:58:07.289+0800 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to node2:27002, took 2ms (2 connections now open to node2:27002)
2022-09-14T15:58:07.425+0800 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last op time fetched: { ts: Timestamp(1662705355, 24), t: 8 }. source's GTE: { ts: Timestamp(1662706338, 1), t: 9 } hashes: (-4559377292940263073/5327713539284899481)
2022-09-14T15:58:07.425+0800 I REPL     [rsBackgroundSync] Replication commit point: { ts: Timestamp(0, 0), t: -1 }
2022-09-14T15:58:07.425+0800 I REPL     [rsBackgroundSync] Rollback using the 'rollbackViaRefetch' method because UUID support is feature compatible with featureCompatibilityVersion 3.6.
2022-09-14T15:58:07.425+0800 I REPL     [rsBackgroundSync] transition to ROLLBACK from SECONDARY
2022-09-14T15:58:07.425+0800 I NETWORK  [rsBackgroundSync] Skip closing connection for connection # 1
2022-09-14T15:58:07.425+0800 I ROLLBACK [rsBackgroundSync] Starting rollback. Sync source: node2:27002 【此处确认使用node2:27002作为同步源】
2022-09-14T15:58:07.427+0800 I ROLLBACK [rsBackgroundSync] Finding the Common Point
2022-09-14T15:58:07.431+0800 I ROLLBACK [rsBackgroundSync] our last optime:   Timestamp(1662705355, 24)
2022-09-14T15:58:07.431+0800 I ROLLBACK [rsBackgroundSync] their last optime: Timestamp(1663142280, 1)
2022-09-14T15:58:07.431+0800 I ROLLBACK [rsBackgroundSync] diff in end of log times: -436925 seconds 【这是以上两个数值的差值】
2022-09-14T15:58:07.451+0800 I SH_REFR  [ShardServerCatalogCacheLoader-0] Refresh for collection config.system.sessions took 1190 ms and failed :: caused by :: PrimarySteppedDown: Error waiting for optime { ts: Timestamp(1663142280, 1), t: -1 }, current relevant optime is { ts: Timestamp(1662705355, 24), t: 8 }. :: caused by :: operation was interrupted
2022-09-14T15:58:07.451+0800 I CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Error waiting for optime { ts: Timestamp(1663142280, 1), t: -1 }, current relevant optime is { ts: Timestamp(1662705355, 24), t: 8 }. :: caused by :: operation was interrupted
2022-09-14T15:58:07.451+0800 I CONTROL  [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: Error waiting for optime { ts: Timestamp(1663142280, 1), t: -1 }, current relevant optime is { ts: Timestamp(1662705355, 24), t: 8 }. :: caused by :: operation was interrupted
2022-09-14T15:58:08.194+0800 I NETWORK  [listener] connection accepted from 10.105.2.7:37336 #2 (2 connections now open)
2022-09-14T15:58:08.194+0800 I NETWORK  [conn2] received client metadata from 10.105.2.7:37336 conn2: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.6.23" }, os: { type: "Linux", name: "Red Hat Enterprise Linux release 8.6 (Ootpa)", architecture: "x86_64", version: "Kernel 4.18.0-372.9.1.el8.x86_64" } }
2022-09-14T15:58:08.208+0800 I ACCESS   [conn2] Successfully authenticated as principal __system on local from client 10.105.2.7:37336
2022-09-14T15:58:09.801+0800 I NETWORK  [listener] connection accepted from 10.105.2.8:44146 #3 (3 connections now open)
,,,【此处中间有 conn3 - conn14 的信息,为节省篇幅略掉】
2022-09-14T15:58:36.231+0800 I NETWORK  [conn15] received client metadata from 10.105.2.9:43116 conn15: { driver: { name: "MongoDB Internal Client", version: "3.6.23" }, os: { type: "Linux", name: "Red Hat Enterprise Linux release 8.6 (Ootpa)", architecture: "x86_64", version: "Kernel 4.18.0-372.9.1.el8.x86_64" } }
2022-09-14T15:58:36.245+0800 I ACCESS   [conn15] Successfully authenticated as principal __system on local from client 10.105.2.9:43116
2022-09-14T15:58:36.245+0800 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to node3:27002 (1 connections now open to node3:27002 with a 5 second timeout)
2022-09-14T15:59:06.182+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Ending idle connection to host node3:21000 because the pool meets constraints; 1 connections to that host remain open
2022-09-14T15:59:06.183+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Ending idle connection to host node2:21000 because the pool meets constraints; 1 connections to that host remain open
2022-09-14T15:59:07.289+0800 I ASIO     [NetworkInterfaceASIO-RS-0] Ending idle connection to host node2:27002 because the pool meets constraints; 1 connections to that host remain open
2022-09-14T16:44:50.844+0800 I CONTROL  [main] ***** SERVER RESTARTED *****

【上面最后一行日志可见,一直到下一次启动,再无其他日志信息!】
感觉唯一值得怀疑的地方是:
Error waiting for optime { ts: Timestamp(1663142280, 1), t: -1
但是无论如何,最后死掉之前,日志没有关键错误信息,让人摸不着头脑!
pool meets constraints 也不是要害信息。

■■ shard2副本重建

■ 启动未果,原因查不明,于是想干脆删掉这个副本重建吧,此时遇见了安全权限问题,因为shard也开启了KeyFile安全认证,所以直接登录shard无法鉴权!

还记得,用户配置信息是在config服务器的,用户信息并不保存在数据分片上。
修改shard配置文件,去掉KeyFile安全认证后,却无法正常启动shard!只好恢复原来配置。

■ 最后想通了,既然是副本,干脆清理data目录,让它重新同步吧:

已知node2是shard2的primary,在node3的shard2副本无法启动的情况下,清理掉node3的shard2的data目录,启动会自动同步。

2022-09-14T16:44:50.844+0800 I CONTROL  [main] ***** SERVER RESTARTED *****
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten] MongoDB starting : pid=588866 port=27002 dbpath=/data/mongodb/shard2/data 64-bit host=mongodb3
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten] db version v3.6.23
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten] git version: d352e6a4764659e0d0350ce77279de3c1f243e5c
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.1.1k  FIPS 25 Mar 2021
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten] allocator: tcmalloc
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten] modules: none
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten] build environment:
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten]     distmod: rhel80
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten]     distarch: x86_64
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten]     target_arch: x86_64
2022-09-14T16:44:50.918+0800 I CONTROL  [initandlisten] options: { config: "/data/mongodb/conf/shard2.conf", net: { bindIp: "0.0.0.0,::", ipv6: true, maxIncomingConnections: 20000, port: 27002 }, processManagement: { fork: true, pidFilePath: "/data/mongodb/shard2/log/shard2.pid" }, replication: { replSet: "shard2" }, security: { keyFile: "/data/mongodb/conf/mongo.keyfile" }, sharding: { clusterRole: "shardsvr" }, storage: { dbPath: "/data/mongodb/shard2/data" }, systemLog: { destination: "file", logAppend: true, path: "/data/mongodb/shard2/log/shard2.log" } }
,,,【略】
balabala一大堆日志信息显示,数据开始同步
,,,【略】
以下是clone processDataObj:
2022-09-14T17:26:28.635+0800 I -        [repl writer worker 0]   flowtest.processDataObj collection clone progress: 19111465/20174625 94% (documents copied)
2022-09-14T17:27:22.842+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Connecting to node1:21000
2022-09-14T17:27:22.844+0800 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to node1:21000, took 2ms (1 connections now open to node1:21000)
2022-09-14T17:27:37.478+0800 I -        [repl writer worker 0]   flowtest.processDataObj collection clone progress: 19455863/20174625 96% (documents copied)
2022-09-14T17:28:44.143+0800 I -        [repl writer worker 0]   flowtest.processDataObj collection clone progress: 19880883/20174625 98% (documents copied)
2022-09-14T17:29:23.690+0800 I REPL     [repl writer worker 0] CollectionCloner ns:flowtest.processDataObj finished cloning with status: OK
2022-09-14T17:29:39.001+0800 I -        [repl writer worker 0]   Index: (2/3) BTree Bottom Up Progress: 16277900/20174625 80%
,,,【略】
以下是clone usertable:
2022-09-14T17:42:37.954+0800 I -        [repl writer worker 15]   flowtest.usertable collection clone progress: 16896/20000 84% (documents copied)
2022-09-14T17:43:43.274+0800 I -        [repl writer worker 15]   flowtest.usertable collection clone progress: 18432/20000 92% (documents copied)
2022-09-14T17:44:44.984+0800 I -        [repl writer worker 15]   flowtest.usertable collection clone progress: 19968/20000 99% (documents copied)
2022-09-14T17:44:46.189+0800 I REPL     [repl writer worker 15] CollectionCloner ns:flowtest.usertable finished cloning with status: OK
,,,【略】
以下是clone processDataObjInit:
2022-09-14T18:09:28.528+0800 I -        [repl writer worker 9]   flowtest.processDataObjInit collection clone progress: 15248762/16596007 91% (documents copied)
2022-09-14T18:10:30.535+0800 I -        [repl writer worker 9]   flowtest.processDataObjInit collection clone progress: 15863728/16596007 95% (documents copied)
2022-09-14T18:11:39.482+0800 I -        [repl writer worker 9]   flowtest.processDataObjInit collection clone progress: 16459081/16596007 99% (documents copied)
2022-09-14T18:11:54.826+0800 I REPL     [repl writer worker 3] CollectionCloner ns:flowtest.processDataObjInit finished cloning with status: OK
2022-09-14T18:12:11.779+0800 I INDEX    [repl writer worker 3]   done building bottom layer, going to commit
2022-09-14T18:12:26.001+0800 I -        [repl writer worker 3]   Index: (2/3) BTree Bottom Up Progress: 15723000/16596008 94%
,,,【略】
最后日志显示耗时如下:
2022-09-14T18:13:00.122+0800 I REPL     [replication-49] initial sync done; took 5287s.

data目录同步来了328G

注:以上处理整整耗时1天,整理此文耗时2个小时,记录于此,留备参考,感觉有价值,请随手点赞哈。

liking

我是雪人

文章评论