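Earlier articles covered installing and getting started with PBM in detail, and concluded that it can substantially improve backup efficiency for large data volumes. Based on how our production MongoDB clusters are actually deployed, this article records the steps for physical backup and restore with PBM, so that field teams can use it as a direct reference.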
I. Make a physical backup
Once pbm is configured, taking a physical backup is straightforward:
pbm backup --type=physical
II. Restore from a physical backup
1. Before restoring, check which backups are currently available (for example with pbm list):
Backup snapshots:
2025-12-10T08:22:10Z <physical> [restore_to_time: 2025-12-10T08:22:12Z]
2025-12-10T08:33:06Z <logical> [restore_to_time: 2025-12-10T08:53:18Z]
2025-12-12T06:43:30Z <physical> [restore_to_time: 2025-12-12T06:43:32Z]
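When a restore is scripted, the newest physical snapshot can be picked out of that listing automatically. The following is a minimal sketch, assuming the listing keeps the line format shown above. The function name latest_physical is hypothetical, and it reads the listing from standard input so that it can be tried without a running cluster.

```shell
# Hypothetical helper: print the name of the newest <physical> snapshot
# found in a backup listing read from stdin.
latest_physical() {
  # Snapshot names are ISO-8601 UTC timestamps, so a plain string sort
  # also orders them chronologically.
  awk '$2 == "<physical>" { print $1 }' | sort | tail -n 1
}

# Sample input copied from the listing above.
sample='Backup snapshots:
2025-12-10T08:22:10Z <physical> [restore_to_time: 2025-12-10T08:22:12Z]
2025-12-10T08:33:06Z <logical> [restore_to_time: 2025-12-10T08:53:18Z]
2025-12-12T06:43:30Z <physical> [restore_to_time: 2025-12-12T06:43:32Z]'

printf '%s\n' "$sample" | latest_physical
```

On a live system the same function would presumably be fed from the real listing, for example pbm list | latest_physical, and the result should still be reviewed by hand before it is passed to pbm restore.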
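2. Restoring a sharded cluster
Caution: the restore automatically wipes the data currently in the cluster, so proceed with great care!
1). Stop the balancer
mongos> sh.stopBalancer()
2). Shut down all mongos instances to block client access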
killall mongos
3). Disable PITR
pbm config --set pitr.enabled=false
4). Full restore
pbm restore "2025-12-12T06:43:30Z"
[mongod@node1:1 /u01/nfs/ynzybak_mongodb_pbm]$ pbm restore "2025-12-12T06:43:30Z"
Starting restore 2025-12-12T07:17:14.062699954Z from '2025-12-12T06:43:30Z'...............................................Restore of the snapshot from '2025-12-12T06:43:30Z' has started.
Check restore status with: pbm describe-restore 2025-12-12T07:17:14.062699954Z -c </path/to/pbm.conf.yaml>
No other pbm command is available while the restore is running!
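As the output above states, do not run any other pbm operation while the restore is in progress, to avoid unexpected results!
During the restore, its status can only be checked as follows: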
pbm describe-restore 2025-12-12T07:17:14.062699954Z -c /u01/mongodb/conf/pbm_config.yaml
[mongod@node1:1 /u01/nfs/ynzybak_mongodb_pbm]$ pbm describe-restore 2025-12-12T07:17:14.062699954Z -c /u01/mongodb/conf/pbm_config.yaml
name: "2025-12-12T07:17:14.062699954Z"
opid: ""
backup: ""
type: physical
status: running
last_transition_time: "2025-12-12T07:18:00Z"
replsets:
- name: configs
  status: done
  last_transition_time: "2025-12-12T07:22:29Z"
  nodes:
  - name: node1:21000
    status: done
    last_transition_time: "2025-12-12T07:22:24Z"
  - name: node2:21000
    status: done
    last_transition_time: "2025-12-12T07:22:24Z"
  - name: node3:21000
    status: done
    last_transition_time: "2025-12-12T07:22:23Z"
- name: shard1
  status: down
  last_transition_time: "2025-12-12T07:19:06Z"
  nodes:
  - name: node1:27001
    status: done
    last_transition_time: "2025-12-12T07:22:23Z"
  - name: node2:27001
    status: running
    last_transition_time: "2025-12-12T07:17:49Z"
  - name: node3:27001
    status: running
    last_transition_time: "2025-12-12T07:17:49Z"
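In the output above, some nodes still report the status running, which means the restore is still in progress on them.
Once the restore has finished, the status looks like this: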
[mongod@node1:1 /u01/nfs/ynzybak_mongodb_pbm]$ pbm describe-restore 2025-12-12T07:17:14.062699954Z -c /u01/mongodb/conf/pbm_config.yaml
name: "2025-12-12T07:17:14.062699954Z"
opid: 693bc17a3c10278f23378457
backup: "2025-12-12T06:43:30Z"
type: physical
status: done
last_transition_time: "2025-12-12T07:23:08Z"
replsets:
- name: configs
  status: done
  last_transition_time: "2025-12-12T07:22:29Z"
  nodes:
  - name: node1:21000
    status: done
    last_transition_time: "2025-12-12T07:22:24Z"
  - name: node2:21000
    status: done
    last_transition_time: "2025-12-12T07:22:24Z"
  - name: node3:21000
    status: done
    last_transition_time: "2025-12-12T07:22:23Z"
- name: shard1
  status: done
  last_transition_time: "2025-12-12T07:23:03Z"
  nodes:
  - name: node1:27001
    status: done
    last_transition_time: "2025-12-12T07:22:23Z"
  - name: node2:27001
    status: done
    last_transition_time: "2025-12-12T07:22:39Z"
  - name: node3:27001
    status: done
    last_transition_time: "2025-12-12T07:22:32Z"
Every replica set and every node now reports the status done, which means the restore is complete.
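Checking the status by hand can be replaced with a small polling loop. The following is a rough sketch, assuming the describe-restore output keeps the shape shown above, with the overall status being the first line that starts with status:. The function names restore_status and wait_for_restore, the POLL_SECONDS variable, and the handling of an error status are illustrative assumptions and not part of pbm.

```shell
# Hypothetical helper: print the overall restore status, taken as the first
# unindented "status:" line of describe-restore output read from stdin.
restore_status() {
  awk '/^status:/ { print $2; exit }'
}

# Hypothetical helper: poll pbm describe-restore until the overall status
# is "done" (success) or "error" (assumed failure value); keep waiting on
# anything else. Defined only, not called here, so the block runs without pbm.
wait_for_restore() {
  name=$1
  conf=$2
  while :; do
    status=$(pbm describe-restore "$name" -c "$conf" | restore_status)
    case $status in
      "done")  echo "restore $name finished"; return 0 ;;
      "error") echo "restore $name failed" >&2; return 1 ;;
      *)       sleep "${POLL_SECONDS:-30}" ;;
    esac
  done
}

# Try the parser on an excerpt of the in-progress output shown earlier.
printf '%s\n' 'type: physical' 'status: running' 'replsets:' \
  '- name: configs' '  status: done' | restore_status
```

With the values from this article, the loop would be started as wait_for_restore 2025-12-12T07:17:14.062699954Z /u01/mongodb/conf/pbm_config.yaml.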
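4). PITR restore (alternative to the full restore above)
A point-in-time restore is run as follows:
pbm restore --time="2023-02-22T08:30:00"
This was not carried out this time, so it is skipped here.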
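Since this step was not run here, a typo in the target time is easy to miss. The following is a small sketch of a pre-check that only verifies the YYYY-MM-DDTHH:MM:SS shape used in the command above. The function name valid_pitr_time is hypothetical, and the check does not confirm that the date is real or that the time is covered by the available oplog slices.

```shell
# Hypothetical helper: succeed only if $1 has the YYYY-MM-DDTHH:MM:SS shape.
# This is a shape check only; it does not validate the calendar date.
valid_pitr_time() {
  case $1 in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9])
      return 0 ;;
    *)
      return 1 ;;
  esac
}

target="2023-02-22T08:30:00"
if valid_pitr_time "$target"; then
  # Only print the command; running it is left to the operator.
  echo "pbm restore --time=\"$target\""
else
  echo "rejected target time: $target" >&2
fi
```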
5). Post-restore steps
After the restore completes, carry out the following steps.
---Remove the contents of the datadir on any arbiter nodes
The current production cluster architecture does not use arbiter nodes, so this step does not apply.
---Restart all mongod nodes
mongod -f ${MongoDir}/conf/config.conf
mongod -f ${MongoDir}/conf/shard1.conf
---Restart all pbm-agents
pbm-agent --mongodb-uri "mongodb://pbmuser:passwd@localhost:21000/?authSource=admin" > /u01/mongodb/pbm-log/pbm-agent.$(hostname -s).21000.log 2>&1 &
pbm-agent --mongodb-uri "mongodb://pbmuser:passwd@localhost:27001/?authSource=admin" > /u01/mongodb/pbm-log/pbm-agent.$(hostname -s).27001.log 2>&1 &
---Run the following command to resync the backup list with the storage:
This refreshes the backup metadata.
pbm config --force-resync -w
---Start the balancer and start mongos nodes.
mongos -f ${MongoDir}/conf/mongos.conf
mongos> sh.startBalancer()
---Re-enable PITR
pbm config --set pitr.enabled=true