Stop all services and unmount all volumes
Step 1: Stop all services
/etc/init.d/services.sh stop
/etc/init.d/qsnapman.sh stop
/sbin/daemon_mgr lvmetad stop "/sbin/lvmetad"
rm /var/run/lvm/lvmetad.socket
Step 2: Confirm which volume group or pool needs repair
pvs              # list physical volumes and their volume groups
lvs -a           # list all volumes, pools, and volume groups
lvs -o+time      # list the creation date/time of volumes and pools
lvs -o+thin_id   # list the device IDs of thin volumes
Check the volume groups using the "pvs" command
PV         VG  Fmt  Attr PSize PFree
/dev/drbd1 vg1 lvm2 a--  7.24t    0   # /dev/drbd1 corresponds to md1 (RAID group 1)
/dev/drbd2 vg2 lvm2 a--  7.26t    0   # /dev/drbd2 corresponds to md2 (RAID group 2)
.
.
.
# One Volume Group may include >= 2 RAID groups
Check the devices in the volume group/pool that needs repair using the "lvs -a" command
LV        VG  Attr       LSize   Pool Origin Data%  Meta% Move Log Cpy%Sync Convert
lv1       vg1 Vwi-aot--- 6.00t   tp1         100.00   # Data volume 1
lv1312    vg1 -wi-ao---- 756.00m                      # snapshots pool
lv2       vg1 Vwi-aot--- 200.00g tp1         100.00   # Data volume 2
lv544     vg1 -wi------- 74.14g                       # reserved as temporary space for repair
snap10001 vg1 Vwi-aot--- 6.00t   tp1  lv1    100.00   # Snapshot volume
snap10002 vg1 Vwi-aot--- 6.00t   tp1  lv1    100.00   # Snapshot volume
.
.
.
Step 3: Assuming vg1 needs repair, unmount all volumes that belong to vg1 (lv1 is mounted at /share/CACHEDEV1_DATA, and so on).
If any volumes or snapshots belonging to the volume group being repaired are mounted, unmount them:
# unmount all data volumes
umount /share/CACHEDEV1_DATA
umount /share/CACHEDEV2_DATA
.
.
umount /share/CACHEDEV${#}_DATA
# unmount all snapshots
umount /dev/mapper/vg1-snap*
If a volume cannot be unmounted, run "lsof /share/CACHEDEV{#}_DATA" to find the processes holding it open, terminate them with "kill -9", and then try the umount again.
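The unmount sequence above can be scripted. A minimal sketch, run as a dry run: it prints the commands rather than executing them. The CACHEDEV paths and vg1 snapshot naming are taken from this guide; the function name gen_umounts is a hypothetical helper, not part of the QNAP tooling.

```shell
# gen_umounts N: print the umount commands for data volumes 1..N
# (take N from the lvs output above), plus the snapshot unmounts.
gen_umounts() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        echo "umount /share/CACHEDEV${i}_DATA"
        i=$((i + 1))
    done
    # snapshot glob stays literal here; it expands when the line is executed
    echo "umount /dev/mapper/vg1-snap*"
}

gen_umounts 2   # preview the commands
```

Pipe the output to sh (e.g. `gen_umounts 2 | sh`) to actually unmount; if any umount fails, fall back to the lsof/kill step above.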
Remove all cache devices and deactivate the volume group
Step 1: List the cache devices on the pool:
ls -l /dev/mapper/ # lists all device-mapper nodes
Step 2: The output of the above command looks like the following (assuming vg1 is the volume group that needs repair):
brw------- 1 admin administrators 253,   9 2020-02-15 10:16 cachedev1
brw------- 1 admin administrators 253,  50 2020-02-15 10:16 cachedev2
crw------- 1 admin administrators  10, 236 2020-02-15 18:14 control
brw------- 1 admin administrators 253,   7 2020-02-15 10:16 vg1-lv1
brw------- 1 admin administrators 253,   8 2020-02-15 10:16 vg1-lv1312
brw------- 1 admin administrators 253,  10 2020-02-15 10:16 vg1-lv2
brw------- 1 admin administrators 253,  11 2020-02-18 01:00 vg1-snap10001
brw------- 1 admin administrators 253,  13 2020-02-19 01:00 vg1-snap10002
brw------- 1 admin administrators 253,  15 2020-02-20 01:00 vg1-snap10003
brw------- 1 admin administrators 253,  17 2020-02-21 01:00 vg1-snap10004
brw------- 1 admin administrators 253,  19 2020-02-22 01:00 vg1-snap10005
.
.
.
Step 3: Find the cachedev${#} devices and remove them using dmsetup
dmsetup remove cachedev1
dmsetup remove cachedev2
.
.
.
dmsetup remove cachedev${#}
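Instead of typing one dmsetup command per device, Step 3 can be done in a single loop. A sketch: it prints the commands by default, and the directory argument exists only so the loop can be exercised safely; the cachedev naming comes from the ls output above, and remove_cachedevs is a hypothetical helper name.

```shell
# remove_cachedevs [dir]: print "dmsetup remove" for every cachedev node
# found in dir (default /dev/mapper). Pipe the output to sh to execute.
remove_cachedevs() {
    dir=${1:-/dev/mapper}
    for dev in "$dir"/cachedev*; do
        [ -e "$dev" ] || continue        # glob matched nothing
        echo "dmsetup remove $(basename "$dev")"
    done
}

remove_cachedevs           # preview
# remove_cachedevs | sh    # actually remove the cache devices
```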
Step 4: Deactivate the volume group that needs repair
lvchange -an vg1 # assume vg1 is the volume group that needs repair
Step 5: Check again with "ls -l /dev/mapper" to confirm that no block devices of vg1 remain. The result should look like this:
crw------- 1 admin administrators 10, 236 2020-02-15 18:14 control
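The Step 5 check can be automated so it fails loudly instead of relying on eyeballing the listing. A sketch; check_vg_gone is a hypothetical helper, vg1 is the example VG from this guide, and the second argument exists only for safe testing:

```shell
# check_vg_gone VG [dir]: succeed only if no device node for VG remains
# in dir (default /dev/mapper); otherwise print the leftover nodes.
check_vg_gone() {
    vg=$1
    dir=${2:-/dev/mapper}
    leftover=$(ls "$dir" 2>/dev/null | grep "^${vg}-")
    if [ -n "$leftover" ]; then
        echo "still present: $leftover"
        return 1
    fi
    echo "no ${vg} devices remain"
}

check_vg_gone vg1   # should report "no vg1 devices remain" after Step 4
```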
Collect thin pool metadata logs and back up the thin pool metadata
Step 1: Download the collection tool
wget http://download.qnap.com/Storage/tsd/utility/tp_collect.sh
Step 2: Run the collection tool to collect metadata logs and back up the pool metadata
sh tp_collect.sh
When tp_collect.sh starts running, remember to enter the "pool id". The pool id is the number of the pool to repair: for vg1/tp1, input 1; for vg2/tp2, input 2; and so on.
If tp_collect.sh fails, ask the customer to plug a USB external drive (about 100 GB or more) into the NAS so the metadata can be backed up manually:
# assume vg1/tp1 is the pool to collect or repair
lvchange -ay vg1/tp1_tmeta
# change directory to the USB external device
cd /share/external/DEVXXXXXX
# backup metadata using dd command
dd if=/dev/mapper/vg1-tp1_tmeta of=tmeta.bin bs=1M
If tp_collect.sh completes successfully, you can skip the manual backup above; just be sure to save collect.tgz.
Step 3: Confirm that the metadata backup is correct
# A. thin check original metadata
pdata_tools_8192(or 4096) thin_check /dev/mapper/vg1-tp1_tmeta
# B. thin check backup metadata
pdata_tools_8192(or 4096) thin_check /mnt/collect/tmeta.bin
# If the metadata was backed up to the USB external drive, run instead:
pdata_tools_8192(or 4096) thin_check /share/external/DEVXXXXXX/tmeta.bin
The results of A and B must be identical. If they match, be sure to copy tmeta.bin to another storage device or USB external drive!!! If the repair of vg{#}/tp{#} fails, vg{#}-tp{#}_tmeta can be restored from tmeta.bin.
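Besides comparing the two thin_check results, the backup can also be verified byte-for-byte against the device using the standard cmp utility. A sketch; verify_backup is a hypothetical helper, and the vg1/tp1 and backup paths are the examples used in this guide:

```shell
# verify_backup ORIG COPY: byte-compare the tmeta device against its dd
# backup; any difference means the backup must be redone before repairing.
verify_backup() {
    if cmp -s "$1" "$2"; then
        echo "metadata backup verified"
    else
        echo "MISMATCH: re-run the dd backup" >&2
        return 1
    fi
}

# verify_backup /dev/mapper/vg1-tp1_tmeta /mnt/collect/tmeta.bin
```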