Stop all services and unmount all volumes
Step 1: Stop all services
/etc/init.d/services.sh stop
/etc/init.d/qsnapman.sh stop
/sbin/daemon_mgr lvmetad stop "/sbin/lvmetad"
rm /var/run/lvm/lvmetad.socket
Step 2: Confirm which volume group or pool needs repair
pvs              # list physical volumes and their volume groups
lvs -a           # list all volumes, pools, and volume groups
lvs -o+time      # list the creation date/time of volumes and pools
lvs -o+thin_id   # list the device IDs of thin volumes
Check the volume groups using the "pvs" command
PV         VG  Fmt  Attr PSize PFree
/dev/drbd1 vg1 lvm2 a--  7.24t    0   # /dev/drbd1 corresponds to md1 (RAID group 1)
/dev/drbd2 vg2 lvm2 a--  7.26t    0   # /dev/drbd2 corresponds to md2 (RAID group 2)
.
.
.
# One Volume Group may include >= 2 RAID groups
Check the devices in the volume group/pool that needs repair using the "lvs -a" command
LV        VG  Attr       LSize   Pool Origin Data%  Meta% Move Log Cpy%Sync Convert
lv1       vg1 Vwi-aot--- 6.00t   tp1         100.00   # Data volume 1
lv1312    vg1 -wi-ao---- 756.00m                      # snapshots pool
lv2       vg1 Vwi-aot--- 200.00g tp1         100.00   # Data volume 2
lv544     vg1 -wi------- 74.14g                       # reserved as temporary space for repair
snap10001 vg1 Vwi-aot--- 6.00t   tp1  lv1    100.00   # Snapshot volume
snap10002 vg1 Vwi-aot--- 6.00t   tp1  lv1    100.00   # Snapshot volume
.
.
.
Step 3: Assuming vg1 needs repair, unmount all volumes that belong to vg1 (lv1 is mounted at /share/CACHEDEV1_DATA, and so on).
If any volumes or snapshots belonging to the volume group being repaired are mounted, unmount them:
# unmount all data volumes
umount /share/CACHEDEV1_DATA
umount /share/CACHEDEV2_DATA
.
.
umount /share/CACHEDEV${#}_DATA
# unmount all snapshots
umount /dev/mapper/vg1-snap*
If a volume cannot be unmounted, run "lsof /share/CACHEDEV{#}_DATA" to find the processes holding it open, terminate them with "kill -9", and then try the umount again.
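The unmount sequence above can be scripted. A minimal sketch, run as a dry run: it prints the commands rather than executing them. The CACHEDEV paths and vg1 snapshot naming are taken from this guide; the function name gen_umounts is a hypothetical helper, not part of the QNAP tooling.

```shell
# gen_umounts N: print the umount commands for data volumes 1..N
# (take N from the lvs output above), plus the snapshot unmounts.
gen_umounts() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        echo "umount /share/CACHEDEV${i}_DATA"
        i=$((i + 1))
    done
    # snapshot glob stays literal here; it expands when the line is executed
    echo "umount /dev/mapper/vg1-snap*"
}

gen_umounts 2   # preview the commands
```

Pipe the output to sh (e.g. `gen_umounts 2 | sh`) to actually unmount; if any umount fails, fall back to the lsof/kill step above.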
Remove all cache devices and deactivate the volume group
Step 1: List the cache devices on the pool:
ls -l /dev/mapper/ # lists all device-mapper nodes
Step 2: The output of the above command looks like the following (assuming vg1 is the volume group that needs repair):
brw------- 1 admin administrators 253,   9 2020-02-15 10:16 cachedev1
brw------- 1 admin administrators 253,  50 2020-02-15 10:16 cachedev2
crw------- 1 admin administrators  10, 236 2020-02-15 18:14 control
brw------- 1 admin administrators 253,   7 2020-02-15 10:16 vg1-lv1
brw------- 1 admin administrators 253,   8 2020-02-15 10:16 vg1-lv1312
brw------- 1 admin administrators 253,  10 2020-02-15 10:16 vg1-lv2
brw------- 1 admin administrators 253,  11 2020-02-18 01:00 vg1-snap10001
brw------- 1 admin administrators 253,  13 2020-02-19 01:00 vg1-snap10002
brw------- 1 admin administrators 253,  15 2020-02-20 01:00 vg1-snap10003
brw------- 1 admin administrators 253,  17 2020-02-21 01:00 vg1-snap10004
brw------- 1 admin administrators 253,  19 2020-02-22 01:00 vg1-snap10005
.
.
.
Step 3: Find the cachedev${#} devices and remove them using dmsetup
dmsetup remove cachedev1
dmsetup remove cachedev2
.
.
.
dmsetup remove cachedev${#}
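Instead of typing one dmsetup command per device, Step 3 can be done in a single loop. A sketch: it prints the commands by default, and the directory argument exists only so the loop can be exercised safely; the cachedev naming comes from the ls output above, and remove_cachedevs is a hypothetical helper name.

```shell
# remove_cachedevs [dir]: print "dmsetup remove" for every cachedev node
# found in dir (default /dev/mapper). Pipe the output to sh to execute.
remove_cachedevs() {
    dir=${1:-/dev/mapper}
    for dev in "$dir"/cachedev*; do
        [ -e "$dev" ] || continue        # glob matched nothing
        echo "dmsetup remove $(basename "$dev")"
    done
}

remove_cachedevs           # preview
# remove_cachedevs | sh    # actually remove the cache devices
```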
Step 4: Deactivate the volume group that needs repair
lvchange -an vg1 # assume vg1 is the volume group that needs repair
Step 5: Check again with "ls -l /dev/mapper" to confirm that no block devices of vg1 remain. The result should look like this:
crw------- 1 admin administrators 10, 236 2020-02-15 18:14 control
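The Step 5 check can be automated so it fails loudly instead of relying on eyeballing the listing. A sketch; check_vg_gone is a hypothetical helper, vg1 is the example VG from this guide, and the second argument exists only for safe testing:

```shell
# check_vg_gone VG [dir]: succeed only if no device node for VG remains
# in dir (default /dev/mapper); otherwise print the leftover nodes.
check_vg_gone() {
    vg=$1
    dir=${2:-/dev/mapper}
    leftover=$(ls "$dir" 2>/dev/null | grep "^${vg}-")
    if [ -n "$leftover" ]; then
        echo "still present: $leftover"
        return 1
    fi
    echo "no ${vg} devices remain"
}

check_vg_gone vg1   # should report "no vg1 devices remain" after Step 4
```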
Collect thin pool metadata logs and back up the thin pool metadata
Step 1: Download the collection tool
wget http://download.qnap.com/Storage/tsd/utility/tp_collect.sh
Step 2: Run the collection tool to collect metadata logs and back up the pool metadata
sh tp_collect.sh
When tp_collect.sh starts running, remember to enter the "pool id". The pool id is the number of the pool to repair: for vg1/tp1, input 1; for vg2/tp2, input 2; and so on.
If tp_collect.sh fails, ask the customer to plug a USB external drive (about 100 GB or more) into the NAS so the metadata can be backed up manually:
# assume vg1/tp1 is the pool to collect or repair
lvchange -ay vg1/tp1_tmeta
# change directory to the USB external device
cd /share/external/DEVXXXXXX
# backup metadata using dd command
dd if=/dev/mapper/vg1-tp1_tmeta of=tmeta.bin bs=1M
If tp_collect.sh completes successfully, you can skip the manual backup above; just be sure to save collect.tgz.
Step 3: Confirm that the metadata backup is correct
# A. thin check original metadata
pdata_tools_8192(or 4096) thin_check /dev/mapper/vg1-tp1_tmeta
# B. thin check backup metadata
pdata_tools_8192(or 4096) thin_check /mnt/collect/tmeta.bin
# If the metadata was backed up to the USB external drive, run instead:
pdata_tools_8192(or 4096) thin_check /share/external/DEVXXXXXX/tmeta.bin
The results of A and B must be identical. If they match, be sure to copy tmeta.bin to another storage device or USB external drive!!! If the repair of vg{#}/tp{#} fails, vg{#}-tp{#}_tmeta can be restored from tmeta.bin.
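Besides comparing the two thin_check results, the backup can also be verified byte-for-byte against the device using the standard cmp utility. A sketch; verify_backup is a hypothetical helper, and the vg1/tp1 and backup paths are the examples used in this guide:

```shell
# verify_backup ORIG COPY: byte-compare the tmeta device against its dd
# backup; any difference means the backup must be redone before repairing.
verify_backup() {
    if cmp -s "$1" "$2"; then
        echo "metadata backup verified"
    else
        echo "MISMATCH: re-run the dd backup" >&2
        return 1
    fi
}

# verify_backup /dev/mapper/vg1-tp1_tmeta /mnt/collect/tmeta.bin
```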