In case you're not familiar: there is a thing called LVM. It's a layer between physical disks and filesystems, and it allows certain interesting things, like extending, migrating, snapshotting, and more.
On one of the systems I've been dealing with, we stumbled upon a specific requirement: converting an LV into a striped one. It took me a while to figure out, so I'm writing it down so that I'll never have to research it again.
First, some vocabulary, so everything will be clear:
- PV – Physical Volume – generally a disk, or a partition on a disk, that stores the data
- LV – Logical Volume – what the OS sees as a partition/filesystem, but which can span many physical devices
- VG – Volume Group – a single pool of PVs and LVs (neither a PV nor an LV can be in multiple VGs at a time)
A very simple approach: you have one disk, you make it a PV, create a VG using this PV, and then create an LV within this VG, which you can then format (mkfs.*), mount, and use.
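For reference, here is how that simple setup could look as commands (a sketch; the device and volume names are just examples, matching the test system used later in this post):

pvcreate /dev/xvdf                          # make the disk a PV
vgcreate test-vg /dev/xvdf                  # create a VG containing it
lvcreate --name test-lv --size 8G test-vg   # carve out an LV
mkfs.ext4 /dev/test-vg/test-lv              # format it
mkdir -p /test
mount /dev/test-vg/test-lv /test            # and mount it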
A less simple example is a situation where you have multiple disks, each of which becomes a PV, and then you make on them an LV that is as large as the sum of all the disks, all while making the OS see it as a single device.
Of course you could also use mirroring (data is written to multiple PVs at once, to provide safety in case a physical disk gets damaged).
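(Mirroring can be requested at LV creation time; a minimal sketch, with a hypothetical LV name:)

lvcreate --mirrors 1 --name mirrored-lv --size 4G test-vg   # data is kept on two PVs at once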
Anyway, if you have multiple disks (let's say 3), you can just make a single LV on them, with a total size equal to their sizes summed, but LVM will, by default, use them in sequence: once one of them gets full, data is written to the next one, and so on.
This means that, at any given time, you only get the performance of a single disk.
What you can do instead is use striping, like in RAID 0: make the same LV, with the same size, but have LVM spread all operations across all PVs, so that you get better performance (each disk will have to write/read only 1/3rd of the data).
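For completeness: if you are creating such an LV from scratch, striping can simply be requested at creation time; the rest of this post deals with the harder case of converting an existing LV. A sketch, assuming a VG that already spans 3 PVs:

lvcreate --stripes 3 --stripesize 64 --name striped-lv --size 9G test-vg   # 64 KiB stripes across 3 PVs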
In our case, we had a single PV, and on it a single LV. What we didn't take into account was performance. Without going into details: if we used 3 (or more) disks, each smaller than the one we originally had, and made a striped LV on all of them, it would be faster, for the same money (the server was a virtual server in the AWS cloud, and the disks were EBS volumes).
So, let's see how it was, and how to migrate it.
The initial situation was (originally with much larger disks, but size is irrelevant for the test case):
root@test:~# df -x tmpfs
Filesystem                    1K-blocks    Used Available Use% Mounted on
/dev/xvda1                      8115168 1044624   6635268  14% /
udev                            1922248      12   1922236   1% /dev
/dev/mapper/test--vg-test--lv   8125880   18420   7671648   1% /test
As you can see, I have a small (8 GB) LV mounted as /test.
I can view its details:
root@test:~# lvs
  LV      VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
  test-lv test-vg -wi-ao--- 8.00g
This shows that the test-lv LV is within the test-vg VG, and has a size of 8 GB.
What about this VG?
root@test:~# vgs
  VG      #PV #LV #SN Attr   VSize VFree
  test-vg   1   1   0 wz--n- 9.00g 1020.00m
This VG has 1 PV, 1 LV, a total size of 9.00 GB, and 1020 MB free (unused by any LV).
And finally PVs:
root@test:~# pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/xvdf  test-vg lvm2 a--  9.00g 1020.00m
We see that there is a /dev/xvdf PV, which belongs to the test-vg VG; its size is 9.00 GB, and 1020 MB of it are free.
All fine.
One more piece of information, which will become useful later on:
root@test:~# lvdisplay -m
  --- Logical volume ---
  LV Path                /dev/test-vg/test-lv
  LV Name                test-lv
  VG Name                test-vg
  LV UUID                deR1rm-UUc5-Lyjy-f2ia-A3T0-zyWF-1G4UvD
  LV Write Access        read/write
  LV Creation host, time test.depesz.com, 2015-10-08 18:54:03 +0000
  LV Status              available
  # open                 1
  LV Size                8.00 GiB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

  --- Segments ---
  Logical extent 0 to 2047:
    Type                linear
    Physical volume     /dev/xvdf
    Physical extents    0 to 2047
Please note the last part: "--- Segments ---". Each LV consists of a certain number of extents, which are like blocks. This particular LV has 2048 extents, numbered from 0 to 2047, and they are mapped linearly to extents 0 to 2047 on the /dev/xvdf PV.
Pretty obvious, but this will be helpful later on.
Of course we can also see extent information for the PV:
root@test:~# pvdisplay -m
  --- Physical volume ---
  PV Name               /dev/xvdf
  VG Name               test-vg
  PV Size               9.00 GiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              2303
  Free PE               255
  Allocated PE          2048
  PV UUID               kKkYv4-LeiX-UZfm-CtB5-0G8w-z28g-kgL3mv

  --- Physical Segments ---
  Physical extent 0 to 2047:
    Logical volume      /dev/test-vg/test-lv
    Logical extents     0 to 2047
  Physical extent 2048 to 2302:
    FREE
Here we have "Physical Segments", where extents 0 to 2047 belong to the test-lv LV, and extents 2048 to 2302 are free to use for whatever.
Now, we want to migrate this LV to 3 disks, but keep the size (we could grow it too, if needed, but that's not the subject of this blogpost).
First thing I have to deal with: striping works by dividing extents equally across all PVs. Since my LV has 2048 extents, it can't be divided equally across 3 PVs, so I'll grow it by 1 extent:
root@test:~# lvextend -l +1 /dev/test-vg/test-lv
  Extending logical volume test-lv to 8.00 GiB
  Logical volume test-lv successfully resized
Since I changed the size of the underlying volume, it would be good to change the ext4 filesystem size too:
root@test:~# df /test/
Filesystem                    1K-blocks  Used Available Use% Mounted on
/dev/mapper/test--vg-test--lv   8125880 18420   7671648   1% /test
root@test:~# resize2fs /dev/mapper/test--vg-test--lv
resize2fs 1.42.9 (4-Feb-2014)
Filesystem at /dev/mapper/test--vg-test--lv is mounted on /test; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/mapper/test--vg-test--lv is now 2098176 blocks long.
root@test:~# df /test/
Filesystem                    1K-blocks  Used Available Use% Mounted on
/dev/mapper/test--vg-test--lv   8127920 18420   7673528   1% /test
As you can see, the size of the filesystem has changed from 8125880 to 8127920 blocks. And, most importantly, we now have a number of extents that is divisible by 3 (2049 = 3 × 683):
root@test:~# lvdisplay | grep Current\ LE
  Current LE             2049
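If you ever want to script this step, something along these lines should work (a sketch; it assumes the LV and filesystem names used in this post, and parses the "Current LE" line shown above):

STRIPES=3
LE=$(lvdisplay /dev/test-vg/test-lv | awk '/Current LE/ {print $3}')
REST=$(( LE % STRIPES ))
if [ "$REST" -ne 0 ]; then
    # grow by just enough extents to make the count divisible by the stripe count
    lvextend -l +$(( STRIPES - REST )) /dev/test-vg/test-lv
    # and grow the (mounted) ext4 filesystem to match
    resize2fs /dev/mapper/test--vg-test--lv
fi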
Nice. With this fixed, let's add 3 new volumes: /dev/xvdg, /dev/xvdh, and /dev/xvdi. After creating them and attaching them to the AWS instance, I can see them in my system, and can turn them into PVs:
root@test:~# pvcreate /dev/xvd{g,h,i}
  Physical volume "/dev/xvdg" successfully created
  Physical volume "/dev/xvdh" successfully created
  Physical volume "/dev/xvdi" successfully created
Since they are now PVs, I need to "attach" them to my test-vg:
root@test:~# vgextend test-vg /dev/xvd{g,h,i}
  Volume group "test-vg" successfully extended
Now, let's see how it looks:
root@test:~# pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/xvdf  test-vg lvm2 a--  9.00g 1016.00m
  /dev/xvdg  test-vg lvm2 a--  3.00g    3.00g
  /dev/xvdh  test-vg lvm2 a--  3.00g    3.00g
  /dev/xvdi  test-vg lvm2 a--  3.00g    3.00g
root@test:~# vgs
  VG      #PV #LV #SN Attr   VSize  VFree
  test-vg   4   1   0 wz--n- 17.98g 9.98g
Please note that the new PVs are 100% free (each is 3 GB), and that vgs now shows 4 PVs, with a total size of 17.98 GB and 9.98 GB free (1016 MB on the old PV, and 9 GB on the new ones).
So now I'm ready to actually do the change.
Unfortunately you can't directly change an LV into a striped one. First you have to turn it into a mirrored one, making the new mirror side striped.
This is done using this command:
root@test:~# lvconvert --mirrors 1 --stripes 3 /dev/test-vg/test-lv
  Using default stripesize 64.00 KiB
  test-vg/test-lv: Converted: 0.0%
...
  test-vg/test-lv: Converted: 100.0%
It's not a fast process; for my small test case it took over 4 minutes. But that's largely irrelevant, as it doesn't lock anything: normal filesystem access works the whole time.
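If you want to watch the progress from another session, the Copy% column of lvs shows how far the synchronization has gotten:

watch -n 5 lvs test-vg   # re-run lvs every 5 seconds; Copy% climbs towards 100.00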
What does the lvconvert command do, and why? It's simple: it adds a mirror (a second copy) to our LV, which (once the lvconvert has finished) is kept identical to the original, so it's a redundant copy. But this new copy is striped across the 3 new volumes.
How does it look now? df is obviously unchanged, but lvs shows something different:
root@test:~# lvs
  LV      VG      Attr      LSize Pool Origin Data%  Move Log          Copy%  Convert
  test-lv test-vg mwi-aom-- 8.00g                         test-lv_mlog 100.00
Please note the "m" in the Attr column; it means that this LV is a mirror. We can dig deeper, too:
root@test:~# lvs -a
  LV                 VG      Attr      LSize Pool Origin Data%  Move Log          Copy%  Convert
  test-lv            test-vg mwi-aom-- 8.00g                         test-lv_mlog 100.00
  [test-lv_mimage_0] test-vg iwi-aom-- 8.00g
  [test-lv_mimage_1] test-vg iwi-aom-- 8.00g
  [test-lv_mlog]     test-vg lwi-aom-- 4.00m
test-lv is a mirror over test-lv_mimage_0 and test-lv_mimage_1, which are the two sides of the mirror. There is also a helper, the 4 MB test-lv_mlog LV, which is used for internal purposes.
Of course, we don't normally care about these [test-lv_m*] LVs; they are hidden, and we care only about test-lv.
But let's look at one more thing: lvdisplay with the option to show which extent goes where:
root@test:~# lvdisplay -m -a
  --- Logical volume ---
  LV Path                /dev/test-vg/test-lv

  --- Segments ---
  Logical extent 0 to 2048:
    Type                mirror
    Mirrors             2
    Mirror size         2049
    Mirror log volume   test-lv_mlog
    Mirror region size  512.00 KiB
    Mirror original:
      Logical volume    test-lv_mimage_0
      Logical extents   0 to 2048
    Mirror destinations:
      Logical volume    test-lv_mimage_1
      Logical extents   0 to 2048

  --- Logical volume ---
  Internal LV Name       test-lv_mlog

  --- Segments ---
  Logical extent 0 to 0:
    Type                linear
    Physical volume     /dev/xvdi
    Physical extents    683 to 683

  --- Logical volume ---
  Internal LV Name       test-lv_mimage_0

  --- Segments ---
  Logical extent 0 to 2048:
    Type                linear
    Physical volume     /dev/xvdf
    Physical extents    0 to 2048

  --- Logical volume ---
  Internal LV Name       test-lv_mimage_1

  --- Segments ---
  Logical extent 0 to 2048:
    Type                striped
    Stripes             3
    Stripe size         64.00 KiB
    Stripe 0:
      Physical volume   /dev/xvdg
      Physical extents  0 to 682
    Stripe 1:
      Physical volume   /dev/xvdh
      Physical extents  0 to 682
    Stripe 2:
      Physical volume   /dev/xvdi
      Physical extents  0 to 682
(there was more information there, but I removed it as it's not all that important).
Let's see. Our "usable" LV is test-lv. According to the Segments map for it, it has extents 0 to 2048, with a "Mirror original" (all extents, 0 to 2048, on test-lv_mimage_0) and "Mirror destinations" with a single destination: also all extents, on test-lv_mimage_1.
So, let's look closer at these internal LVs:
test-lv_mimage_0 is exactly the same as test-lv was before: a linear mapping to physical extents 0 to 2048 on /dev/xvdf, i.e. the original disk. What LVM did was simply rename the old LV into this.
The new LV, test-lv_mimage_1, is more interesting.
We can see in its Segments that it's striped, with 3 stripes and a stripe size of 64 KiB, and that the stripes go to physical extents 0 to 682 on /dev/xvdg, /dev/xvdh, and /dev/xvdi.
This means that this new test-lv_mimage_1 LV is exactly what we wanted: a single LV striped across 3 PVs. But it's also part of a mirror, which we don't want. So let's make it standalone:
root@test:~# lvconvert --mirrors 0 /dev/test-vg/test-lv /dev/xvdf
  Logical volume test-lv converted.
This command does the magic: it converts test-lv into a mirror-less LV by removing the mirror side that was on /dev/xvdf. Afterwards, all the hidden, internal LVs are gone:
root@test:~# lvs -a
  LV      VG      Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
  test-lv test-vg -wi-ao--- 8.00g
And the one that's left is properly striped:
root@test:~# lvdisplay -m
  --- Logical volume ---
  LV Path                /dev/test-vg/test-lv

  --- Segments ---
  Logical extent 0 to 2048:
    Type                striped
    Stripes             3
    Stripe size         64.00 KiB
    Stripe 0:
      Physical volume   /dev/xvdg
      Physical extents  0 to 682
    Stripe 1:
      Physical volume   /dev/xvdh
      Physical extents  0 to 682
    Stripe 2:
      Physical volume   /dev/xvdi
      Physical extents  0 to 682
With this done, we should have /dev/xvdf 100% free, so let's verify that:
root@test:~# pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/xvdf  test-vg lvm2 a--  9.00g   9.00g
  /dev/xvdg  test-vg lvm2 a--  3.00g 336.00m
  /dev/xvdh  test-vg lvm2 a--  3.00g 336.00m
  /dev/xvdi  test-vg lvm2 a--  3.00g 336.00m
Nice. Since it's no longer useful, I can remove it from the VG, and then detach it from the instance using the AWS Console:
root@test:~# vgreduce test-vg /dev/xvdf
  Removed "/dev/xvdf" from volume group "test-vg"
root@test:~# pvremove /dev/xvdf
  Labels on physical volume "/dev/xvdf" successfully wiped
Finally, let's see how the VG looks after all the changes:
root@test:~# vgs
  VG      #PV #LV #SN Attr   VSize VFree
  test-vg   3   1   0 wz--n- 8.99g 1008.00m
Not bad. The greatest thing is that all these operations happened live.
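To recap, the whole migration (with the device names from this post) boils down to:

pvcreate /dev/xvd{g,h,i}                                  # new disks become PVs
vgextend test-vg /dev/xvd{g,h,i}                          # add them to the VG
lvextend -l +1 /dev/test-vg/test-lv                       # make the extent count divisible by 3
resize2fs /dev/mapper/test--vg-test--lv                   # grow the filesystem to match
lvconvert --mirrors 1 --stripes 3 /dev/test-vg/test-lv    # add a striped mirror copy
lvconvert --mirrors 0 /dev/test-vg/test-lv /dev/xvdf      # drop the linear copy on the old PV
vgreduce test-vg /dev/xvdf                                # remove the old PV from the VG
pvremove /dev/xvdf                                        # wipe its LVM label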
To test that striping actually works, I ran:
$ iostat -kx 5 | grep -E 'xvd[ghi]|Device'
and while it was running, I ran:
root@test:/test# time dd if=/dev/zero of=test.file bs=1M count=7500; time sync
Immediately after this started, I saw an increase in traffic on the disks:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdg              0.00     0.00    0.00    0.00     0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdh              0.00     0.00    0.00    0.00     0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdi              0.00     0.00    0.00    0.00     0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdg              0.00   137.88    0.20  109.98     0.81  14077.39   255.54    75.53  513.49    0.00  514.44   5.21  57.43
xvdh              0.00   137.88    0.00  110.18     0.00  14103.46   256.00    75.95  517.53    0.00  517.53   5.21  57.43
xvdi              0.00   137.88    0.20  114.46     0.81  14650.92   255.56    64.22  442.84    0.00  443.63   5.01  57.43

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdg              0.00    96.90    0.00   97.73     0.00  12509.09   256.00   124.00 1259.75    0.00 1259.75  10.57 103.31
xvdh              0.00    96.90    0.00   98.35     0.00  12549.59   255.21   123.55 1253.64    0.00 1253.64  10.50 103.31
xvdi              0.00    98.35    0.00   97.52     0.00  12449.59   255.32   105.53 1063.20    0.00 1063.20  10.59 103.31
And, as you can see, all 3 disks were used to the same extent, so I got exactly what I wanted. Nice.