LVM Snapshots

Introduction

LVM, the Linux Logical Volume Manager, can take snapshots of logical volumes (LVs). A snapshot behaves like an independent copy of the original volume, but it only stores the differences from the original volume, so it typically needs considerably less disk space. The size of the snapshot volume can be chosen freely, independent of the size of the original volume. However, when the space of the snapshot is not sufficient to store all changes, the snapshot becomes invalid. To be on the safe side, the snapshot should be at least as large as the original volume.
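Since a full snapshot becomes invalid, it can be useful to watch its fill level. The following is only a minimal sketch: the function name snapshot_over_limit and the 80% limit are made up for illustration, and the commented lvs call assumes an LVM2 version that offers the data_percent report field and a snapshot called vg0/snap.

```shell
# Return success when the fill level reported by lvs reaches the given limit.
# $1 = percentage as printed by lvs (e.g. " 85.31"), $2 = limit in percent
snapshot_over_limit() {
    pct=$(printf '%s' "$1" | tr -d ' ')   # lvs pads its columns with blanks
    [ "${pct%%[.,]*}" -ge "$2" ]            # compare the integer part only
}

# On a real system (as root) the value could be obtained like this (assumption):
#   pct=$(lvs --noheadings -o data_percent vg0/snap)
if snapshot_over_limit " 85.31" 80; then
    echo "snapshot is at least 80% full, consider extending it with lvextend"
fi
```

The comparison deliberately ignores the fractional part, because plain shell arithmetic only handles integers.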

Technically, the snapshot is backed by a copy-on-write (COW) table. A change to the snapshot is stored in this table, and the original volume remains unchanged. Conversely, when the original volume is changed, the previous data is first copied into the COW table, so that the change to the original volume is not visible in the snapshot. If the affected data has already been changed in the snapshot (i.e. there is already an entry in the COW table), that entry is kept and not overwritten.
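The three rules above can be played through with a toy model. This is not how the kernel implements it; it is just a sketch in which each "chunk" is a small file in a temporary directory, and the function names snap_read, snap_write and origin_write are invented for this illustration.

```shell
# Toy model of the copy-on-write rules; chunks are small files, not real devices.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/origin" "$dir/cow"

# Fill a 4-chunk "origin" with the text "base".
for i in 0 1 2 3; do printf 'base' > "$dir/origin/$i"; done

# Read chunk $1 as seen through the snapshot: a COW entry wins if present.
snap_read() {
    if [ -f "$dir/cow/$1" ]; then cat "$dir/cow/$1"; else cat "$dir/origin/$1"; fi
}

# Write to the snapshot: only the COW table changes.
snap_write() { printf '%s' "$2" > "$dir/cow/$1"; }

# Write to the origin: save the old data first, unless an entry already exists.
origin_write() {
    [ -f "$dir/cow/$1" ] || cp "$dir/origin/$1" "$dir/cow/$1"
    printf '%s' "$2" > "$dir/origin/$1"
}

snap_write 0 snap      # change the snapshot
origin_write 1 test    # change the origin
origin_write 0 orig    # origin change to a chunk that already has a COW entry

echo "snapshot: $(snap_read 0) $(snap_read 1) $(snap_read 2)"
echo "origin:   $(cat "$dir/origin/0") $(cat "$dir/origin/1") $(cat "$dir/origin/2")"
```

The snapshot view prints "snap base base" while the origin prints "orig test base": the snapshot keeps its own change in chunk 0 and does not see either change to the origin.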

The device mapper

The LVM functionality is mostly handled by the device mapper. It provides virtual block devices and redirects any access to these virtual devices to another, lower-level device. The device mapper has different targets (implemented as kernel modules) that are responsible for the actual mapping.

A simple LV is typically implemented by the linear target. It maps a contiguous section of the virtual device to a likewise contiguous section of the low-level device. Example:

test:~ # lvcreate -l 2 -n base vg0
 Logical volume "base" created
test:~ # dir /dev/vg0
total 0
lrwxrwxrwx 1 root root 7 Jul 4 09:33 base -> ../dm-0
test:~ # dir /dev/mapper
total 0
crw------- 1 root root 10, 236 Jul 4 09:28 control
lrwxrwxrwx 1 root root 7 Jul 4 09:33 vg0-base -> ../dm-0
test:~ # dmsetup table vg0-base
0 16384 linear 202:4 2048

The device 202:4 (major number 202, minor number 4) is the physical volume /dev/xvda4 that was used to create the volume group vg0:

test:~ # dir /dev/xvda*
brw-rw---- 1 root disk 202, 0 Jul 7 16:16 /dev/xvda
brw-rw---- 1 root disk 202, 1 Jul 7 16:16 /dev/xvda1
brw-rw---- 1 root disk 202, 2 Jul 7 16:16 /dev/xvda2
brw-rw---- 1 root disk 202, 3 Jul 7 16:16 /dev/xvda3
brw-rw---- 1 root disk 202, 4 Jul 7 16:16 /dev/xvda4
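The table line printed by dmsetup for the linear target counts in sectors of 512 bytes. As a small sketch (the helper name decode_linear is made up here), the line from the example above can be converted into byte values like this:

```shell
# Decode one line of `dmsetup table` output for a linear target.
# Fields: start sector, length in sectors, target name, backing device,
# start offset on the backing device (in sectors).
decode_linear() {
    set -- $1    # split the line into positional parameters
    echo "length_bytes=$(( $2 * 512 )) device=$4 offset_bytes=$(( $5 * 512 ))"
}

decode_linear "0 16384 linear 202:4 2048"
```

For the line shown above this prints length_bytes=8388608 device=202:4 offset_bytes=1048576, i.e. an 8 MiB mapping that starts 1 MiB into /dev/xvda4.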

Preparation of the original volume

For the following tests, the LV is filled with well-defined content, namely the repeated text "base". Since the volume group has a physical extent size of 4 MiB and the LV was created with a size of 2 extents, its size is 8 MiB:

test:~ # lvdisplay /dev/vg0/base | grep 'LV Size'
 LV Size 8.00 MiB

To fill the logical volume, the command dd is used:

test:~ # for i in $(seq 1 2097152); do 
> echo -n base
> done | dd of=/dev/vg0/base bs=4
 2097152+0 records in
 2097152+0 records out
 8388608 bytes (8.4 MB) copied, 56.5065 s, 148 kB/s
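The loop count of 2097152 is simply the volume size divided by the 4 bytes of "base". Calling echo that often and writing 4-byte blocks is slow, so here is a sketch of a possibly faster variant. It is an assumption-laden example: it writes to a scratch file instead of /dev/vg0/base so that it can be tried without root, and it relies on head -c, which is widely available but not guaranteed by POSIX.

```shell
# 2 extents of 4 MiB each; the loop count above is this size divided by 4 bytes.
bytes=$(( 2 * 4 * 1024 * 1024 ))
records=$(( bytes / 4 ))
echo "$bytes bytes = $records records of 4 bytes"

# Hypothetical scratch target; on the test system this would be /dev/vg0/base.
target=$(mktemp)
trap 'rm -f "$target"' EXIT

# Generate an endless "basebase..." stream and cut it off at the volume size.
yes base | tr -d '\n' | head -c "$bytes" > "$target"

# Show the first 32 bytes, as the check with dd does.
head -c 32 "$target"; echo
```

The first echo prints "8388608 bytes = 2097152 records of 4 bytes", matching the record count reported by dd above.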

In order to check that the data has been correctly written, dd can be used again:

test:~ # dd if=/dev/vg0/base bs=1 count=32
 basebasebasebasebasebasebasebase

Creating the snapshot

We now create the snapshot with a size of 1 extent (4 MiB):

test:~ # lvcreate -s /dev/vg0/base -n snap -l 1
Logical volume "snap" created

The directory /dev/vg0 now contains the snapshot as a separate LV. Of course, the original volume is still there, too:

test:~ # dir /dev/vg0
 total 0
 lrwxrwxrwx 1 root root 7 Jul 4 09:35 base -> ../dm-0
 lrwxrwxrwx 1 root root 7 Jul 4 09:35 snap -> ../dm-1

It is much more interesting to look at the directory /dev/mapper. Besides the two externally visible devices (which are also present in /dev/vg0), it contains two more devices, namely vg0-base-real and vg0-snap-cow:

test:~ # dir /dev/mapper
 total 0
 crw------- 1 root root 10, 236 Jul 4 09:28 control
 lrwxrwxrwx 1 root root 7 Jul 4 09:35 vg0-base -> ../dm-0
 lrwxrwxrwx 1 root root 7 Jul 4 09:35 vg0-snap -> ../dm-1
 lrwxrwxrwx 1 root root 7 Jul 4 09:35 vg0-snap-cow -> ../dm-3
 lrwxrwxrwx 1 root root 7 Jul 4 09:35 vg0-base-real -> ../dm-2

It is also interesting to look at the output of dmsetup for these devices:

test:~ # dmsetup table vg0-base
0 16384 snapshot-origin 253:2
test:~ # dmsetup table vg0-base-real
0 16384 linear 202:4 2048
test:~ # dmsetup table vg0-snap
0 16384 snapshot 253:2 253:3 P 8
test:~ # dmsetup table vg0-snap-cow
0 8192 linear 202:4 18432

The output shows that creating the snapshot added another layer of mapping. The existing device for the original LV (vg0-base) is no longer of type linear, but of type snapshot-origin. It refers to the device vg0-base-real. The new device vg0-snap is of type snapshot. It refers to the device vg0-base-real, too, and also to vg0-snap-cow. These two devices (vg0-base-real and vg0-snap-cow) behave like "normal" LVs, i.e. they are of type linear and refer to areas of the physical volume.
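The table line of the snapshot target carries a bit more information than the linear one. As a sketch (decode_snapshot is a name chosen for this example), its fields can be spelled out like this; the field meanings follow the device mapper's snapshot target, where the flag P stands for a persistent snapshot and the last number is the chunk size in sectors:

```shell
# Decode one `dmsetup table` line of a snapshot target.
# Fields: start sector, length in sectors, target name, origin device,
# COW device, persistence flag (P = persistent), chunk size in sectors.
decode_snapshot() {
    set -- $1    # split the line into positional parameters
    echo "origin=$4 cow=$5 persistent=$6 chunk_bytes=$(( $7 * 512 ))"
}

decode_snapshot "0 16384 snapshot 253:2 253:3 P 8"
```

This prints origin=253:2 cow=253:3 persistent=P chunk_bytes=4096. On the test system, 253:2 and 253:3 are dm-2 and dm-3, i.e. vg0-base-real and vg0-snap-cow according to the /dev/mapper listing above.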

[Figure: lvm-snapshots]

When we look at the sizes of the four block devices, we find that vg0-base and vg0-base-real both have a size of 8 MiB. vg0-base-real is the original volume that was initially created, and vg0-base is just a mapping overlay. The device vg0-snap also has a (virtual) size of 8 MiB because a snapshot always has the same size as the original volume. The COW table vg0-snap-cow, on the other hand, has a size of only 4 MiB, which is exactly the size that was specified when the snapshot was created.

test:~ # blockdev --getsize64 /dev/mapper/vg0-base /dev/mapper/vg0-snap /dev/mapper/vg0-snap-cow /dev/mapper/vg0-base-real
 8388608
 8388608
 4194304
 8388608

Changes to the snapshot

What happens when data is written to the snapshot?

test:~ # echo -n snapshot | dd of=/dev/vg0/snap bs=1
8+0 records in
8+0 records out
8 bytes (8 B) copied, 0.0341143 s, 0.2 kB/s

test:~ # dd if=/dev/mapper/vg0-snap bs=1 count=32
snapshotbasebasebasebasebasebase

test:~ # dd if=/dev/mapper/vg0-base bs=1 count=32
basebasebasebasebasebasebasebase

As expected, the data in the snapshot changed while the original volume remained unchanged. It is interesting to take a closer look at the underlying devices:

test:~ # dd if=/dev/mapper/vg0-base-real bs=1 count=32
basebasebasebasebasebasebasebase
test:~ # dd if=/dev/mapper/vg0-snap-cow | hexdump -C
00000000  53 6e 41 70 01 00 00 00  01 00 00 00 08 00 00 00  |SnAp............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00001010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  73 6e 61 70 73 68 6f 74  62 61 73 65 62 61 73 65  |snapshotbasebase|
00002010  62 61 73 65 62 61 73 65  62 61 73 65 62 61 73 65  |basebasebasebase|
*
00003000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00400000

The changed data has been written to the COW table, while the original LV remained unchanged. Note that, even though only 8 bytes have been changed, the block that has been written to the COW table is 4096 bytes in size. The size of the blocks written for each change is the snapshot chunk size, which can be specified when the snapshot is created (here 8 sectors of 512 bytes, the last field of the dmsetup table line for vg0-snap). Because the COW table is filled in whole chunks, it can fill up with small changes scattered over the snapshot, so that the snapshot becomes invalid although the total amount of changed data is much smaller than the snapshot size.
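To get a feeling for how quickly scattered changes can fill the COW table, here is a rough sketch. It rebuilds the 16 header bytes from the hexdump above in a scratch file, reads the chunk size from them, and then estimates how many chunks fit into the 4 MiB COW device. The layout used for the estimate is an assumption based on the kernel's persistent snapshot format as I understand it: four little-endian 32-bit header fields (magic, valid flag, version, chunk size in sectors), one header chunk, and then metadata chunks with 16-byte entries, each followed by the data chunks it describes. The helper names le32 and cow_capacity_chunks are made up, and the kernel may declare the snapshot full slightly earlier or later than this estimate.

```shell
hdr=$(mktemp)
trap 'rm -f "$hdr"' EXIT
# The first 16 bytes of vg0-snap-cow, copied from the hexdump above.
printf 'SnAp\001\000\000\000\001\000\000\000\010\000\000\000' > "$hdr"

# Little-endian 32-bit integer at byte offset $2 of file $1.
le32() {
    set -- $(od -An -tu1 -j "$2" -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

magic=$(dd if="$hdr" bs=1 count=4 2>/dev/null)
chunk_sectors=$(le32 "$hdr" 12)          # assumed: 4th header field
chunk_bytes=$(( chunk_sectors * 512 ))

# Estimate how many changed chunks fit into a COW device.
# $1 = COW size in bytes, $2 = chunk size in bytes
cow_capacity_chunks() {
    total=$(( $1 / $2 ))                 # all chunks on the COW device
    per_area=$(( $2 / 16 ))              # 16-byte entries per metadata chunk
    usable=$(( total - 1 ))              # chunk 0 holds the header
    full_areas=$(( usable / (per_area + 1) ))
    rest=$(( usable % (per_area + 1) ))  # partial area at the end
    data=$(( full_areas * per_area ))
    if [ "$rest" -gt 1 ]; then           # one more metadata chunk, rest is data
        data=$(( data + rest - 1 ))
    fi
    echo "$data"
}

echo "magic=$magic chunk_bytes=$chunk_bytes"
echo "capacity=$(cow_capacity_chunks 4194304 "$chunk_bytes") chunks"
```

Under these assumptions the 4 MiB COW device has room for 1019 chunks, so roughly a thousand one-byte changes, each in a different 4096-byte chunk, would be enough to use it up.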

Changes to the original volume

What happens when the original volume is changed?

test:~ # echo -n test | dd of=/dev/vg0/base bs=1 seek=4096
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.0174384 s, 0.2 kB/s

test:~ # dd if=/dev/mapper/vg0-base bs=1 count=32 skip=4096
testbasebasebasebasebasebasebase

test:~ # dd if=/dev/mapper/vg0-base-real bs=1 count=32 skip=4096
testbasebasebasebasebasebasebase

test:~ # dd if=/dev/mapper/vg0-snap-cow | hexdump -C
00000000  53 6e 41 70 01 00 00 00  01 00 00 00 08 00 00 00  |SnAp............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  00 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00001010  01 00 00 00 00 00 00 00  03 00 00 00 00 00 00 00  |................|
00001020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  73 6e 61 70 73 68 6f 74  62 61 73 65 62 61 73 65  |snapshotbasebase|
00002010  62 61 73 65 62 61 73 65  62 61 73 65 62 61 73 65  |basebasebasebase|
*
00004000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00400000

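It is no surprise that the devices vg0-base and vg0-base-real show the change. The COW table is more interesting: at first glance, its data area looks as if it had not changed at all. In fact, another 4 KiB chunk has been added, into which the original data ("…basebasebasebase…") has been copied. hexdump merely collapses it together with the preceding identical lines, which is why the zeroed region now starts at 00004000 instead of 00003000. In addition, a second entry has appeared in the metadata area at 00001010. Hence, the change is not visible in the snapshot.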
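The metadata area at offset 00001000 can be decoded to see which chunk went where. The following sketch rebuilds the entries from the second hexdump in a scratch file and prints them. It assumes that each entry consists of two little-endian 64-bit numbers (chunk number on the origin, chunk number in the COW device) and that an entry with a COW chunk number of 0 marks the end of the list; the helper names le64 and decode_exceptions are invented for this example.

```shell
meta=$(mktemp)
trap 'rm -f "$meta"' EXIT
# Two 16-byte entries as they appear at offset 0x1000 in the second hexdump,
# followed by an all-zero entry that ends the list.
printf '\000\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000' >  "$meta"
printf '\001\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000' >> "$meta"
printf '\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000' >> "$meta"

# Little-endian 64-bit integer at byte offset $2 of file $1.
le64() {
    v=0; shift_by=0
    for b in $(od -An -v -tu1 -j "$2" -N 8 "$1"); do
        v=$(( v + (b << shift_by) ))
        shift_by=$(( shift_by + 8 ))
    done
    echo "$v"
}

chunk_bytes=4096    # 8 sectors, taken from the dmsetup table output

# Print every entry of the metadata file $1 until the terminating empty slot.
decode_exceptions() {
    off=0
    while :; do
        old=$(le64 "$1" "$off")
        new=$(le64 "$1" $(( off + 8 )))
        if [ "$new" -eq 0 ]; then break; fi    # unused slot ends the list
        printf 'origin chunk %d -> COW chunk %d (COW offset 0x%x)\n' \
            "$old" "$new" $(( new * chunk_bytes ))
        off=$(( off + 16 ))
    done
}

decode_exceptions "$meta"
```

The first entry maps origin chunk 0 to COW chunk 2 at offset 0x2000 (the write to the snapshot), and the second maps origin chunk 1 to COW chunk 3 at offset 0x3000 (the data saved before the write to the original volume at byte 4096).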
