|  | == General == | 
|  |  | 
|  | A qcow2 image file is organized in units of constant size, which are called | 
|  | (host) clusters. A cluster is the unit in which all allocations are done, | 
|  | both for actual guest data and for image metadata. | 
|  |  | 
|  | Likewise, the virtual disk as seen by the guest is divided into (guest) | 
|  | clusters of the same size. | 
|  |  | 
|  | All numbers in qcow2 are stored in Big Endian byte order. | 
|  |  | 
|  |  | 
|  | == Header == | 
|  |  | 
|  | The first cluster of a qcow2 image contains the file header: | 
|  |  | 
|  | Byte  0 -  3:   magic | 
|  | QCOW magic string ("QFI\xfb") | 
|  |  | 
|  | 4 -  7:   version | 
|  | Version number (only valid value is 2) | 
|  |  | 
|  | 8 - 15:   backing_file_offset | 
|  | Offset into the image file at which the backing file name | 
|  | is stored (NB: The string is not null terminated). 0 if the | 
|  | image doesn't have a backing file. | 
|  |  | 
|  | 16 - 19:   backing_file_size | 
|  | Length of the backing file name in bytes. Must not be | 
|  | longer than 1023 bytes. Undefined if the image doesn't have | 
|  | a backing file. | 
|  |  | 
|  | 20 - 23:   cluster_bits | 
|  | Number of bits that are used for addressing an offset | 
|  | within a cluster (1 << cluster_bits is the cluster size). | 
|  | Must not be less than 9 (i.e. 512 byte clusters). | 
|  |  | 
|  | Note: qemu as of today has an implementation limit of 2 MB | 
|  | as the maximum cluster size and won't be able to open images | 
|  | with larger cluster sizes. | 
|  |  | 
|  | 24 - 31:   size | 
|  | Virtual disk size in bytes | 
|  |  | 
|  | 32 - 35:   crypt_method | 
|  | 0 for no encryption | 
|  | 1 for AES encryption | 
|  |  | 
|  | 36 - 39:   l1_size | 
|  | Number of entries in the active L1 table | 
|  |  | 
|  | 40 - 47:   l1_table_offset | 
|  | Offset into the image file at which the active L1 table | 
|  | starts. Must be aligned to a cluster boundary. | 
|  |  | 
|  | 48 - 55:   refcount_table_offset | 
|  | Offset into the image file at which the refcount table | 
|  | starts. Must be aligned to a cluster boundary. | 
|  |  | 
|  | 56 - 59:   refcount_table_clusters | 
|  | Number of clusters that the refcount table occupies | 
|  |  | 
|  | 60 - 63:   nb_snapshots | 
|  | Number of snapshots contained in the image | 
|  |  | 
|  | 64 - 71:   snapshots_offset | 
|  | Offset into the image file at which the snapshot table | 
|  | starts. Must be aligned to a cluster boundary. | 
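
The header fields listed above can be read with a single big-endian unpack. The following is a minimal sketch, not a reference implementation; the names Qcow2Header, parse_header and cluster_size are hypothetical and chosen only for illustration.

```python
import struct
from collections import namedtuple

# ">" selects big-endian with no padding; the field order follows the
# byte offsets 0-71 listed in the header description.
HEADER_FORMAT = ">4sIQIIQIIQQIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 72 bytes

# Hypothetical container type; field names are taken from the spec.
Qcow2Header = namedtuple("Qcow2Header", [
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters", "nb_snapshots",
    "snapshots_offset",
])


def parse_header(data):
    """Parse the first 72 bytes of an image and apply the checks named above."""
    if len(data) < HEADER_SIZE:
        raise ValueError("header is truncated")
    header = Qcow2Header._make(struct.unpack_from(HEADER_FORMAT, data, 0))
    if header.magic != b"QFI\xfb":
        raise ValueError("not a qcow2 image")
    if header.version != 2:
        raise ValueError("unsupported version")
    if header.cluster_bits < 9:
        raise ValueError("cluster_bits must not be less than 9")
    return header


def cluster_size(header):
    """Cluster size in bytes, i.e. 1 << cluster_bits."""
    return 1 << header.cluster_bits
```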
|  |  | 
|  | Directly after the image header, optional sections called header extensions can | 
|  | be stored. Each extension has a structure like the following: | 
|  |  | 
|  | Byte  0 -  3:   Header extension type: | 
|  | 0x00000000 - End of the header extension area | 
|  | 0xE2792ACA - Backing file format name | 
|  | other      - Unknown header extension, can be safely | 
|  | ignored | 
|  |  | 
|  | 4 -  7:   Length of the header extension data | 
|  |  | 
|  | 8 -  n:   Header extension data | 
|  |  | 
|  | n -  m:   Padding to round up the header extension size to the next | 
|  | multiple of 8. | 
|  |  | 
|  | The remaining space between the end of the header extension area and the end of | 
|  | the first cluster can be used for other data. Usually, the backing file name is | 
|  | stored there. | 
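
Walking the header extension area can be sketched as follows. This assumes the area starts at byte 72, directly after the header fields described above; the function name iter_header_extensions and the constant names are hypothetical.

```python
import struct

EXT_END = 0x00000000             # end of the header extension area
EXT_BACKING_FORMAT = 0xE2792ACA  # backing file format name


def iter_header_extensions(data, start=72):
    """Yield (type, payload) pairs until the end marker is reached.

    `data` is assumed to hold at least the first cluster of the image.
    """
    offset = start
    while offset + 8 <= len(data):
        ext_type, length = struct.unpack_from(">II", data, offset)
        if ext_type == EXT_END:
            return
        payload_start = offset + 8
        payload = bytes(data[payload_start:payload_start + length])
        if len(payload) != length:
            raise ValueError("truncated header extension")
        yield ext_type, payload
        # Skip the payload plus the padding up to the next multiple of 8.
        offset = payload_start + ((length + 7) & ~7)
```

Unknown extension types are yielded like any other, so a caller can skip them as the text permits.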
|  |  | 
|  |  | 
|  | == Host cluster management == | 
|  |  | 
|  | qcow2 manages the allocation of host clusters by maintaining a reference count | 
|  | for each host cluster. A refcount of 0 means that the cluster is free, 1 means | 
|  | that it is used, and >= 2 means that it is used and any write access must | 
|  | perform a COW (copy on write) operation. | 
|  |  | 
|  | The refcounts are managed in a two-level table. The first level is called the | 
|  | refcount table and has a variable size (which is stored in the header). The | 
|  | refcount table can span multiple clusters, but it must be contiguous | 
|  | in the image file. | 
|  |  | 
|  | It contains pointers to the second level structures which are called refcount | 
|  | blocks and are exactly one cluster in size. | 
|  |  | 
|  | Given an offset into the image file, the refcount of its cluster can be obtained | 
|  | as follows: | 
|  |  | 
|  | refcount_block_entries = (cluster_size / sizeof(uint16_t)) | 
|  |  | 
|  | refcount_block_index = (offset / cluster_size) % refcount_block_entries | 
|  | refcount_table_index = (offset / cluster_size) / refcount_block_entries | 
|  |  | 
|  | refcount_block = load_cluster(refcount_table[refcount_table_index]); | 
|  | return refcount_block[refcount_block_index]; | 
|  |  | 
|  | Refcount table entry: | 
|  |  | 
|  | Bit  0 -  8:    Reserved (set to 0) | 
|  |  | 
|  | 9 - 63:    Bits 9-63 of the offset into the image file at which the | 
|  | refcount block starts. Must be aligned to a cluster | 
|  | boundary. | 
|  |  | 
|  | If this is 0, the corresponding refcount block has not yet | 
|  | been allocated. All refcounts managed by this refcount block | 
|  | are 0. | 
|  |  | 
|  | Refcount block entry: | 
|  |  | 
|  | Bit  0 - 15:    Reference count of the cluster | 
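
The refcount lookup, including the masking of the reserved bits in the refcount table entry, might look like the sketch below. It assumes the whole image is available as a bytes-like object named image; the function names are hypothetical, and raising an error for an index beyond the refcount table is a choice of this sketch, not something the text prescribes.

```python
import struct

REFTABLE_OFFSET_MASK = 0xFFFFFFFFFFFFFE00  # bits 9-63 of a refcount table entry


def refcount_indices(offset, cluster_bits):
    """Return (refcount_table_index, refcount_block_index) for a host offset."""
    cluster_index = offset >> cluster_bits
    block_entries = (1 << cluster_bits) // 2  # 16-bit refcount block entries
    return cluster_index // block_entries, cluster_index % block_entries


def get_refcount(image, offset, cluster_bits,
                 refcount_table_offset, refcount_table_clusters):
    """Look up the refcount of the cluster that contains `offset`."""
    table_index, block_index = refcount_indices(offset, cluster_bits)
    table_entries = (refcount_table_clusters << cluster_bits) // 8
    if table_index >= table_entries:
        raise ValueError("offset is not covered by the refcount table")
    (entry,) = struct.unpack_from(">Q", image,
                                  refcount_table_offset + 8 * table_index)
    block_offset = entry & REFTABLE_OFFSET_MASK
    if block_offset == 0:
        return 0  # refcount block not allocated: all its refcounts are 0
    (refcount,) = struct.unpack_from(">H", image,
                                     block_offset + 2 * block_index)
    return refcount
```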
|  |  | 
|  |  | 
|  | == Cluster mapping == | 
|  |  | 
|  | Just as for refcounts, qcow2 uses a two-level structure for the mapping of | 
|  | guest clusters to host clusters. The two levels are called the L1 and L2 tables. | 
|  |  | 
|  | The L1 table has a variable size (stored in the header) and may use multiple | 
|  | clusters, but it must be contiguous in the image file. L2 tables are | 
|  | exactly one cluster in size. | 
|  |  | 
|  | Given an offset into the virtual disk, the offset into the image file can be | 
|  | obtained as follows: | 
|  |  | 
|  | l2_entries = (cluster_size / sizeof(uint64_t)) | 
|  |  | 
|  | l2_index = (offset / cluster_size) % l2_entries | 
|  | l1_index = (offset / cluster_size) / l2_entries | 
|  |  | 
|  | l2_table = load_cluster(l1_table[l1_index]); | 
|  | cluster_offset = l2_table[l2_index]; | 
|  |  | 
|  | return cluster_offset + (offset % cluster_size) | 
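
As a worked example of the index arithmetic above, the following sketch splits a guest offset into its L1 index, L2 index and offset within the cluster. The function name split_guest_offset is hypothetical.

```python
def split_guest_offset(offset, cluster_bits):
    """Return (l1_index, l2_index, offset_in_cluster) for a guest offset."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8  # 64-bit L2 table entries
    guest_cluster = offset // cluster_size
    return (guest_cluster // l2_entries,
            guest_cluster % l2_entries,
            offset % cluster_size)
```

With 64 KiB clusters (cluster_bits = 16) an L2 table holds 8192 entries and so covers 512 MiB of the virtual disk. The guest offset 0x23456789 lies in guest cluster 0x2345 = 9029, which gives l1_index = 1, l2_index = 837 and an offset of 0x6789 within the cluster.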
|  |  | 
|  | L1 table entry: | 
|  |  | 
|  | Bit  0 -  8:    Reserved (set to 0) | 
|  |  | 
|  | 9 - 55:    Bits 9-55 of the offset into the image file at which the L2 | 
|  | table starts. Must be aligned to a cluster boundary. If the | 
|  | offset is 0, the L2 table and all clusters described by this | 
|  | L2 table are unallocated. | 
|  |  | 
|  | 56 - 62:    Reserved (set to 0) | 
|  |  | 
|  | 63:    0 for an L2 table that is unused or requires COW, 1 if its | 
|  | refcount is exactly one. This information is only accurate | 
|  | in the active L1 table. | 
|  |  | 
|  | L2 table entry (for normal clusters): | 
|  |  | 
|  | Bit  0 -  8:    Reserved (set to 0) | 
|  |  | 
|  | 9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a | 
|  | cluster boundary. If the offset is 0, the cluster is | 
|  | unallocated. | 
|  |  | 
|  | 56 - 61:    Reserved (set to 0) | 
|  |  | 
|  | 62:    0 (this cluster is not compressed) | 
|  |  | 
|  | 63:    0 for a cluster that is unused or requires COW, 1 if its | 
|  | refcount is exactly one. This information is only accurate | 
|  | in L2 tables that are reachable from the active L1 | 
|  | table. | 
|  |  | 
|  | L2 table entry (for compressed clusters; x = 62 - (cluster_bits - 8)): | 
|  |  | 
|  | Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a | 
|  | cluster boundary! | 
|  |  | 
|  | x - 61:    Compressed size of the cluster data in sectors of 512 bytes | 
|  |  | 
|  | 62:    1 (this cluster is compressed using zlib) | 
|  |  | 
|  | 63:    0 for a cluster that is unused or requires COW, 1 if its | 
|  | refcount is exactly one. This information is only accurate | 
|  | in L2 tables that are reachable from the active L1 | 
|  | table. | 
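
Classifying an L2 table entry by its flag bits can be sketched as below. The sketch decodes the host offset only for normal clusters and merely reports compressed entries without splitting their offset and size fields; the function name and the dictionary layout are hypothetical.

```python
L2_OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55 of a normal L2 entry
L2_COMPRESSED = 1 << 62              # bit 62: cluster is compressed
L2_COPIED = 1 << 63                  # bit 63: refcount is exactly one


def decode_l2_entry(entry):
    """Classify a 64-bit L2 table entry that was already read as big-endian."""
    copied = bool(entry & L2_COPIED)
    if entry & L2_COMPRESSED:
        # Offset and size fields of compressed entries are not decoded here.
        return {"kind": "compressed", "copied": copied}
    host_offset = entry & L2_OFFSET_MASK
    if host_offset == 0:
        return {"kind": "unallocated", "copied": copied}
    return {"kind": "normal", "host_offset": host_offset, "copied": copied}
```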
|  |  | 
|  | If a cluster is unallocated, read requests shall read the data from the backing | 
|  | file. If there is no backing file or the backing file is smaller than the image, | 
|  | they shall read zeros for all parts that are not covered by the backing file. | 
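
The read rule for unallocated clusters can be illustrated with a small sketch. Here the backing file is modelled as an in-memory bytes-like object (or None if there is no backing file), which is an assumption made for brevity; the function name read_unallocated is hypothetical.

```python
def read_unallocated(backing, offset, length):
    """Return `length` bytes for a read that hits an unallocated cluster.

    Data comes from the backing file where it exists; everything that the
    backing file does not cover reads as zeros.
    """
    if backing is None:
        return bytes(length)
    chunk = bytes(backing[offset:offset + length])
    return chunk + bytes(length - len(chunk))
```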
|  |  | 
|  |  | 
|  | == Snapshots == | 
|  |  | 
|  | qcow2 supports internal snapshots. Their basic principle of operation is to | 
|  | switch the active L1 table, so that a different set of host clusters is | 
|  | exposed to the guest. | 
|  |  | 
|  | When creating a snapshot, the L1 table should be copied and the refcount of all | 
|  | L2 tables and clusters reachable from this L1 table must be increased, so that | 
|  | a write causes a COW and isn't visible in other snapshots. | 
|  |  | 
|  | When loading a snapshot, bit 63 of all entries in the new active L1 table and | 
|  | all L2 tables referenced by it must be reconstructed from the refcount table, | 
|  | because this bit is not required to be accurate in inactive L1 tables. | 
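
Reconstructing bit 63 from the refcounts might look like the following sketch. It assumes the entries are L1 entries or normal (uncompressed) L2 entries, and that the caller supplies a lookup refcount_of(host_offset); both function names are hypothetical.

```python
COPIED = 1 << 63                  # bit 63 of an L1 or L2 entry
OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55


def with_copied_flag(entry, refcount):
    """Return `entry` with bit 63 set if and only if refcount is exactly one."""
    entry &= ~COPIED
    if refcount == 1:
        entry |= COPIED
    return entry


def rebuild_copied_flags(entries, refcount_of):
    """Recompute bit 63 for a list of table entries.

    `refcount_of` maps a host offset to the refcount of its cluster.
    Entries with offset 0 are unallocated and get bit 63 cleared.
    """
    rebuilt = []
    for entry in entries:
        host_offset = entry & OFFSET_MASK
        if host_offset == 0:
            rebuilt.append(entry & ~COPIED)
        else:
            rebuilt.append(with_copied_flag(entry, refcount_of(host_offset)))
    return rebuilt
```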
|  |  | 
|  | A directory of all snapshots is stored in the snapshot table, a contiguous area | 
|  | in the image file, whose starting offset and length are given by the header | 
|  | fields snapshots_offset and nb_snapshots. The entries of the snapshot table | 
|  | have variable length, depending on the length of ID, name and extra data. | 
|  |  | 
|  | Snapshot table entry: | 
|  |  | 
|  | Byte 0 -  7:    Offset into the image file at which the L1 table for the | 
|  | snapshot starts. Must be aligned to a cluster boundary. | 
|  |  | 
|  | 8 - 11:    Number of entries in the L1 table of the snapshot | 
|  |  | 
|  | 12 - 13:    Length of the unique ID string describing the snapshot | 
|  |  | 
|  | 14 - 15:    Length of the name of the snapshot | 
|  |  | 
|  | 16 - 19:    Time at which the snapshot was taken in seconds since the | 
|  | Epoch | 
|  |  | 
|  | 20 - 23:    Subsecond part of the time at which the snapshot was taken | 
|  | in nanoseconds | 
|  |  | 
|  | 24 - 31:    Time that the guest was running until the snapshot was | 
|  | taken in nanoseconds | 
|  |  | 
|  | 32 - 35:    Size of the VM state in bytes. 0 if no VM state is saved. | 
|  | If there is VM state, it starts at the first cluster | 
|  | described by the first L1 table entry that doesn't describe a | 
|  | regular guest cluster (i.e. VM state is stored like guest | 
|  | disk content, except that it is stored at offsets that are | 
|  | larger than the virtual disk presented to the guest) | 
|  |  | 
|  | 36 - 39:    Size of extra data in the table entry (used for future | 
|  | extensions of the format) | 
|  |  | 
|  | variable:   Extra data for future extensions. Must be ignored. | 
|  |  | 
|  | variable:   Unique ID string for the snapshot (not null terminated) | 
|  |  | 
|  | variable:   Name of the snapshot (not null terminated) |