Introduction

In practice, it is easier to construct hashing algorithms which operate on relatively small, fixed input lengths, whilst still keeping the output length even smaller ( is still less than ). But hash functions are usually used on much larger inputs - for example, creating checksums for integrity verification of files. The Merkle-Damgård transform allows us to turn such a hash function, which operates on small fixed input lengths, into a hash function which operates on inputs of arbitrary lengths.

The Merkle-Damgård Construction

In particular, given a compression function which works with inputs of a “small”, fixed input length and has outputs with length , the Merkle-Damgård transform allows us to use to a construct a hash function which takes messages of arbitrary length, denoted by , and produces digests of the same output length as .

The construction is similar to a block cipher in the sense that the message is chopped up into blocks. In contrast to block ciphers, however, this is done rather differently. Each block has length (since each block will be input into ), but it is not comprised entirely of message bits. Instead, each block contains message bits and the other bits () represent the so-called chaining variable for the current block.

This means that the message needs to be chopped up into message fragments , all of length . If the message length is not a multiple of , then the message is padded by appending a 1 to it and then appending 0s until the message length is short of a multiple of the fragment length by exactly the number of bits needed to encode the message length . The total length of the padding (including the 1, the 0s and the encoding of the message length) is denoted by .

When the message length is a multiple of the fragment length , padding still needs to be added. In a particular, an additional padding block is appended to the message, following the exact same procedure as before. The padding block begins with a and is followed by s - the last bits of the padding block again encode the message length .

NOTE

The number of bits reserved for encoding the message length is fixed for a given Merkle-Damgård construction. Usually is 64 bits, resulting in a maximum message length of , which is quite a reasonable limit.

After padding, the actual hash algorithm begins by appending an initialisation vector (IV) of length to the first message fragment . The IV is always a constant which is pre-defined in the specification of the Merkle-Damgård construction.

This initialisation vector serves as the initial chaining variable . The concatenation of the first message block with the IV is passed to the compression function , whose output becomes the next chaining variable. In general, the -th iteration takes the -th message block and appends to it the chaining variable . The chaining variable for the current stage is simply the output of from the previous iteration. The final output, i.e. the hash generated by the Merkle-Damgård function , is the final chaining variable.

Security of Merkle-Damgård Constructions

The reason why the Merkle-Damgård transform is used ubiquitously is the fact that it preserves collision resistance.

Theorem: Merkle-Damgård Collision Resistance

If the compression function is collision resistant, then so is the Merkle-Damgård function .