Previous Page
Next Page

3.4. Cryptographic Hash Functions, MACs, and HMACs

One problem in implementing VPNs is message authentication. That is, how can we be sure that the message is from whom it says it is, and that it hasn't been tampered with in transmission? An everyday example of this is an email message signed with PGP, GPG, or a similar utility. The recipient of such a message can be sure that the sender is legitimate and that the message has not been altered by a third party.

In this section, we discuss some of the tools that provide these capabilities. We are not interested in utilities, such as GPG, but rather in the building blocks that are used to construct them. As we shall see, these tools have applications far beyond signing email messages.

One of the basic building blocks that we will need is a cryptographic hash function. These functions take an input of arbitrary length and produce a fixed-sized digest, or "fingerprint," of the input. A trivial example of a hash function on a message m is h(m) = m mod n for some n. Although such a hash function is useful for table-lookup applications, it is not a cryptographic hash function, because it lacks the essential properties required of a cryptographic hash function, h.

  • Given x, it is computationally infeasible to compute m such that x = h(m).

  • Given m1, it is computationally infeasible to compute m2 such that h(m1) = h(m2).

  • It is computationally infeasible to find m1 and m2 such that h(m1) = h(m2).

This last property is called strong collision resistance. It may seem superfluous, but hash functions lacking this property are vulnerable to various attacks based on the birthday paradox. See Section 18.1 of [Schneier 1996] for a description of one such attack.

The birthday paradox is the surprising result that if 23 people are in a room, the chances are better than even that 2 of them will have the same birthday.

An examination of the properties that hash functions should have and the place of strong collision resistance in those properties is presented in [Anderson 1993].

Although there are several cryptographic hash functions, we examine only two: MD5 and SHA. Both of these functions take a message of any length and produce a fixed-sized result: 128 bits for MD5 and 160 bits for the standard SHA algorithm, called SHA-1.

MD5

The MD5 algorithm was developed by Ron Rivest in 1992 and released to the public domain. Its specification and a reference implementation are given in RFC 1321 [Rivest 1992b]. Before describing the algorithm, let's see it in action. We use the FreeBSD md5 utility to compute the MD5 digest of the two strings 1 and 3:

$ md5 -s 1
MD5 ("1") = c4ca4238a0b923820dcc509a6f75849b
$ md5 -s 3
MD5 ("3") = eccbc87e4b5ce2fe28308fd9f2a7baf3

Two points are worth noticing here. First, even though the input strings are a single byte, they produce a 128-bit digest. Second, although the strings 1 and 3 vary in only a single bit, they produce two radically different digests. This is an example of the avalanche effect, where a change in one bit affects several bits in the result.

Given a message m of length lm, MD5 first pads the message with a 1-bit and then as many 0-bits as required to make lm = 448 mod 512. Then the least significant 64 bits of the message length are appended to make the padded message a multiple of 512 bits. Because the 1-bit is always appended, the message is always padded, even if it is already a multiple of 512 bits.

The MD5 algorithm operates on one 512-bit block at a time until the entire message is processed. During this process, MD5 maintains 16 bytes of state in four 32-bit registers (A, B, C,D). This state is modified by mixing it with data from the input block in a nonlinear way. After the last block has been processed, the 16 bytes of state are concatenated to form the digest. Before processing the first block, the state is initialized to (0x01234567, 0x89abcdef, 0xfedcda98, 0x76543210).

Figure 3.9 shows an overview of the MD5 process for one block of data. After the four rounds of mixing, the modified state (A', B', C',D') is added to the original input state (A, B, C,D) and used as the input state for the next block. The addition of the original state with the modified state is done one register at a time with 32-bit modular addition.

Figure 3.9. Process One Block with MD5


The real work of MD5 is done by the four mixing rounds. The four rounds are similar but not identical. Each mixing round has a nonlinear function at its heart. These four functions, F, G, H, and I, are defined as

F(X, Y, Z) = (X Y) ((¬X) Z)

G(X, Y, Z) = (X Z) (Y Z))

H(X, Y, Z) = X Y Z

I(X, Y, Z) = Y (X Z))

where is the bitwise AND operator, is the bitwise OR operator, and ¬ is the bit complement operation.

F, G, H, and I are used to define the four round operations , , , and :


where <<< s represents a left rotation of s bits.

Let where i is in radians. If we think of a single 512-bit block, B, of the message as comprising 16 32-bit words M0, M1, . . . , M15, round 1 of the nonlinear mixingreading left to right, top to bottomis


The other three rounds, using , , and , are similar:



SHA

The Secure Hash Algorithm (SHA) was designed by the National Security Agency (NSA) and standardized by NIST in 1995 as part of the Digital Signature Standard (DSS). The SHA specification is given in [NIST 2002b].

We are concerned mainly with the SHA-1 algorithm, which produces 160-bit digests. Also specified in [NIST 2002b] are SHA-256, SHA-384, and SHA-512, which produce digests of 256, 384, and 512 bits, respectively. As of this writing, these latter versions are still quite new and have not yet been subject to much review.

Because the SHA algorithms and MD5 are all based on ideas from the MD4 algorithm [Rivest 1992a], they share many of the same features, but the SHA algorithms are more secure, although slower.

SHA-1 uses five registers (A, B, C,D, E) to maintain a state of 20 bytes, which are initialized to (0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0).

After padding the message m exactly as for MD5, each block is expanded from the 16 32-bits words M0, M1, . . . , M15 to 80 32-bit words by


This expansion ensures that every bit of the input block is mixed with the state multiple times.

During processing, the bits from each blockin the form of the Wiare mixed into the state in four 20-step rounds. Each round has a separate round constant, Kt:


and round function, ft(X, Y, Z):


With these definitions, we can describe the 80 steps of the four rounds as a simple loop.

For t = 0 . . . 79

T = (A 5) + ft(B, C,D) + E + Wt + Kt

E = D

D = C

C = B 30

B = A

A = T

After the final block is processed, the values in the five state registers are concatenated to form the digest.

Message Authentication Codes

A message authentication code (MAC) is a value used to verify that a message has not been tampered with since the MAC was computed. The normal protocol is for the recipient to be sent the MAC along with the message it is protecting. The receiver recomputes the MAC and verifies that it is the same value as that included with the message.

An obvious candidate for computing a MAC is one of our cryptographic hash functions. We can't simply use, say, the MD5 digest of the message as the MAC, because an attacker could alter the message and substitute the MD5 digest of the altered message for the original MAC. Nonetheless, we can use MD5 or one of the SHA algorithms as a building block for a MAC. The most straightforward method is to concatenate a secret key to the message and then take a digest of that. That is, if h is a cryptographic hash function, m is a message, and K a secret shared between two communicating peers, we might compute a MAC as MAC(m, K) = h(m||K), where the ||operator denotes concatenation. Notice that attackers can no longer substitute or alter the original message without detection, because they don't know K, and therefore can't compute a new MAC that will verify correctly.

Unfortunately, this MAC has some security problems [Preneel and Oorschot 1995], as does h(K||m) (Exercise 3.7). Even MAC(m, K) = h(K||m||K) is vulnerable [Preneel and Oorschot 1996], so we wouldn't expect to see them used to authenticate messages in any modern system.

Another possible MAC is based on using a block cipher in CBC mode. In this scheme, we first pad the message in a manner similar to that used for MD5 and then encrypt it with the block cipher in CBC mode, using K as the key. After the encryption, we discard all blocks but the last. For example, if m = B1||B2|| . . . ||Bn, where the Bi are the N-bit blocks used by the block cipher, and EK represents encryption with the chosen block cipher using key K, then we compute the MAC as

M0 = IV

Mi = EK(Bi Mi-1)

MAC(m, K) = Mn

Traditionally, IV is set to 0, but any of the normal IV schemes can be used.

HMAC

A simpler scheme that uses one of the cryptographic hash functions as a building block is HMAC. The HMAC specification and a reference implementation using MD5 are given in RFC 2104 [Krawczyk, Bellare, and Canetti 1997]. Let B be the length in bytes of the block size used by the cryptographic hash function, h. For MD5 and SHA-1, B = 64. Then let L be the length in bytes of the output of the hash function. L is 16 for MD5 and 20 for SHA-1. The key, K, should be at least L bytes long and no longer than B bytes. If K is longer than B bytes, it can be reduced to L bytes by hashing it with h. In any event, K is expanded to B bytes by appending 0s. Given a message m, we compute HMAC as

HMAC(m, K) = h((K opad) || h((K ipad) || m))

where ipad is B bytes of 0x36, and opad is B bytes of 0x5c.

HMAC is believed to have good security and is simple to implement, so it is widely used. The authors of RFC 2104 provide an analysis of HMAC and the rationale for its design in [Bellare, Canetti, and Krawczyk 1996]. We will see HMAC used in SSL (Chapter 6) and IPsec (Chapter 9).


Previous Page
Next Page