12.3. ESP Processing
Before looking at the details of the ESP transport and tunnel modes, we should understand how the TCP/IP stack processes ESP packets. The rules are slightly different for input and output, so we treat them separately.
ESP Output Processing
When it is ready to be placed on the output queue, an IP datagram is checked for possible IPsec processing. If ESP encapsulation is required, its exact form depends on whether the SA mandates transport or tunnel mode. We examine this in detail in the next two sections. Output processing involves the following steps.
1. The SPD is searched for an SA that matches the appropriate selectors (source address, destination address, ports, protocol, etc.) in the packet. If an SA doesn't already exist, a pair of SAs is negotiated (see Chapter 13).
2. The sequence number from the SA is incremented and placed in the ESP header. If the peer has not disabled the antireplay function, the sequence number is checked to make sure that it hasn't wrapped to 0.
3. Padding is added, if necessary, and the pad length and next header fields are filled in. If the encryption algorithm requires it, an IV is added to the payload data. The IV and data payload and the ESP trailer fields are encrypted, using the algorithm and key specified in the SA.
4. The ICV is calculated over the ESP header, the IV and data payload, and the ESP trailer fields and placed in the authentication data field. The ICV is calculated, using the algorithm and key specified in the SA.
5. If the resulting packet requires fragmentation, it is performed at this point. In transport mode, ESP is applied only to entire IP datagrams. In tunnel mode, ESP may be applied to an IP datagram fragment. For example, a VPN gateway may apply ESP to an IP datagram that was fragmented by the sending host.
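Steps 2 through 4 of the output processing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real stack: the names OutboundSA and esp_output are hypothetical, the cipher defaults to NULL encryption (the identity function) so that the example needs only the standard library, the padding bytes follow the common 1, 2, 3, ... convention, and the ICV is assumed to be HMAC-SHA-1 truncated to 96 bits. The SPD lookup of step 1 and the fragmentation of step 5 are left out.

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass


@dataclass
class OutboundSA:
    """Hypothetical outbound SA holding only what this sketch needs."""
    spi: int
    auth_key: bytes
    block_size: int = 4       # alignment required by the cipher (4 for NULL)
    seq: int = 0              # last sequence number sent on this SA
    anti_replay: bool = True  # False if the peer disabled antireplay


def esp_output(sa, payload, next_header, encrypt=lambda data: data):
    """Build one ESP packet: header | encrypted payload and trailer | ICV."""
    # Step 2: increment the sequence number; refuse to wrap to 0
    # while the antireplay function is in use.
    next_seq = sa.seq + 1
    if next_seq > 0xFFFFFFFF:
        if sa.anti_replay:
            raise OverflowError("sequence number wrapped; a new SA is needed")
        next_seq = 0
    sa.seq = next_seq
    header = struct.pack("!II", sa.spi, sa.seq)

    # Step 3: pad so that payload + padding + 2 trailer bytes fills whole
    # blocks, fill in pad length and next header, then encrypt.
    pad_len = -(len(payload) + 2) % sa.block_size
    padding = bytes(range(1, pad_len + 1))
    trailer = bytes([pad_len, next_header])
    ciphertext = encrypt(payload + padding + trailer)

    # Step 4: compute the ICV over the ESP header and the encrypted data.
    icv = hmac.new(sa.auth_key, header + ciphertext, hashlib.sha1).digest()[:12]
    return header + ciphertext + icv


sa = OutboundSA(spi=0x1000, auth_key=b"k" * 20)
packet = esp_output(sa, b"hello", next_header=6)  # 6 = TCP
```

With a 5-byte payload and 4-byte alignment, one byte of padding is needed, so the packet is 8 bytes of header, 8 bytes of payload and trailer, and 12 bytes of ICV. A real cipher would be supplied through the encrypt argument together with a matching block_size.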
The order in which the encryption and authentication functions are performed is important. Because authentication is performed last, the ICV is computed over the encrypted data. This means that the receiver can perform the relatively speedy authentication verification before performing the slower decryption process. This prevents an attacker from overloading the receiver by sending a flood of randomly encrypted packets.
See [Ferguson and Schneier 1999] for a contrarian view. The authors argue that "the meaning" and not "what was said" should be authenticated, and thus ESP should first authenticate and then encrypt. They point out that if concerns about DoS attacks require the current order, the encryption key should, at the very least, be part of the data authenticated. The principle here, which they call out as Lesson 3, is that not just the message but everything used to determine the meaning of the message should be authenticated.
It turns out that there is a "right" answer to this question. Krawczyk [Krawczyk 2001] shows that under fairly general assumptions about the encryption and authentication algorithms, encrypting and then authenticating is secure, but authenticating and then encrypting is not. In the context of IPsec, these results are less dispositive than we might hope, because he also shows that the order of encryption/authentication does not affect security when using a block cipher in CBC mode or a stream cipher.
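The denial-of-service argument for authenticating last can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the Receiver class, its slow_decrypt stand-in (which merely counts calls rather than running a real cipher), and the choice of HMAC-SHA-1 truncated to 96 bits as the ICV. The point is only that forged packets are rejected by the ICV check and never reach the decryption routine.

```python
import hashlib
import hmac

ICV_LEN = 12  # assumed: 96-bit truncated HMAC


def make_icv(key, data):
    """ICV over the already-encrypted bytes."""
    return hmac.new(key, data, hashlib.sha1).digest()[:ICV_LEN]


class Receiver:
    """Hypothetical receiver that verifies the ICV before decrypting."""

    def __init__(self, auth_key):
        self.auth_key = auth_key
        self.decrypt_calls = 0

    def slow_decrypt(self, ciphertext):
        # Stand-in for an expensive cipher; it only counts invocations.
        self.decrypt_calls += 1
        return ciphertext

    def receive(self, packet):
        body, icv = packet[:-ICV_LEN], packet[-ICV_LEN:]
        # Cheap check first: constant-time comparison of the ICVs.
        if not hmac.compare_digest(make_icv(self.auth_key, body), icv):
            return None  # dropped without touching the cipher
        return self.slow_decrypt(body)


key = b"shared-auth-key"
rx = Receiver(key)

# A flood of forged packets: none carries a valid ICV.
for i in range(1000):
    rx.receive(bytes([i % 256]) * 64)

# One genuine packet from a sender that knows the key.
good = b"ciphertext bytes"
result = rx.receive(good + make_icv(key, good))
```

After the loop, the decryption routine has run only once, for the genuine packet. Had the ICV been computed over the plaintext instead, every forged packet would have had to be decrypted before it could be rejected.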
ESP Input Processing
Because an IP datagram carrying an ESP packet may have been fragmented by intervening routers, the stack reassembles the IP datagram before performing the ESP processing. After any reassembly, the stack performs the following steps.
1. The SA is retrieved by matching the destination address, protocol (ESP), and SPI of the packet. If no SA exists for the packet, it is dropped.
2. If the antireplay service is enabled, the sequence number of the packet is checked to make sure that it is new and falls within the antireplay window.
3. The packet is authenticated by computing the ICV over the ESP header, payload, and ESP trailer fields, using the algorithm and key specified in the SA. If the authentication fails, the packet is dropped; otherwise, the antireplay window is updated.
4. The payload and ESP trailer fields are decrypted, using the algorithm and key in the SA. If padding was added, it should be checked to make sure it has the values appropriate for the decryption algorithm. The original IP datagram is reconstructed from the ESP packet. The details of this reconstruction depend on whether the SA specifies transport or tunnel mode.
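Steps 2 and 3 deliberately separate checking the sequence number from updating the antireplay window: the window moves only after the ICV has been verified, so a forged packet with a large sequence number cannot push legitimate packets out of the window. The following Python sketch shows one common way to do this with a sliding bitmap. The ReplayWindow class, the accept helper, and the 32-packet window size are assumptions made for illustration; the ICV result is passed in as a boolean rather than computed.

```python
class ReplayWindow:
    """Sliding-window antireplay check (hypothetical sketch)."""

    def __init__(self, size=32):
        self.size = size
        self.highest = 0  # highest authenticated sequence number so far
        self.bitmap = 0   # bit i set means (highest - i) was already seen

    def check(self, seq):
        """Step 2: is seq new and not to the left of the window?"""
        if seq == 0:
            return False  # the first packet on an SA carries 1, never 0
        if seq > self.highest:
            return True   # ahead of the window: new by definition
        offset = self.highest - seq
        if offset >= self.size:
            return False  # too old to track
        return ((self.bitmap >> offset) & 1) == 0

    def update(self, seq):
        """Step 3: record seq; call only after the ICV has verified."""
        if seq > self.highest:
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
        else:
            self.bitmap |= 1 << (self.highest - seq)


def accept(window, seq, icv_ok):
    """Apply steps 2 and 3 in order; True if the packet may be decrypted."""
    if not window.check(seq):
        return False
    if not icv_ok:
        return False  # authentication failed: window is left untouched
    window.update(seq)
    return True


w = ReplayWindow()
results = [
    accept(w, 1, True),     # first packet
    accept(w, 1, True),     # replay of packet 1
    accept(w, 3, True),     # arrives ahead of packet 2
    accept(w, 2, True),     # late but still inside the window
    accept(w, 100, False),  # forged: fails the ICV check
]
```

The forged packet with sequence number 100 is rejected without advancing the window, so w.highest is still 3 afterward and later genuine packets are unaffected.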