2.7. TCP

The Transmission Control Protocol (TCP) provides a reliable, connection-based byte stream delivery service. That sentence is rich with meaning, so let's spend a little time exploring it.

We have seen that IP and UDP are best-effort protocols. They make no guarantee that they will deliver a datagram or, if they do, that they will deliver it in order. Because IP does not checksum its payload, and because the UDP checksum is optional, these protocols don't even guarantee that any data they do deliver will arrive uncorrupted.

TCP, on the other hand, is prepared to make some guarantees. It guarantees that any data that arrives at the destination will be in order and uncorrupted.

We should be more precise here. When we say that TCP guarantees that the data will be uncorrupted, we mean that it guarantees it to the extent that the 16-bit Internet checksum is able to detect corruption.
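To make the limits of that guarantee concrete, here is a minimal sketch of the 16-bit one's-complement Internet checksum. The function name and the sample payload are chosen only for illustration. The sketch shows one kind of corruption that the checksum catches (a changed bit) and one that it cannot (two 16-bit words exchanged).

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"                             # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]       # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)    # fold any carry back into the low 16 bits
    return ~total & 0xFFFF


original = b"DATADATA"
bit_flip = b"DATADAT@"                                   # lowest bit of the last byte changed
swapped = original[2:4] + original[0:2] + original[4:]   # first two 16-bit words exchanged

print(internet_checksum(original) != internet_checksum(bit_flip))   # True: detected
print(internet_checksum(original) == internet_checksum(swapped))    # True: not detected
```

Because the checksum is a sum, it is indifferent to the order of the words it adds, which is why the reordered payload slips through.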

Reliability also means that TCP agrees to try really hard to deliver all the data that the sender commits to it for transmission to the destination. It does this by demanding acknowledgments from its peer TCP that data has arrived and by retransmitting the data after a suitable time if it does not receive an acknowledgment. Reliability most emphatically does not mean that TCP unconditionally promises to deliver any data that the sender writes. Even a moment's thought will convince us that TCP couldn't possibly keep that promise under all circumstances. For more on what reliability means and doesn't mean, see Tip 9 of ETCP.

In order to implement its retransmission strategy, TCP must maintain state between the blocks of data, called segments, that it sends to its peer. It does this by establishing a logical connection with its peer. The usual analogy is that TCP is like a phone call: A connection is established, words are delivered in the order that they are spoken, and it is not necessary to address each sentence or word by, say, continually redialing the peer's number. When the parties finish their conversation, they say goodbye and hang up, and the connection is torn down. As we shall see, TCP goes through similar stages: A connection is established, the peers exchange data without needing to specify their peers' addresses with each write, and when they are finished, the connection is torn down.

The telephone analogy is a useful one for understanding the difference between connectionless and connection-based protocols, but it can be misleading. With a phone call, a physical connection is established, but the TCP connection is entirely notional, consisting only of shared state maintained by the two peers. The data itself, as shown in Figure 2.15, is carried in an IP datagram, just as it is for IP, ICMP, and UDP.

Figure 2.15. TCP Encapsulation


One of the most common misunderstandings about TCP involves its data delivery model. TCP delivers a byte stream to the receiving application. This means that TCP has no notion of records or packets that are visible at the user level. Suppose that an application writes a series of 500-byte messages. On any given read, the receiver may read part of one of those messages, all of one of the messages, or more than one of the messages. As Varghese [Varghese 2005] puts it, TCP simulates a shared data queue into which the sender puts bytes and the receiver removes bytes. There is no way for the receiving TCP to tell whether or not 2 bytes were put into the queue at the same time.

This is not to say that the application cannot impose its own record structure on the byte stream, only that TCP doesn't do so. Figure 2.5, for example, shows one way for an application to do this. Explicit record markers, such as newlines in textual data, are another. Later in the text, we will see several examples of applications or other protocols running over TCP imposing their own record structure on the byte stream.
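As one illustration of an application imposing its own record structure, the sketch below prefixes each record with a 2-byte length and reassembles records no matter how the bytes are split across reads. The 2-byte prefix, the class name RecordReader, and the helper frame are assumptions made for this example, not a format defined by TCP or by this text.

```python
import struct


class RecordReader:
    """Reassemble length-prefixed records from arbitrary chunks of a byte stream."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Add bytes as they arrive and return every record that is now complete."""
        self._buffer += chunk
        records = []
        while len(self._buffer) >= 2:
            (length,) = struct.unpack_from("!H", self._buffer)  # 2-byte big-endian length
            if len(self._buffer) < 2 + length:
                break                        # record incomplete: wait for more bytes
            records.append(bytes(self._buffer[2:2 + length]))
            del self._buffer[:2 + length]
        return records


def frame(payload: bytes) -> bytes:
    """Prefix payload with its length so the receiver can find the record boundary."""
    return struct.pack("!H", len(payload)) + payload


stream = frame(b"first") + frame(b"second") + frame(b"third")
reader = RecordReader()
# Deliver the stream in pieces that ignore record boundaries, as reads from TCP may.
for chunk in (stream[:3], stream[3:16], stream[16:]):
    for record in reader.feed(chunk):
        print(record)
```

The reader never assumes that one read corresponds to one write; it relies only on the order of the bytes, which is the one thing the byte stream preserves.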

From the preceding discussion, the meaning of our description of TCP as a reliable, connection-based, byte stream delivery service should now be clear. Notice how this service is the antithesis of UDP, which is an unreliable, connectionless, datagram delivery service. The TCP specification is given in RFC 793 [Postel 1981b].

The TCP Header

As we mentioned above, TCP sends its data in blocks called segments. The format of these segments is shown in Figure 2.16. The source port and destination port serve to identify the sending and receiving applications, just as they do in UDP datagrams.

Figure 2.16. The TCP Header


In TCP, every byte has a sequence number. We do not, of course, attach a sequence number to every byte. Instead, the sequence number of the first byte in a segment is placed in the sequence number field. The sequence numbers of the remaining bytes in the segment are then known by implication.

As we shall see when we discuss the TCP flag bits, the SYN and FIN bits also take up a sequence number.

When TCP receives a segment from its peer, it returns the sequence number of the next byte it is expecting (that is, the sequence number after that of the largest-numbered byte it has received in order) in the acknowledgment number field. This serves as an acknowledgment to its peer that TCP has received all bytes up to but not including the byte numbered in the acknowledgment number field. If the receiving peer has data of its own to send, it piggybacks the acknowledgment number in the data segment. Otherwise, it sends the acknowledgment in a segment without any data. Every segment after the first (SYN) segment must specify the next byte it is expecting from its peer in the acknowledgment field.

TCP uses sequence numbers to ensure that applications receive the data in order. Most TCPs will queue any out-of-order data they receive until the missing bytes arrive. It is legal, however, for TCP to merely drop the data and send an acknowledgment indicating the data it is expecting.
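The interplay between sequence numbers, acknowledgments, and out-of-order data can be sketched with a toy receiver. This is a simplified model, not how any particular TCP is implemented: the class name Receiver is hypothetical, and the sketch ignores sequence number wraparound, partially overlapping segments, and window limits.

```python
class Receiver:
    """Toy receiver: cumulative ACKs plus a queue for out-of-order segments."""

    def __init__(self, next_expected: int) -> None:
        self.next_expected = next_expected   # value for the acknowledgment number field
        self.delivered = bytearray()         # bytes handed to the application, in order
        self._out_of_order = {}              # sequence number -> data waiting behind a gap

    def segment_arrives(self, seq: int, data: bytes) -> int:
        """Process one segment and return the acknowledgment number to send back."""
        if seq > self.next_expected:
            self._out_of_order[seq] = data   # bytes are missing before this data: queue it
        elif seq == self.next_expected:
            self.delivered += data
            self.next_expected += len(data)
            # Deliver queued segments that have now become contiguous.
            while self.next_expected in self._out_of_order:
                queued = self._out_of_order.pop(self.next_expected)
                self.delivered += queued
                self.next_expected += len(queued)
        # seq < next_expected is treated as a duplicate and ignored in this sketch.
        return self.next_expected


rx = Receiver(next_expected=1)
print(rx.segment_arrives(1, b"abc"))   # 4: bytes 1 through 3 received
print(rx.segment_arrives(7, b"ghi"))   # 4: bytes 4 through 6 are still missing
print(rx.segment_arrives(4, b"def"))   # 10: the gap is filled and the queued data delivered
print(bytes(rx.delivered))             # b'abcdefghi'
```

A TCP that drops out-of-order data instead of queuing it would return the same acknowledgment numbers for the first two segments but would need the third segment's successor retransmitted.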

The TCP header is nominally 20 bytes but commonly contains optional data, such as timestamps or announcements concerning the maximum segment size the peer is willing to accept. TCP puts the size of the header, including options, in 32-bit words in the data offset field.

There are six flags that TCP can set in the header:

SYN

This flag is set in the initial segment when a new connection is being set up. When the SYN flag is set, it has the sequence number given in the sequence number field, and the first byte of data has the next sequence number.

FIN

This flag serves as an EOF indicating that the TCP that set it is through sending data, although it may be willing to accept more data. When this flag is set, it has the sequence number after that of the last byte of data in the segment.

ACK

This flag is set when the value in the acknowledgment number field is valid. The ACK flag must be set in every segment except possibly the first, in which the SYN flag is set.

RST

This flag is used to reset the connection. Its most common use is during connection setup to indicate that no application is listening for a connection on the destination port. The flag is also used to immediately abort a connection under certain error conditions.

PSH

This flag is intended to indicate that the receiver should deliver any data in its receive buffer to the application but is virtually always ignored by the receiver. Senders often set it when the current segment empties the send buffer.

URG

This flag indicates that urgent data is available in the byte stream at the offset, relative to this segment, shown in the urgent pointer field. The meaning and use of this field are widely misunderstood. See [Stevens 1998] for some guidance on its use.


TCP provides flow control by telling its peer how much data it is currently willing to accept. It does this by advertising a receive window in the window size field. The receive window is the number of bytes, starting with the one numbered in the acknowledgment field, that TCP has space to buffer. When TCP has exhausted the buffer space for a connection, it will advertise a zero-sized window, and its peer will not send it any more data until it advertises a positive window size again.
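The arithmetic behind the receive window is simple enough to sketch. The function below is an illustration only: its name and parameters are hypothetical, and it ignores sequence number wraparound, window scaling, and congestion control, all of which a real sender must also take into account.

```python
def usable_window(ack: int, window: int, next_seq: int) -> int:
    """Bytes the sender may still transmit, given the peer's latest ACK and window."""
    right_edge = ack + window        # first sequence number the peer has no room to buffer
    return max(0, right_edge - next_seq)


# The peer has acknowledged everything before byte 1001 and advertises 4,000 bytes.
print(usable_window(ack=1001, window=4000, next_seq=1001))   # 4000: nothing in flight
print(usable_window(ack=1001, window=4000, next_seq=3001))   # 2000: 2,000 bytes in flight
print(usable_window(ack=5001, window=0, next_seq=5001))      # 0: zero window, sender waits
```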

TCP has a mandatory checksum, which is placed in the checksum field. As with IP and UDP, the checksum is the standard Internet checksum. Also as with UDP, the checksum is calculated over the pseudoheader, the TCP header and options, and the TCP data.

Looking at lines 1.2 through 1.4 from Figure 2.4, we see the TCP header of the segment encapsulated in the IP datagram:

1.2   ac1e 0001 0403 1388 445d 9b9b 53ff 98e7   ........D]..S...
1.3   8018 e240 7b3a 0000 0101 080a 0000 c860   ...@{:.........'
1.4   3503 72f1 0000 0011 4441 5441 4441 5441   5.r.....DATADATA

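The first 4 bytes of line 1.2 (ac1e 0001) are the destination address that ends the IP header. The first 12 bytes of the TCP header follow: the source and destination ports, the sequence number of the first byte of data, and the acknowledgment number. The destination port number, for example, is 5000 (0x1388).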

The first 2 bytes in line 1.3 tell us that the TCP header is 32 bytes (8 words) long and that the PSH and ACK flags (0x18) are set. A good exercise is to verify the rest of the fields with line 1 of Figure 2.4. We had tcpdump print that line in a way that makes it easy to see the exact values.
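One way to carry out that exercise is to decode the bytes mechanically. The sketch below copies the 32 header bytes from lines 1.2 through 1.4, skipping the leading ac1e 0001 (the end of the IP header) and stopping before the application data. The function name parse_tcp_header and the dictionary keys are chosen for this illustration.

```python
import struct

# TCP header bytes from lines 1.2 through 1.4 of the hex dump.
header = bytes.fromhex(
    "0403 1388 445d 9b9b 53ff 98e7"    # ports, sequence number, acknowledgment number
    "8018 e240 7b3a 0000"              # offset and flags, window, checksum, urgent pointer
    "0101 080a 0000 c860 3503 72f1"    # options: nop, nop, timestamp
)

FLAG_NAMES = ("FIN", "SYN", "RST", "PSH", "ACK", "URG")   # bit 0 through bit 5


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header and locate its options."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack_from("!HHIIHHHH", raw)
    header_len = (off_flags >> 12) * 4                    # data offset counts 32-bit words
    flags = [name for bit, name in enumerate(FLAG_NAMES) if off_flags & (1 << bit)]
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags, "window": window,
        "checksum": checksum, "urgent": urgent,
        "options": raw[20:header_len],
    }


fields = parse_tcp_header(header)
for name, value in fields.items():
    print(f"{name:10} {value}")
```

The decoded header length (32) and flags (PSH and ACK) match the values read off by hand above, and the 12 bytes of options account for the difference between the nominal 20-byte header and the 32 bytes actually sent.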

Normally, tcpdump prints sequence and acknowledgment numbers relative to the initial sequence numbers carried in the SYN segments. This makes the numbers smaller and easier to follow. For Figure 2.4, we inhibited that behavior. We also asked tcpdump to print numeric rather than symbolic addresses.

Connection Setup: The Three-Way Handshake

The normal connection sequence is shown in Figure 2.17.

Figure 2.17. TCP Three-Way Handshake


One of the peers, usually called the client, initiates the connection by sending its peer, usually called the server, a SYN segment that has an initial sequence number and perhaps some other connection parameters. The server responds by acknowledging the client's SYN (we say that the server ACKs the SYN) and sending its own SYN segment with an initial sequence number and optional connection parameters. The client ACKs the server's SYN, and the connection is established.

Sometimes, we say that the connection is synchronized, meaning that the two peers have synchronized their connection state. The term SYN is often used as shorthand for synchronization segment: a segment with the SYN flag set.

We can see the handshake in action by using tcpdump to capture a TCP session. On laptop, we use netcat (nc) to connect to the echo server on solaris:

laptop:~
$ nc solaris echo
hello
hello
^C punt!

We run tcpdump on laptop to capture the connection setup. In this case, we didn't ask tcpdump to print a hex dump of the datagrams:

1   05:57:05.603882 laptop.1033 > solaris.echo:
    S 544197796:544197796(0) win 57344
    <mss 1460,nop,wscale 0,nop,nop,timestamp 75371 0> (DF)
2   05:57:05.631720 solaris.echo > laptop.1033:
    S 3156319241:3156319241(0) ack 544197797 win 24616
    <nop,nop,timestamp 1559953062 75371,nop,
    wscale 0,mss 1460> (DF)
3   05:57:05.631927 laptop.1033 > solaris.echo: . ack 1 win 57920
    <nop,nop,timestamp 75374 1559953062> (DF)

In line 1, laptop sends a SYN segment to solaris asking to establish a connection with the application listening on port 7 (the echo port). The segment establishes an initial sequence number of 544197796 and announces two connection parameters. First, it tells its peer that its MSS (maximum segment size) is 1,460 bytes. Second, it offers the window scale option with a shift count of 0 (wscale 0), meaning that it understands window scaling but does not scale its own advertised window. The timestamp option contains some timing information that TCP uses to calculate the round-trip time (RTT) of segments. TCP uses this information in its retransmission strategy. See [Stevens 1994] for the details of the MSS and window scale parameters and the timestamp option. The (DF) at the end of the line indicates that the DF flag in the IP header is set. This is the path MTU discovery mechanism that we discussed earlier.

Next, solaris ACKs the SYN and sends its own SYN and connection parameters. Notice that the acknowledgment number is 544197797, reflecting the fact that solaris is expecting byte 544197797 next. Finally, in line 3, laptop ACKs the SYN from solaris, and the connection is established.
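The numbers in the trace follow from two small rules: a SYN occupies one sequence number, and tcpdump's relative display subtracts the initial sequence number (ISN). The helper names below are hypothetical and serve only to reproduce the arithmetic.

```python
MOD = 2 ** 32    # the sequence number field is 32 bits wide, so arithmetic wraps around


def ack_for_syn(isn: int) -> int:
    """The SYN occupies one sequence number, so its ACK is ISN + 1 (mod 2**32)."""
    return (isn + 1) % MOD


def relative(number: int, isn: int) -> int:
    """Express a sequence or acknowledgment number relative to an ISN."""
    return (number - isn) % MOD


client_isn = 544197796     # line 1 of the trace
server_isn = 3156319241    # line 2 of the trace

print(ack_for_syn(client_isn))                          # 544197797, the ack in line 2
print(relative(ack_for_syn(server_isn), server_isn))    # 1, the ack printed in line 3
```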

It is also possible, but rare, for both peers to initiate a connection. This happens when they both send a SYN at roughly the same time. The SYNs cross in the network, as illustrated in Figure 2.18, and both peers respond with an ACK. This four-way handshake results in a single connection rather than the two that we might expect.

Figure 2.18. The Four-Way Handshake


If a host sends a SYN to a host for a port at which no application is listening, the receiving host will respond with an RST (a segment with the RST bit set). This tells the sending host that the connection cannot be established and that it should abandon the attempt.

That is, the RST indicates a hard error. If, by contrast, the sending host simply receives no SYN-ACK in response to its SYN after a given time, it will continue the attempt to connect by resending the SYN.

To illustrate this, we attempt to connect to a port where no application is listening:

laptop:~
$ nc -v linux 8000
linux [172.30.0.4] 8000 (?) : Connection refused

Then tcpdump shows linux responding to the SYN with an RST:

1   11:21:19.974218 laptop.1070 > linux.8000:
    S 1025154961:1025154961(0) win 57344
    <mss 1460,nop,wscale 0,nop,nop,timestamp 305005 0> (DF)
2   11:21:19.980602 linux.8000 > laptop.1070: R 0:0(0)
    ack 1025154962 win 0 (DF)

As we see in line 2, the response from linux has the RST bit set.
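An application sees the same distinction between an RST and silence through the sockets API. The sketch below uses Python's standard socket module; the function name probe and the 3-second timeout are arbitrary choices for illustration, and errors other than the two shown (an unreachable network, for example) are left to propagate.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try to open a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # the three-way handshake completed
    except ConnectionRefusedError:
        return "refused"         # the peer answered our SYN with an RST
    except socket.timeout:
        return "timeout"         # no answer arrived before we gave up


# Find a local port with no listener by binding to an ephemeral port and closing it.
placeholder = socket.socket()
placeholder.bind(("127.0.0.1", 0))
unused_port = placeholder.getsockname()[1]
placeholder.close()

print(probe("127.0.0.1", unused_port))   # normally "refused"
```

The refusal is reported immediately, whereas the timeout case returns only after TCP has spent the whole interval retrying.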

Connection Shutdown

After the two TCP peers have finished exchanging data, they enter the final phase of the session: connection teardown. When one side is finished transmitting data, it sends a segment with the FIN bit set. This acts as an EOF, telling the other side that TCP will send no more data. As illustrated in Figure 2.19, the other side will normally also send a FIN, completing the shutdown.

Figure 2.19. Connection Shutdown


Here is the end of the connection that we initiated to the echo server on solaris:

1  05:57:26.297609 laptop.1033 > solaris.echo: F 7:7(0) ack 7
   win 57920 <nop,nop,timestamp 77440 1559954580> (DF)
2  05:57:26.300517 solaris.echo > laptop.1033: . ack 8
   win 24616 <nop,nop,timestamp 1559955131 77440> (DF)
3  05:57:26.322748 solaris.echo > laptop.1033: F 7:7(0) ack 8
   win 24616 <nop,nop,timestamp 1559955131 77440> (DF)
4  05:57:26.322930 laptop.1033 > solaris.echo: . ack 8
   win 57920 <nop,nop,timestamp 77443 1559955131> (DF)

The FIN segments are in lines 1 and 3; the ACKs for them, in lines 2 and 4.
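The sequence numbers in this trace can be reconstructed by counting what each side sent. The sketch below assumes that the only data sent was the 6-byte line "hello" plus a newline, which is consistent with the F 7:7(0) in lines 1 and 3; the function name is hypothetical.

```python
def next_sequence(start: int, syn: bool = False, data_len: int = 0, fin: bool = False) -> int:
    """Sequence number that follows a segment; SYN and FIN each occupy one number."""
    return (start + int(syn) + data_len + int(fin)) % 2 ** 32


seq = 0                                               # relative initial sequence number
seq = next_sequence(seq, syn=True)                    # the SYN: first data byte is number 1
seq = next_sequence(seq, data_len=len(b"hello\n"))    # six data bytes: numbers 1 through 6
fin_seq = seq                                         # the FIN goes out as 7:7(0)
peer_ack = next_sequence(fin_seq, fin=True)           # the peer replies with ack 8
print(fin_seq, peer_ack)                              # 7 8
```

Because the echo server returned the same 6 bytes, the accounting is identical in the other direction, which is why both FINs carry sequence number 7 and both are acknowledged with 8.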

It's also possible for one side to close and the other to continue to send data. For example, the client could make a request of the server and then close its half of the connection to indicate that it's through making requests. The server would not close its side of the connection until it had finished responding to the client. This is illustrated in Figure 2.20. See Tip 16 of ETCP for more information on the halfclose operation and how it can be used in the so-called orderly release operation.
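In the sockets API, a halfclose is requested with shutdown rather than close. The following is a minimal sketch that runs a one-shot server and a client in the same process over the loopback interface. The function names, the uppercase "response," and the use of a thread are all assumptions made to keep the example self-contained.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Read the request until the client's FIN (EOF), then reply and close."""
    conn, _ = listener.accept()
    with conn:
        request = bytearray()
        while chunk := conn.recv(4096):     # recv returns b"" once the peer's FIN arrives
            request += chunk
        conn.sendall(request.upper())       # the server-to-client direction is still open


def request_with_halfclose(address, payload: bytes) -> bytes:
    """Send a request, close only the sending half, and read the whole reply."""
    with socket.create_connection(address) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)       # send our FIN but keep reading
        reply = bytearray()
        while chunk := sock.recv(4096):     # read until the server's FIN
            reply += chunk
        return bytes(reply)


listener = socket.create_server(("127.0.0.1", 0))    # port 0: let the system pick a port
server = threading.Thread(target=serve_once, args=(listener,))
server.start()
reply = request_with_halfclose(listener.getsockname(), b"hello\n")
server.join()
listener.close()
print(reply)
```

The client's FIN doubles as an end-of-request marker, so the server needs no other way to learn that the request is complete.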

Figure 2.20. A Halfclose


