TCP

11 min read Last updated Sat Jun 06 2026 07:03:21 GMT+0000 (Coordinated Universal Time)

Transmission Control Protocol. Connection-oriented. Reliable. Unicast only. Originally defined in RFC 793, now superseded by RFC 9293. Operates end-to-end; routers do not participate. Provides a byte-stream abstraction over an unreliable IP network.

Treats data as an ordered byte-stream.

Segment Format

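Each row below is one 32-bit word; bit positions run 0-31 within the row.

  • Source Port (bits 0-15) | Destination Port (bits 16-31)
  • Sequence Number (bits 0-31)
  • Acknowledgment Number (bits 0-31)
  • Data Offset (bits 0-3) | Reserved (bits 4-7) | Flags: CWR ECE URG ACK PSH RST SYN FIN (bits 8-15) | Window (bits 16-31)
  • Checksum (bits 0-15) | Urgent Pointer (bits 16-31)
  • Options (variable length)
  • Data (Payload)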

Minimum header size: 20 bytes (no options). The 4-bit Data Offset field encodes the actual header length in 32-bit words, so the header can be at most 60 bytes (15 × 4).

Source Port

2 bytes (16 bits). Port number of the sending process.

Destination Port

2 bytes (16 bits). Port number of the receiving process.

Sequence Number

4 bytes. Sequence number of the first data byte in this segment. When SYN is set, this field carries the initial sequence number (ISN) instead.

Acknowledgment Number

4 bytes. Next byte expected from the other side. Valid only if ACK flag set.

Window

2 bytes. Number of bytes the receiver is currently willing to accept beyond the last acknowledged byte (its free buffer space). Drives flow control.

Flags

| Flag | Full Name | Set By | Purpose |
|---|---|---|---|
| SYN | Synchronize | Both | Initiates a connection and shares the sender's ISN. Only set during connection establishment. |
| ACK | Acknowledge | Both | Confirms receipt of data. Once the connection is established, nearly every segment has this set. The Acknowledgment Number field is valid only when ACK=1. |
| FIN | Finish | Both | Signals that the sender has no more data to send. Each side must send its own FIN to fully close the connection (four-way teardown). |
| RST | Reset | Both | Abruptly terminates the connection with no teardown. Used when something is wrong, e.g. a segment arrives for a non-existent connection, or an application crashes. |
| PSH | Push | Sender | Tells the receiver to deliver buffered data to the application immediately, rather than waiting to accumulate more. Common in interactive protocols like SSH or Telnet. |
| URG | Urgent | Sender | Marks some data in the segment as urgent. The Urgent Pointer field then indicates where the urgent data ends. Rarely used in modern applications. |
| ECE | ECN-Echo | Data receiver | Echoes a congestion mark set by a router back to the sender (see Explicit Congestion Notification). |
| CWR | Congestion Window Reduced | Data sender | Tells the receiver that the sender has reduced cwnd in response to ECE. |

Flags can be combined, e.g. SYN+ACK during connection establishment or FIN+ACK during termination.
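To make the layout concrete, here is a minimal sketch that decodes the fixed 20-byte header with Python's `struct` module. It assumes the input is a raw TCP segment with the IP header already stripped; the function name `parse_tcp_header` and the returned dictionary are hypothetical, and options are returned as raw bytes rather than decoded.

```python
import struct

# Bit i of the flags byte, least significant first: FIN=0x01 ... CWR=0x80.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed TCP header fields from a raw segment (no IP header)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    (src_port, dst_port, seq, ack,
     offset_byte, flags_byte, window, checksum, urgent) = struct.unpack(
        "!HHIIBBHHH", segment[:20])            # network byte order, 20 bytes
    header_len = (offset_byte >> 4) * 4        # Data Offset counts 32-bit words
    if header_len < 20 or header_len > len(segment):
        raise ValueError("invalid Data Offset")
    flags = {name for bit, name in enumerate(FLAG_NAMES) if flags_byte & (1 << bit)}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack, "header_len": header_len,
        "flags": flags, "window": window,
        "checksum": checksum, "urgent": urgent,
        "options": segment[20:header_len],     # raw, undecoded option bytes
        "payload": segment[header_len:],
    }
```

For example, packing a header with `5 << 4` in the offset byte (5 words, no options) and `0x12` in the flags byte produces a segment whose flags decode to SYN and ACK.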

Checksum

Mandatory. Computed over pseudo-header + header + data. Same pseudo-header structure as UDP.
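A minimal sketch of the checksum for TCP over IPv4, assuming the standard 16-bit ones' complement sum. The 12-byte pseudo-header holds the source and destination IP addresses, a zero byte, the protocol number (6 for TCP) and the TCP length. The function names are illustrative, and the segment must be passed with its checksum field zeroed.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """Sum big-endian 16-bit words with end-around carry (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total


def tcp_checksum_ipv4(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudo-header + TCP header + data.

    `segment` must have its checksum field (bytes 16-17) set to zero.
    """
    pseudo = struct.pack("!4s4sBBH",
                         socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                         0, 6, len(segment))       # zero, protocol 6 (TCP), TCP length
    return ~ones_complement_sum16(pseudo + segment) & 0xFFFF
```

A receiver can verify a segment by running the same computation with the checksum field left in place: a valid segment yields 0.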

Properties

Connection-oriented

Logical circuit established before data transfer. Identified by a 4-tuple: (source IP, source port, destination IP, destination port). State maintained at both endpoints throughout session.

Reliability

Receiver buffers out-of-order segments and delivers in-order to applications. Lost segments are detected via timeouts or duplicate ACKs and are retransmitted. Duplicate segments (based on sequence numbers) are detected and discarded.

Retransmission timeout (RTO) is estimated from measured round trip time (RTT).

$$
\begin{align*}
\text{SRTT} &= (1-\alpha)\cdot\text{SRTT} + \alpha \cdot \text{RTT}_\text{sample} \\
\text{RTO} &= \beta \cdot \text{SRTT}
\end{align*}
$$

Here SRTT stands for smoothed round trip time. Typically $\alpha = 0.125$ and $\beta > 1$. This is the classic estimator; RFC 6298 replaces the fixed factor $\beta$ with a variance term, $\text{RTO} = \text{SRTT} + 4 \cdot \text{RTTVAR}$.
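The update rule can be sketched as a small estimator class that applies the SRTT and RTO equations literally. Seeding SRTT with the first sample and using $\beta = 2.0$ are assumptions chosen for illustration; the class name `RtoEstimator` is hypothetical.

```python
class RtoEstimator:
    """SRTT = (1 - alpha) * SRTT + alpha * RTT_sample;  RTO = beta * SRTT."""

    def __init__(self, alpha: float = 0.125, beta: float = 2.0):
        self.alpha = alpha   # weight given to the newest RTT sample
        self.beta = beta     # safety factor > 1 (2.0 is an assumed example value)
        self.srtt = None     # no estimate until the first sample arrives

    def on_rtt_sample(self, rtt: float) -> float:
        """Fold a new RTT measurement (seconds) into SRTT and return the new RTO."""
        if self.srtt is None:
            self.srtt = rtt  # assumption: seed SRTT with the first sample
        else:
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        return self.rto()

    def rto(self) -> float:
        if self.srtt is None:
            raise ValueError("no RTT sample yet")
        return self.beta * self.srtt
```

With samples of 100 ms and then 200 ms, SRTT moves only from 0.100 s to about 0.1125 s and the RTO to about 0.225 s: the small $\alpha$ keeps one slow sample from swinging the timeout.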

Flow Control

Receiver advertises available buffer space in the Window field. Sender transmits at most that many unacknowledged bytes.

$\text{Sender window} \leq \min(\text{rwnd}, \text{cwnd})$

Here:

  • rwnd = receiver window. Used for flow control.
  • cwnd = congestion window. Used for congestion control.

If the advertised window is 0, the sender stops sending data and periodically probes the receiver with 1-byte segments until the window reopens.
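A small sketch of how a sender might combine the two windows, assuming all quantities are in bytes and that bytes already sent but unacknowledged count against the window. Both function names are hypothetical helpers, not a real stack's API.

```python
def usable_window(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still put on the wire right now."""
    window = min(rwnd, cwnd)                  # flow control vs. congestion control limit
    return max(0, window - bytes_in_flight)   # unacknowledged bytes count against it


def sender_action(rwnd: int, cwnd: int, bytes_in_flight: int) -> str:
    """Hypothetical decision helper: 'probe', 'send', or 'wait'."""
    if rwnd == 0:
        return "probe"   # zero window: periodic 1-byte probes until it reopens
    if usable_window(rwnd, cwnd, bytes_in_flight) > 0:
        return "send"
    return "wait"        # window full: wait for ACKs to free space
```

For instance, with rwnd = 8000, cwnd = 4000 and 1000 bytes in flight, congestion control is the binding limit and 3000 more bytes may be sent.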

Congestion Control

TCP infers congestion from packet loss.

Four algorithms operate together:

Slow Start

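cwnd begins at 1 MSS and grows by 1 MSS for each ACK received, so it doubles every round trip. On timeout, cwnd is reset to 1 MSS. On its own this is inefficient: every timeout restarts growth from 1 MSS, keeping throughput low.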

Congestion Avoidance

Below ssthresh (the slow-start threshold, set to an initial value), cwnd grows as in slow start, doubling every round trip. Once cwnd reaches ssthresh, it grows linearly, by about 1 MSS per round trip. On timeout, ssthresh is set to half of min(cwnd, rwnd) and cwnd drops back to 1 MSS.
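The interplay of slow start and congestion avoidance can be sketched with a simplified per-round-trip model. Assumptions: cwnd and ssthresh are measured in MSS, slow-start growth is capped at ssthresh, and timeouts are injected at chosen rounds. The function `cwnd_trace` and its parameters are hypothetical.

```python
def cwnd_trace(rounds: int, ssthresh: float, rwnd: float, timeout_at=()) -> list:
    """cwnd in MSS at the start of each round trip (simplified per-RTT model)."""
    cwnd = 1.0                                   # start at 1 MSS
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in timeout_at:
            ssthresh = min(cwnd, rwnd) / 2       # halve the threshold
            cwnd = 1.0                           # and restart from 1 MSS
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)       # slow start: doubles each RTT (capped at ssthresh)
        else:
            cwnd += 1                            # congestion avoidance: +1 MSS per RTT
    return trace
```

With `cwnd_trace(10, ssthresh=8, rwnd=64, timeout_at={6})`, cwnd doubles through 1, 2, 4, 8, grows linearly to 9, 10, 11, then the timeout in round 6 halves ssthresh to 5.5 and growth restarts from 1 MSS.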

Fast Retransmit

Three duplicate ACKs trigger immediate retransmission of the missing segment without waiting for the retransmission timeout.

Fast Recovery

After fast retransmit, cwnd stays near ssthresh instead of resetting to 1 MSS.

Steps:

  1. Set $\text{ssthresh} = \frac{1}{2} \times \min(\text{cwnd}, \text{rwnd})$.
  2. Set $\text{cwnd} = \text{ssthresh} + 3 \times \text{MSS}$.
    Each of the 3 duplicate ACKs means a segment has left the network and is buffered at the receiver, so those segments no longer occupy pipe space.
  3. For each additional duplicate ACK, increment cwnd by MSS.
    Each duplicate ACK confirms another segment left the network.
  4. On the first new ACK, set $\text{cwnd} = \text{ssthresh}$.
    The temporary inflation is removed. Congestion avoidance resumes.

Duplicate ACKs indicate a single lost segment, not network collapse. Resetting cwnd to 1 MSS would be an overreaction.
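The four fast recovery steps can be sketched as a small state holder, in the style of TCP Reno. Assumptions: sizes are in bytes, the actual retransmission is signalled by a return value rather than performed, and cwnd growth outside recovery is omitted. The class `RenoSender` is hypothetical.

```python
class RenoSender:
    """Tracks cwnd/ssthresh (bytes) through fast retransmit and fast recovery."""

    def __init__(self, mss: int, cwnd: int, ssthresh: int, rwnd: int):
        self.mss, self.cwnd, self.ssthresh, self.rwnd = mss, cwnd, ssthresh, rwnd
        self.dup_acks = 0
        self.in_recovery = False

    def on_dup_ack(self) -> bool:
        """Returns True when the missing segment should be retransmitted now."""
        self.dup_acks += 1
        if self.in_recovery:
            self.cwnd += self.mss                             # step 3: another segment left
            return False
        if self.dup_acks == 3:
            self.ssthresh = min(self.cwnd, self.rwnd) // 2    # step 1
            self.cwnd = self.ssthresh + 3 * self.mss          # step 2
            self.in_recovery = True
            return True                                       # fast retransmit
        return False

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.in_recovery:
            self.cwnd = self.ssthresh                         # step 4: deflate the window
            self.in_recovery = False
        # Normal slow start / congestion avoidance growth omitted in this sketch.
```

Starting from cwnd = 16000 and MSS = 1000, the third duplicate ACK sets ssthresh to 8000 and cwnd to 11000, a fourth inflates cwnd to 12000, and the first new ACK deflates it back to 8000.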

Problems

Small Packet Problem

Sending many small segments wastes bandwidth. Header-to-data ratio is high when payload is small.

Two causes:

  • Sender-side
    Application writes data in small chunks. Each write triggers a separate segment.
  • Receiver-side
    Application reads buffer in small increments. Advertised window stays small. Sender fills it with tiny segments. Known as Silly Window Syndrome.

Silly Window Syndrome

A self-reinforcing cycle of small window advertisements and small segment transmissions. Receiver advertises a few bytes of free space. Sender fills it immediately with a tiny segment. Receiver delivers those bytes to the application. A few more bytes free up. Cycle repeats. Each segment carries at least 40 bytes of headers (20-byte TCP header plus 20-byte IPv4 header) for a handful of bytes of payload.

Either side can initiate the cycle:

  • Receiver-driven
    Application reads buffer in small increments. Window never grows large.
  • Sender-driven
    Application produces data in small writes. Each write is sent immediately.

Solutions:

  • Nagle’s Algorithm
    Addresses the sender-driven cause. New data is buffered while unacknowledged data exists. The sender flushes when an ACK arrives or the buffer holds one full MSS.
  • Clark’s Solution
    Addresses the receiver-driven cause. The receiver advertises a zero window until it can offer space for one MSS or half the total receive buffer, whichever is smaller.

Both solutions must be deployed together. If only one side is fixed, the other can still sustain the cycle.
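Both fixes can be sketched in a few lines. Assumptions: the sender ignores window limits, ACKs report how many bytes they cover, and the receiver's rule is applied to its current free space. `NagleSender` and `clark_window` are hypothetical names.

```python
class NagleSender:
    """Sender-side buffering following Nagle's algorithm (window limits ignored)."""

    def __init__(self, mss: int):
        self.mss = mss
        self.buffer = b""      # data written by the application but not yet sent
        self.unacked = 0       # bytes sent but not yet acknowledged

    def write(self, data: bytes) -> list[bytes]:
        """Application write; returns the segments sent as a result."""
        self.buffer += data
        return self._flush()

    def on_ack(self, acked_bytes: int) -> list[bytes]:
        self.unacked -= acked_bytes
        return self._flush()

    def _flush(self) -> list[bytes]:
        sent = []
        while len(self.buffer) >= self.mss:          # full-size segments may always go
            segment, self.buffer = self.buffer[:self.mss], self.buffer[self.mss:]
            sent.append(segment)
            self.unacked += len(segment)
        if self.buffer and self.unacked == 0:        # small segment only if nothing is outstanding
            sent.append(self.buffer)
            self.unacked += len(self.buffer)
            self.buffer = b""
        return sent


def clark_window(free_space: int, mss: int, buffer_size: int) -> int:
    """Receiver side: advertise 0 until free space reaches min(MSS, half the buffer)."""
    threshold = min(mss, buffer_size // 2)
    return free_space if free_space >= threshold else 0
```

With an MSS of 4 bytes, writing "a", "b", "c" one at a time sends "a" immediately, holds "b" and "c" while "a" is unacknowledged, then sends "bc" as a single segment once the ACK arrives.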

Connection Establishment

Connection establishment is done as a 3-way handshake.

  • Initial SYN request
    Host A sends a SYN segment to host B carrying a randomly chosen initial sequence number ($\text{seq} = x$).
  • SYN-ACK response
    Host B replies with its own initial sequence number ($\text{seq} = y$) and acknowledges A’s SYN ($\text{ack} = x + 1$). The SYN consumes one sequence number, hence $x + 1$.
  • ACK response
    Host A acknowledges B’s SYN ($\text{ack} = y + 1$).

If an acknowledgment number does not match the expected value, the receiving host rejects the segment, typically answering with RST.
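The sequence and acknowledgment arithmetic of the handshake can be sketched as follows. Assumptions: Python's `secrets` module stands in for a real stack's ISN generator, and segments are plain dictionaries; the function names are hypothetical. Sequence numbers wrap modulo $2^{32}$ because the field is 32 bits wide.

```python
import secrets

SEQ_MOD = 2**32  # sequence numbers are 32-bit and wrap around


def three_way_handshake() -> list[dict]:
    """Seq/ack values of the three handshake segments between hosts A and B."""
    x = secrets.randbelow(SEQ_MOD)   # A's initial sequence number (illustrative RNG)
    y = secrets.randbelow(SEQ_MOD)   # B's initial sequence number
    return [
        {"from": "A", "flags": {"SYN"}, "seq": x},
        {"from": "B", "flags": {"SYN", "ACK"}, "seq": y, "ack": (x + 1) % SEQ_MOD},
        # The SYN consumed one sequence number, so A's next seq is x + 1.
        {"from": "A", "flags": {"ACK"}, "seq": (x + 1) % SEQ_MOD, "ack": (y + 1) % SEQ_MOD},
    ]


def synack_acceptable(syn: dict, synack: dict) -> bool:
    """A accepts the SYN-ACK only if it acknowledges exactly A's ISN + 1."""
    return ("SYN" in synack["flags"] and "ACK" in synack["flags"]
            and synack["ack"] == (syn["seq"] + 1) % SEQ_MOD)
```

A SYN-ACK that acknowledges anything other than $x + 1$, for example a stale segment from an earlier connection, fails the check.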

Connection Termination

After data transfer is complete, the connection is released gracefully. Each direction is closed independently, which takes a 4-way handshake.

  • Initial FIN request
    Host A sends FIN segment to host B.
  • ACK response
    Host B acknowledges A’s FIN. A will not send any more data, but B may continue sending.
  • FIN from other end
    Once host B finishes sending its data, it sends a FIN to host A. If B has no data left to send when A’s FIN arrives, it can combine the ACK and FIN into a single segment.
  • ACK from A
    Host A acknowledges B’s FIN and enters the TIME_WAIT state for $2 \times \text{MSL}$ (Maximum Segment Lifetime). This safety buffer lets delayed segments from the connection expire and lets A retransmit the final ACK if it is lost. After that time has passed, the connection is closed.
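The numbers exchanged during teardown can be sketched as below, assuming B has no data left, so its ACK and FIN carry no payload. The state names are the standard TCP state names; `four_way_close` is hypothetical. The key detail is that each FIN, like a SYN, consumes one sequence number. A also passes through FIN_WAIT_2 when B's ACK arrives, which this list of sent segments does not show.

```python
SEQ_MOD = 2**32  # sequence numbers are 32-bit and wrap around


def four_way_close(a_seq: int, b_seq: int) -> list[dict]:
    """Graceful close started by A; a_seq/b_seq are each side's next sequence number.

    Assumes B has no data left, so its ACK and FIN carry no payload.
    'state' is the sender's TCP state right after sending that segment.
    """
    a_fin_acked = (a_seq + 1) % SEQ_MOD   # A's FIN occupies one sequence number
    b_fin_acked = (b_seq + 1) % SEQ_MOD   # so does B's
    return [
        {"from": "A", "flags": {"FIN", "ACK"}, "seq": a_seq, "ack": b_seq, "state": "FIN_WAIT_1"},
        {"from": "B", "flags": {"ACK"}, "seq": b_seq, "ack": a_fin_acked, "state": "CLOSE_WAIT"},
        {"from": "B", "flags": {"FIN", "ACK"}, "seq": b_seq, "ack": a_fin_acked, "state": "LAST_ACK"},
        # A waits 2 * MSL in TIME_WAIT before the connection is fully closed.
        {"from": "A", "flags": {"ACK"}, "seq": a_fin_acked, "ack": b_fin_acked, "state": "TIME_WAIT"},
    ]
```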

Limitations

Latency

Setting up a connection costs a full round trip before any data flows, and retransmissions combined with in-order delivery add further delay whenever packets are lost. This makes TCP a poor fit for VoIP, gaming and live streaming, which typically use UDP or QUIC instead.

Head-Of-Line Blocking

Also known as HOL blocking. Occurs because of the in-order delivery guarantee. If a segment is lost in transit, TCP requires the receiver to wait for that missing segment to be retransmitted before passing any subsequent data up to the application, even if those later segments have already arrived and are sitting in the buffer. The receiver knows something is missing because sequence numbers have a gap, so it holds everything back until the hole is filled.

This is especially costly when multiple independent data streams are multiplexed over one TCP connection, as in HTTP/2: a loss affecting one stream stalls all of them.
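A minimal sketch of the receiver behaviour behind HOL blocking, assuming segments never overlap and ignoring sequence number wraparound. The class `InOrderReceiver` is hypothetical.

```python
class InOrderReceiver:
    """Buffers out-of-order segments; releases bytes only when the gap is filled."""

    def __init__(self, initial_seq: int = 0):
        self.next_seq = initial_seq                # next byte the application expects
        self.out_of_order: dict[int, bytes] = {}   # seq -> data waiting behind a gap

    def receive(self, seq: int, data: bytes) -> bytes:
        """Accept a segment; return whatever can now be delivered in order."""
        if seq >= self.next_seq:                   # older data is a duplicate: discard
            self.out_of_order[seq] = data
        delivered = b""
        while self.next_seq in self.out_of_order:
            chunk = self.out_of_order.pop(self.next_seq)
            delivered += chunk
            self.next_seq += len(chunk)
        return delivered
```

If bytes 5-9 ("world") arrive before bytes 0-4 ("hello"), `receive(5, b"world")` delivers nothing even though the data is in the buffer; only `receive(0, b"hello")` releases both at once.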

Applications

TCP suits applications where reliability matters more than latency; it is inappropriate where the reverse holds.

HTTP

Both HTTP/1.1 and HTTP/2 work on top of TCP, because reliable page delivery is required.

SMTP, IMAP

Email protocols are built on top of TCP, for reliability.

FTP

File transfer cannot tolerate data losses.

SSH

Used to create a secure shell to a remote server. Byte-stream with ordering guarantees is required.

Router Queue Management

FIFO

Routers queue packets in FIFO order and drop at the tail when the queue is full. TCP senders detect congestion only after a timeout or duplicate ACKs. Congestion is addressed after damage occurs, not before.

Random Early Detection

RED drops packets before the queue fills, signalling congestion early. Average queue length is compared against two thresholds: $\text{MIN}$ and $\text{MAX}$.

Zones:

  • Below $\text{MIN}$
    Queue length is low. All packets accepted.
  • Between $\text{MIN}$ and $\text{MAX}$
    Drop probability increases linearly as queue length rises toward $\text{MAX}$. Drops are random across flows. Flows sending more packets are proportionally more likely to be dropped.
  • Above $\text{MAX}$
    Every arriving packet is dropped.

Average queue length is used instead of instantaneous. Instantaneous values fluctuate due to bursty traffic and would trigger drops during transient bursts that don’t represent sustained congestion.
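A simplified sketch of the RED drop decision. Assumptions: the average is an exponentially weighted moving average of the instantaneous queue length, the probability rises linearly from 0 at $\text{MIN}$ to a maximum `max_p` just below $\text{MAX}$, and the default values of `max_p` and `weight` are illustrative. Full RED also spaces drops out using a count since the last drop, which is omitted here. `RedQueue` is a hypothetical name.

```python
import random


class RedQueue:
    """Simplified RED drop decision; queue lengths are in packets."""

    def __init__(self, min_th, max_th, max_p=0.1, weight=0.002, rng=None):
        self.min_th, self.max_th = min_th, max_th
        self.max_p = max_p     # drop probability reached just below MAX (assumed value)
        self.weight = weight   # EWMA weight for the average queue length (assumed value)
        self.avg = 0.0
        self.rng = rng if rng is not None else random.Random()

    def drop_probability(self) -> float:
        if self.avg < self.min_th:
            return 0.0                     # below MIN: accept everything
        if self.avg >= self.max_th:
            return 1.0                     # above MAX: drop everything
        # between MIN and MAX: rises linearly from 0 to max_p
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def on_arrival(self, queue_len: int) -> bool:
        """Update the average with the instantaneous length; return True to drop."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.rng.random() < self.drop_probability()
```

A small `weight` is what makes the average lag behind short bursts, so a transient spike in the instantaneous queue barely moves the drop probability.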

Explicit Congestion Notification

Both FIFO and RED signal congestion through packet loss. The packet is destroyed to send the signal.

ECN allows a router to set a bit in the IP header when the queue is building, without dropping the packet. The receiver forwards the ECN signal to the sender via an ACK. The sender reduces cwnd as it would for a loss.

Marking happens in the forwarding path at the router level. The receiver must be ECN-aware to forward the signal back.
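A sketch of the router-side choice between marking and dropping, assuming the RFC 3168 codepoints in the two low-order bits of the IPv4 TOS (or IPv6 Traffic Class) byte: 00 Not-ECT, 01 ECT(1), 10 ECT(0), 11 CE. The `congested` input stands in for whatever queue policy, such as RED, decides to signal; `router_handle` is hypothetical. On the TCP side, the receiver reflects a CE mark with the ECE flag and the sender answers with CWR.

```python
# ECN codepoints: the two low-order bits of the IPv4 TOS / IPv6 Traffic Class byte.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def router_handle(tos: int, congested: bool) -> tuple[int, bool]:
    """Return (new_tos, dropped) for one packet."""
    if not congested:
        return tos, False           # forward unchanged
    if (tos & 0b11) == NOT_ECT:
        return tos, True            # endpoints not ECN-capable: fall back to dropping
    return tos | CE, False          # mark Congestion Experienced instead of dropping
```

The upper six bits (the DSCP field) pass through untouched; only the ECN bits change.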

Wireless TCP

TCP’s congestion control assumes packet loss means congestion. On wireless networks, loss occurs due to signal interference and fading, not queue overflow. TCP cannot distinguish the two causes.

On timeout over a wireless link, TCP invokes slow start and reduces cwnd. The loss was not caused by congestion. The throughput reduction is unnecessary.

Immediate retransmission does not solve it. On a lossy wireless segment, retransmissions add more traffic. In a heterogeneous network, those retransmissions traverse the wired portion and can cause real congestion there.

Problem in Heterogeneous Networks

When a path spans wired and wireless segments, retransmissions triggered by wireless loss re-enter the wired network. Load is added to wired routers already handling original traffic. Real congestion can develop in the wired portion.

Indirect TCP

The connection splits at the base station into two separate TCP connections: one wired TCP between sender and base station, one wireless-optimised TCP between base station and mobile host.

Drawback: violates end-to-end semantics. An ACK reaching the sender only confirms delivery to the base station, not the destination.

Snooping Agent

A snooping agent at the base station intercepts traffic transparently. The end-to-end TCP connection between sender and mobile host is preserved.

The agent:

  • Caches segments destined for the mobile host for local retransmission on wireless loss.
  • Retransmits locally when an ACK from the mobile host is missing.
  • Removes duplicate ACKs to prevent the sender from misinterpreting wireless loss as congestion.
  • Issues selective repeat requests for segments lost on the wireless link.
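A simplified sketch of the first three agent duties: caching, local retransmission on a duplicate ACK, and suppressing that duplicate so it never reaches the sender. Assumptions: ACKs are cumulative byte numbers, each cached segment is keyed by its starting sequence number, sequence wraparound is ignored, and timer-driven retransmission and selective repeat requests are left out. `SnoopAgent` is a hypothetical name.

```python
class SnoopAgent:
    """Simplified snoop logic at the base station (sequence wraparound ignored)."""

    def __init__(self):
        self.cache: dict[int, bytes] = {}  # seq -> data sent toward the mobile, not yet ACKed
        self.last_ack = None               # most recent cumulative ACK from the mobile
        self.resent_for = None             # ACK value already handled by a local resend
        self.local_retransmits = []        # segments resent over the wireless link

    def from_sender(self, seq: int, data: bytes) -> None:
        self.cache[seq] = data             # cache a copy, then forward to the mobile host

    def from_mobile(self, ack: int) -> bool:
        """Process an ACK from the mobile host; return True to forward it to the sender."""
        if ack == self.last_ack and ack in self.cache:
            # Duplicate ACK for a segment we hold: treat it as a wireless loss.
            if self.resent_for != ack:
                self.local_retransmits.append(self.cache[ack])
                self.resent_for = ack
            return False                   # hide it so the sender does not shrink cwnd
        self.last_ack = ack
        for seq in [s for s in self.cache if s < ack]:
            del self.cache[seq]            # cumulatively acknowledged: evict
        return True
```

Only one local retransmission is issued per run of duplicates; further duplicates for the same gap are swallowed until a new ACK arrives.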

Transactional TCP

RPC needs a short request-reply exchange. Neither transport fits well:

  • UDP
    No reliability. Works only if request and reply each fit in one packet and the operation is idempotent.
  • TCP
    3-way handshake and teardown add multiple round trips before any data is exchanged.

Transactional TCP combines connection setup with data transfer. The request is piggybacked onto the SYN. The reply is sent with the SYN-ACK. The connection tears down immediately after. TCP-level reliability without the overhead of a persistent connection.
