Transmission Control Protocol. Connection-oriented. Reliable. Unicast only. Originally defined in RFC 793 (now consolidated in RFC 9293). Operates end-to-end; routers do not participate. Provides an ordered byte-stream abstraction over an unreliable IP network.
Segment Format
| Bit offset | Field | Size |
|---|---|---|
| 0 | Source Port | 16 bits |
| 16 | Destination Port | 16 bits |
| 32 | Sequence Number | 32 bits |
| 64 | Acknowledgment Number | 32 bits |
| 96 | Data Offset | 4 bits |
| 100 | Reserved (Res.) | 4 bits |
| 104 | Flags: CWR ECE URG ACK PSH RST SYN FIN | 8 bits (1 bit each) |
| 112 | Window | 16 bits |
| 128 | Checksum | 16 bits |
| 144 | Urgent Pointer | 16 bits |
| 160 | Options | variable (0 to 40 bytes) |
| 160 + options | Data (Payload) | variable |
Minimum header size: 20 bytes (no options). Data Offset field encodes actual header length in 32-bit words.
Source Port
2 bytes. Port number used by the sending application.
Destination Port
2 bytes. Port number used by the receiving application.
Sequence Number
4 bytes. Sequence number of the first data byte in this segment. When SYN is set, it carries the initial sequence number (ISN).
Acknowledgment Number
4 bytes. Next byte expected from the other side. Valid only if ACK flag set.
Window
2 bytes. Number of bytes the sender of this segment is currently willing to receive (free space in its receive buffer). Drives flow control.
Flags
| Flag | Full Name | Set By | Purpose |
|---|---|---|---|
| SYN | Synchronize | Both | Initiates a connection and shares the sender’s ISN. Only set in connection establishment. |
| ACK | Acknowledge | Both | Confirms receipt of data. Once the connection is established, nearly every segment has this set. The acknowledgment number field is only valid when ACK=1. |
| FIN | Finish | Both | Signals the sender has no more data to send. Each side must send its own FIN to fully close the connection (four-way teardown). |
| RST | Reset | Both | Abruptly kills the connection with no teardown. Used when something is wrong — e.g. a segment arrives for a non-existent connection, or an application crashes. |
| PSH | Push | Sender | Tells the receiver to deliver buffered data to the application immediately, rather than waiting to accumulate more. Common in interactive protocols like SSH or Telnet. |
| URG | Urgent | Sender | Marks that some data in the segment is urgent and should be prioritized. The Urgent Pointer field then indicates where the urgent data ends. Rarely used in modern applications. |
Flags can be combined, e.g. SYN+ACK in the second step of the handshake.
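The following minimal Python sketch decodes the fixed 20-byte header and the flag bits with the standard `struct` module, which makes the layout above concrete. The function name `parse_tcp_header` and the returned dict are illustrative choices, not any library's API. Options are left as raw bytes.

```python
import struct

# Bit masks for the 8-bit flags byte (FIN is the least significant bit).
FLAG_BITS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
    "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80,
}

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed TCP header fields (network byte order) and split off options/payload."""
    if len(segment) < 20:
        raise ValueError("segment shorter than the 20-byte minimum header")
    (src_port, dst_port, seq, ack,
     offset_reserved, flag_byte, window,
     checksum, urgent_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_reserved >> 4) * 4  # Data Offset counts 32-bit words
    if not 20 <= header_len <= len(segment):
        raise ValueError("invalid Data Offset")
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": {name for name, bit in FLAG_BITS.items() if flag_byte & bit},
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent_ptr,
        "options": segment[20:header_len],
        "payload": segment[header_len:],
    }

# Usage: a SYN+ACK segment from port 443 with Data Offset = 5 words (no options).
syn_ack = struct.pack("!HHIIBBHHH", 443, 51000, 1000, 2001, 5 << 4, 0x12, 65535, 0, 0)
print(sorted(parse_tcp_header(syn_ack)["flags"]))  # ['ACK', 'SYN']
```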
Checksum
Mandatory. Computed over pseudo-header + header + data. Same pseudo-header structure as UDP.
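The checksum is the 16-bit ones' complement of the ones' complement sum over pseudo-header, header and data. The sketch below covers the IPv4 case only and assumes the caller passes the segment with its Checksum field zeroed. `ones_complement_sum` and `tcp_checksum_ipv4` are hypothetical helper names. IPv6 uses a different pseudo-header layout, which this sketch does not cover.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum (odd-length data padded with one zero byte)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def tcp_checksum_ipv4(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header + TCP header + data.

    Assumes `segment` has its Checksum field (bytes 16-17) set to zero.
    """
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,             # reserved zero byte
        6,             # IP protocol number for TCP (UDP uses 17)
        len(segment),  # TCP length: header + data
    )
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF
```

A receiver can verify a segment by running the same computation with the received checksum left in place. For an intact segment, the result is 0.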
Properties
Connection-oriented
Logical circuit established before data transfer. Identified by a 4-tuple: (source IP, source port, destination IP, destination port). State maintained at both endpoints throughout session.
Reliability
Receiver buffers out-of-order segments and delivers in-order to applications. Lost segments are detected via timeouts or duplicate ACKs and are retransmitted. Duplicate segments (based on sequence numbers) are detected and discarded.
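The following small reassembly buffer sketches the receiver behaviour described above. It is a simplified illustration: `ReassemblyBuffer` is a hypothetical name, sequence-number wraparound is ignored, and the sender's retransmission logic is not modelled.

```python
class ReassemblyBuffer:
    """Receiver-side sketch: hold out-of-order data, deliver in order, drop duplicates."""

    def __init__(self, initial_seq: int):
        self.next_expected = initial_seq  # next byte the application is waiting for
        self.out_of_order = {}            # seq -> data that arrived beyond a gap

    def receive(self, seq: int, data: bytes) -> bytes:
        """Accept one segment; return the bytes that can now go to the application."""
        if seq + len(data) <= self.next_expected:
            return b""                                # entirely old data: duplicate
        if seq > self.next_expected:
            self.out_of_order.setdefault(seq, data)   # gap before it: hold back
            return b""
        # Segment covers next_expected: deliver its new part.
        delivered = bytearray(data[self.next_expected - seq:])
        self.next_expected += len(delivered)
        # Drain buffered segments that have become contiguous.
        while True:
            ready = sorted(s for s in self.out_of_order if s <= self.next_expected)
            if not ready:
                break
            for s in ready:
                chunk = self.out_of_order.pop(s)
                overlap = self.next_expected - s
                if overlap < len(chunk):
                    delivered += chunk[overlap:]
                    self.next_expected += len(chunk) - overlap
        return bytes(delivered)

buf = ReassemblyBuffer(initial_seq=100)
print(buf.receive(100, b"abc"))  # b'abc'
print(buf.receive(106, b"ghi"))  # b'' (bytes 103..105 missing)
print(buf.receive(100, b"abc"))  # b'' (duplicate discarded)
print(buf.receive(103, b"def"))  # b'defghi'
```

The segment `b"ghi"` arrives early but stays in the buffer until the gap is filled. This is the in-order delivery cost discussed under head-of-line blocking.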
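Retransmission timeout (RTO) is estimated from measured round trip time (RTT), using the smoothing rule from RFC 793:
$$\text{SRTT} \leftarrow \alpha \cdot \text{SRTT} + (1 - \alpha) \cdot \text{RTT}$$
$$\text{RTO} = \beta \cdot \text{SRTT}$$
Here SRTT stands for smoothed round trip time. Typically $\alpha$ is between 0.8 and 0.9, and $\beta$ is between 1.3 and 2.0.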
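The smoothing rule can be sketched as a small estimator. In this sketch, `RtoEstimator` is a hypothetical name, the defaults $\alpha = 0.875$ and $\beta = 2.0$ are assumed values taken from the ranges above, and seeding SRTT with the first sample is an assumption. Modern stacks use the RFC 6298 variant, which also tracks RTT variance. This sketch follows only the simpler rule.

```python
class RtoEstimator:
    """RFC 793-style estimator: exponentially smoothed RTT, RTO = beta * SRTT."""

    def __init__(self, alpha: float = 0.875, beta: float = 2.0):
        self.alpha = alpha  # weight kept on the previous SRTT (assumed 0.875)
        self.beta = beta    # safety multiplier (assumed 2.0)
        self.srtt = None    # no estimate until the first sample arrives

    def update(self, rtt_sample: float) -> float:
        """Fold one measured RTT (seconds) into SRTT and return the new RTO."""
        if self.srtt is None:
            self.srtt = rtt_sample  # assumption: first sample seeds SRTT directly
        else:
            self.srtt = self.alpha * self.srtt + (1 - self.alpha) * rtt_sample
        return self.beta * self.srtt

est = RtoEstimator()
for sample in (0.100, 0.200, 0.100):
    print(round(est.update(sample), 4))
```

A single slow sample (0.2 s) moves SRTT only by $(1 - \alpha)$ of the difference. This damping is the purpose of smoothing.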
Flow Control
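Receiver advertises available buffer space in the Window field. The sender keeps its unacknowledged bytes within the smaller of that window and its own congestion window:
$$\text{bytes in flight} \le \min(\text{rwnd},\ \text{cwnd})$$
Here:
- rwnd = receiver window. Used for flow control.
- cwnd = congestion window. Used for congestion control.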
If the advertised window is 0, the sender stops sending data. It then probes periodically with 1-byte segments until the receiver advertises a non-zero window.
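The following sketch shows how a sender could combine the two windows to decide how much more it may send. `last_byte_sent` and `last_byte_acked` are hypothetical bookkeeping names, and the action strings are only labels.

```python
def usable_window(rwnd: int, cwnd: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still put on the wire right now."""
    in_flight = last_byte_sent - last_byte_acked  # sent but not yet acknowledged
    allowed = min(rwnd, cwnd)                     # flow control and congestion control
    return max(0, allowed - in_flight)

def next_action(rwnd: int, cwnd: int, last_byte_sent: int, last_byte_acked: int) -> str:
    if rwnd == 0:
        return "zero-window probe"  # stop sending; probe periodically with 1 byte
    if usable_window(rwnd, cwnd, last_byte_sent, last_byte_acked) > 0:
        return "send"
    return "wait for ACK"

# The receiver could take 8000 bytes, but cwnd = 4000 caps the sender; 2000 are in flight.
print(usable_window(rwnd=8000, cwnd=4000, last_byte_sent=3000, last_byte_acked=1000))  # 2000
```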
Congestion Control
Congestion is inferred from packet loss in TCP.
4 algorithms operate together:
Slow Start
cwnd begins at 1 MSS and grows by 1 MSS for every ACK received, so it doubles every RTT. On timeout, cwnd is reset to 1 MSS. Used alone, this is inefficient: every timeout drops throughput back to the minimum.
Congestion Avoidance
Slow start runs until cwnd reaches ssthresh (the slow-start threshold, initially set to a large value). Above ssthresh, cwnd grows linearly, by about 1 MSS per RTT. On timeout, ssthresh is set to 1/2 of min(cwnd, rwnd), and cwnd drops to 1 MSS so that slow start begins again.
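The two growth phases and the timeout reaction can be sketched with windows counted in bytes. This sketch assumes the following: `CongestionWindow` and `run_rtt` are hypothetical names, one ACK arrives per full-sized segment, and no losses occur within an RTT. The per-ACK increment of $\text{MSS}^2/\text{cwnd}$ in congestion avoidance is the usual way to get roughly 1 MSS per RTT.

```python
class CongestionWindow:
    """Slow start + congestion avoidance as described above; windows in bytes."""

    def __init__(self, mss: int, ssthresh: int):
        self.mss = mss
        self.cwnd = mss           # start at 1 MSS
        self.ssthresh = ssthresh  # initial threshold (assumed value)

    def on_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += self.mss  # slow start: +1 MSS per ACK, doubling each RTT
        else:
            # Congestion avoidance: +MSS*MSS/cwnd per ACK, about +1 MSS per RTT.
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)

    def on_timeout(self, rwnd: int) -> None:
        self.ssthresh = min(self.cwnd, rwnd) // 2  # half of min(cwnd, rwnd)
        self.cwnd = self.mss                       # back to 1 MSS; slow start again

def run_rtt(cc: CongestionWindow) -> int:
    """Assume one ACK per full-sized segment sent during one loss-free RTT."""
    for _ in range(cc.cwnd // cc.mss):
        cc.on_ack()
    return cc.cwnd

cc = CongestionWindow(mss=1000, ssthresh=8000)
print([run_rtt(cc) for _ in range(5)])
```

The printed list doubles (2000, 4000, 8000) and then grows by a little under 1 MSS per RTT once ssthresh is reached.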
Fast Retransmit
3 duplicate ACKs trigger immediate retransmission of the missing segment without waiting for the retransmission timeout.
Fast Recovery
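After fast retransmit, cwnd is cut to the new threshold instead of being reset to 1 MSS.
Steps:
- Set $\text{ssthresh} = \text{cwnd} / 2$.
- Set $\text{cwnd} = \text{ssthresh} + 3 \cdot \text{MSS}$. Each of the 3 duplicate ACKs confirms that a segment has left the network (it is buffered at the receiver), so the window is inflated to let 3 new segments into the pipe.
- For each additional duplicate ACK, increment cwnd by 1 MSS. Each duplicate ACK confirms that another segment has left the network.
- On the first new ACK, set $\text{cwnd} = \text{ssthresh}$. The inflation is removed and congestion avoidance resumes.
Duplicate ACKs show that later segments are still reaching the receiver. This points to a single lost segment, not network collapse. Resetting cwnd to 1 MSS would be an overreaction.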
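Fast retransmit and the fast recovery steps can be sketched together as a Reno-style ACK handler. The sketch makes these assumptions: `RenoSender` is a hypothetical name, windows are in bytes, ssthresh is taken as cwnd/2 as in the steps above, and normal window growth outside recovery is omitted.

```python
class RenoSender:
    """Fast retransmit + fast recovery sketch; windows in bytes."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, mss: int, cwnd: int, ssthresh: int):
        self.mss = mss
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.last_ack = 0
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmitted = []  # ack numbers whose missing segment was resent

    def on_ack(self, ack_no: int) -> None:
        if ack_no < self.last_ack:
            return  # stale ACK: ignore
        if ack_no == self.last_ack:
            self.dup_acks += 1
            if self.in_recovery:
                self.cwnd += self.mss  # another segment left the network
            elif self.dup_acks == self.DUP_ACK_THRESHOLD:
                self.retransmitted.append(ack_no)  # fast retransmit
                self.ssthresh = self.cwnd // 2
                self.cwnd = self.ssthresh + 3 * self.mss
                self.in_recovery = True
            return
        # New data acknowledged.
        self.last_ack = ack_no
        self.dup_acks = 0
        if self.in_recovery:
            self.cwnd = self.ssthresh  # deflate; congestion avoidance resumes
            self.in_recovery = False
        # (normal slow start / congestion avoidance growth omitted)

s = RenoSender(mss=1000, cwnd=10000, ssthresh=64000)
s.on_ack(5000)
for _ in range(3):
    s.on_ack(5000)          # third duplicate triggers fast retransmit
print(s.cwnd, s.ssthresh)   # 8000 5000
```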
Problems
Small Packet Problem
Sending many small segments wastes bandwidth. Header-to-data ratio is high when payload is small.
Two causes:
- Sender-side: Application writes data in small chunks. Each write triggers a separate segment.
- Receiver-side: Application reads the buffer in small increments. The advertised window stays small, and the sender fills it with tiny segments. Known as Silly Window Syndrome.
Silly Window Syndrome
A self-reinforcing cycle of small window advertisements and small segment transmissions. Receiver advertises a few bytes of free space. Sender fills it immediately with a tiny segment. Receiver delivers those bytes to the application. A few more bytes free up. Cycle repeats. Each segment carries at least 40 bytes of TCP and IPv4 headers (20 each, without options) for a handful of bytes of payload.
Either side can initiate the cycle:
- Receiver-driven: Application reads the buffer in small increments. The window never grows large.
- Sender-driven: Application produces data in small writes. Each write is sent immediately.
Solutions:
- Nagle’s Algorithm: Addresses the sender-driven cause. New data is buffered while unacknowledged data exists. The sender flushes when an ACK arrives or the buffer holds one full MSS.
- Clark’s Solution: Addresses the receiver-driven cause. The receiver suppresses window updates until it can offer space for one MSS or half the total receive buffer, whichever is smaller.
Both solutions must be deployed together. If only one side is fixed, the other can still sustain the cycle.
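The two rules reduce to simple predicates. This sketch assumes hypothetical function names. It also models "suppressing window updates" as continuing to advertise a closed (zero) window until enough space has opened, which is one simplified reading of Clark's rule.

```python
def nagle_should_send(buffered: int, unacked: int, mss: int) -> bool:
    """Nagle: may the sender transmit its `buffered` bytes of new data now?"""
    if buffered == 0:
        return False
    if buffered >= mss:
        return True          # a full-sized segment is always worth sending
    return unacked == 0      # small data goes out only when nothing is in flight

def clark_advertised_window(free_space: int, buffer_size: int, mss: int) -> int:
    """Clark: keep the window closed until min(MSS, buffer/2) bytes are free."""
    if free_space >= min(mss, buffer_size // 2):
        return free_space
    return 0

# A 10-byte write while 500 bytes are unacknowledged waits for the next ACK.
print(nagle_should_send(buffered=10, unacked=500, mss=1460))  # False
```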
Connection Establishment
Connection establishment is done as a 3-way handshake.
- Initial SYN request: Host A sends the initial request to host B. It includes a random initial sequence number ($\text{seq} = x$).
- SYN-ACK response: Host B sends back its own sequence number ($\text{seq} = y$) and acknowledges A’s sequence number ($\text{ack} = x + 1$).
- ACK response: Host A acknowledges B’s sequence number ($\text{ack} = y + 1$).
If an acknowledgment number does not match the expected value, the receiving host rejects the connection, typically by replying with RST.
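The following sketch shows the sequence and acknowledgment numbers exchanged when A picks ISN $x$ and B picks ISN $y$. Each SYN consumes one sequence number, so the acknowledgments are $x + 1$ and $y + 1$. The sketch assumes a hypothetical `Segment` record and hypothetical step functions, with no options and no retransmission. Sequence numbers wrap modulo $2^{32}$.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around

@dataclass(frozen=True)
class Segment:
    flags: frozenset
    seq: int
    ack: Optional[int] = None  # meaningful only when ACK is set

def client_syn(isn_a: Optional[int] = None) -> Segment:
    """Step 1: A picks a random ISN x and sends SYN(seq=x)."""
    x = secrets.randbelow(SEQ_MOD) if isn_a is None else isn_a
    return Segment(frozenset({"SYN"}), seq=x)

def server_syn_ack(syn: Segment, isn_b: Optional[int] = None) -> Segment:
    """Step 2: B picks its own ISN y and replies SYN-ACK(seq=y, ack=x+1)."""
    y = secrets.randbelow(SEQ_MOD) if isn_b is None else isn_b
    return Segment(frozenset({"SYN", "ACK"}), seq=y, ack=(syn.seq + 1) % SEQ_MOD)

def client_ack(syn: Segment, syn_ack: Segment) -> Segment:
    """Step 3: A checks that B acknowledged x+1, then sends ACK(seq=x+1, ack=y+1)."""
    if syn_ack.ack != (syn.seq + 1) % SEQ_MOD:
        raise ConnectionError("unacceptable ACK in SYN-ACK; A would reply with RST")
    return Segment(frozenset({"ACK"}), seq=(syn.seq + 1) % SEQ_MOD,
                   ack=(syn_ack.seq + 1) % SEQ_MOD)

def server_accepts(syn_ack: Segment, ack: Segment) -> bool:
    """B completes the handshake only if A acknowledged y+1."""
    return ack.ack == (syn_ack.seq + 1) % SEQ_MOD

syn = client_syn(1000)
sa = server_syn_ack(syn, 5000)
final = client_ack(syn, sa)
print(sa.ack, final.seq, final.ack, server_accepts(sa, final))  # 1001 1001 5001 True
```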
Connection Termination
After data transfer is complete, the connection is released gracefully in a 4-way handshake, with each direction closed independently.
- Initial FIN request: Host A sends a FIN segment to host B.
- ACK response: Host B acknowledges A’s FIN. A will send no more data, but B still can.
- FIN from other end: Once host B finishes sending its data, it sends its own FIN to host A. If B has no more data to send when A’s FIN arrives, it can merge the ACK and its FIN into a single segment.
- ACK from A: Host A acknowledges B’s FIN and enters the TIME_WAIT state for $2 \times \text{MSL}$ (MSL stands for Maximum Segment Lifetime). This safety buffer lets old segments from the connection expire in the network, and lets A resend the final ACK if it is lost. After that time has passed, the connection is closed.
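The side of the closing host (host A) can be sketched as a small state machine using the standard TCP state names. The event labels and `run_active_close` are hypothetical, and simultaneous close (the CLOSING state) is left out. MSL is set to the 2 minutes suggested in RFC 793; real stacks often use a shorter TIME_WAIT.

```python
MSL_SECONDS = 120                    # RFC 793 suggests an MSL of 2 minutes
TIME_WAIT_SECONDS = 2 * MSL_SECONDS

# (current state, event) -> (next state, segment A sends or None)
TRANSITIONS = {
    ("ESTABLISHED", "app close"): ("FIN_WAIT_1", "FIN"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "recv FIN+ACK"): ("TIME_WAIT", "ACK"),  # B merged its ACK and FIN
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "ACK"),
    ("TIME_WAIT", "2*MSL timer expires"): ("CLOSED", None),
}

def run_active_close(events):
    """Replay events for host A; return the final state and the segments A sent."""
    state, sent = "ESTABLISHED", []
    for event in events:
        try:
            state, segment = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not handled in state {state}") from None
        if segment:
            sent.append(segment)
    return state, sent

print(run_active_close(["app close", "recv ACK", "recv FIN", "2*MSL timer expires"]))
# ('CLOSED', ['FIN', 'ACK'])
```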
Limitations
Latency
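Because TCP is connection-oriented, the handshake costs a round trip before any data flows, and a lost segment delays all data behind it until it is retransmitted. Both add latency. Hence TCP is not suitable for VoIP, gaming and live streaming. Those applications use UDP or QUIC (which itself runs over UDP) instead.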
Head-Of-Line Blocking
Also known as HOL blocking. It occurs because of the in-order delivery guarantee. If a segment is lost in transit, TCP requires the receiver to wait for that missing segment to be retransmitted before passing any subsequent data up to the application, even if those later segments have already arrived and are sitting in the buffer. The receiver knows something is missing because sequence numbers have a gap, so it holds everything back until the hole is filled.
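This is an issue especially when multiple independent data streams are multiplexed over one TCP connection. A single lost segment then stalls every stream, not just the one it belongs to.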
Applications
TCP suits applications where reliability matters more than latency. It is inappropriate where latency matters more than reliability.
HTTP
Both HTTP/1.1 and HTTP/2 work on top of TCP, because reliable page delivery is required.
SMTP, IMAP
Email protocols are built on top of TCP, for reliability.
FTP
File transfer cannot tolerate data loss.
SSH
Used to create a secure shell to a remote server. Byte-stream with ordering guarantees is required.
Router Queue Management
FIFO
Routers queue packets in FIFO order and drop at the tail when the queue is full (tail drop). TCP senders detect congestion only after a timeout or duplicate ACKs. Congestion is addressed after damage occurs, not before.
Random Early Detection
RED drops packets before the queue fills, signalling congestion early. Average queue length is compared against two thresholds: $\text{min}_{th}$ and $\text{max}_{th}$.
Zones:
- Below $\text{min}_{th}$: Queue length is low. All packets accepted.
- Between $\text{min}_{th}$ and $\text{max}_{th}$: Drop probability increases linearly as the average queue length rises toward $\text{max}_{th}$. Drops are random across flows. Flows sending more packets are proportionally more likely to be dropped.
- Above $\text{max}_{th}$: Every arriving packet is dropped.
Average queue length is used instead of instantaneous. Instantaneous values fluctuate due to bursty traffic and would trigger drops during transient bursts that don’t represent sustained congestion.
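The RED drop decision can be sketched as follows. The sketch assumes the usual linear ramp $p = \text{max}_p \cdot (\text{avg} - \text{min}_{th}) / (\text{max}_{th} - \text{min}_{th})$ between the thresholds and an exponentially weighted moving average for the queue length. `RedQueue` is a hypothetical name. The defaults for `max_p` and `weight` are illustrative assumptions. The full RED algorithm also adjusts the probability by the count of packets since the last drop, which this sketch omits.

```python
import random

class RedQueue:
    """RED drop decision sketch (no per-drop count adjustment)."""

    def __init__(self, min_th: float, max_th: float, max_p: float = 0.1,
                 weight: float = 0.002, rng: random.Random = None):
        self.min_th = min_th
        self.max_th = max_th
        self.max_p = max_p    # drop probability reached just below max_th (assumed)
        self.weight = weight  # EWMA weight for the average (assumed)
        self.avg = 0.0
        self.rng = rng or random.Random()

    def update_average(self, instantaneous_len: int) -> float:
        # Moving average smooths out short bursts.
        self.avg = (1 - self.weight) * self.avg + self.weight * instantaneous_len
        return self.avg

    def drop_probability(self) -> float:
        if self.avg < self.min_th:
            return 0.0  # accept everything
        if self.avg >= self.max_th:
            return 1.0  # drop everything
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, instantaneous_len: int) -> bool:
        """Called per arriving packet with the current queue length."""
        self.update_average(instantaneous_len)
        return self.rng.random() < self.drop_probability()
```

Because the average moves slowly (small `weight`), a short burst that briefly fills the queue barely raises `avg`, so it triggers few or no early drops.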
Explicit Congestion Notification
Both FIFO and RED signal congestion through packet loss. The packet is destroyed to send the signal.
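ECN allows a router to mark a packet, by setting the ECN field in the IP header to Congestion Experienced, when its queue is building, instead of dropping the packet. The receiver echoes the signal to the sender by setting the ECE flag on its ACKs. The sender reduces cwnd as it would for a loss and sets CWR to confirm the reduction.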
Marking happens in the forwarding path at the router level. The receiver must be ECN-aware to forward the signal back.
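The following sketch shows the three roles in ECN, using the two-bit ECN codepoints carried in the low bits of the IPv4 TOS / IPv6 Traffic Class byte. The function names are hypothetical. The sketch simplifies two details: a real receiver keeps setting ECE until it sees CWR, and a real sender reduces its window at most once per RTT. Neither detail is modelled here.

```python
from typing import Set, Tuple

# Two-bit ECN field values.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

def router_forward(tos: int, congested: bool) -> Tuple[int, bool]:
    """Return (possibly re-marked TOS byte, dropped?) for one packet."""
    if not congested:
        return tos, False
    if (tos & 0b11) == NOT_ECT:
        return tos, True                # endpoints not ECN-capable: drop as before
    return (tos & ~0b11) | CE, False    # mark Congestion Experienced, keep the packet

def receiver_ack_flags(tos_of_data: int) -> Set[str]:
    """Echo a CE mark back to the sender via the ECE flag on the ACK."""
    flags = {"ACK"}
    if (tos_of_data & 0b11) == CE:
        flags.add("ECE")
    return flags

def sender_on_ack(cwnd: int, ack_flags: Set[str], mss: int) -> Tuple[int, bool]:
    """Return (new cwnd, set CWR on the next segment?)."""
    if "ECE" in ack_flags:
        return max(cwnd // 2, mss), True  # react as for a loss, confirm with CWR
    return cwnd, False

marked, dropped = router_forward(ECT_0, congested=True)
print(marked == CE, dropped)                                    # True False
print(sender_on_ack(10000, receiver_ack_flags(marked), 1000))   # (5000, True)
```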
Wireless TCP
TCP’s congestion control assumes packet loss means congestion. On wireless networks, loss occurs due to signal interference and fading, not queue overflow. TCP cannot distinguish the two causes.
On timeout over a wireless link, TCP invokes slow start and reduces cwnd. The loss was not caused by congestion, so the throughput reduction is unnecessary.
Immediate retransmission does not solve it. On a lossy wireless segment, retransmissions add more traffic. In a heterogeneous network, those retransmissions traverse the wired portion and can cause real congestion there.
Problem in Heterogeneous Networks
When a path spans wired and wireless segments, retransmissions triggered by wireless loss re-enter the wired network. Load is added to wired routers already handling original traffic. Real congestion can develop in the wired portion.
Indirect TCP
The connection splits at the base station into two separate TCP connections: one wired TCP between sender and base station, one wireless-optimised TCP between base station and mobile host.
Drawback: violates end-to-end semantics. An ACK reaching the sender only confirms delivery to the base station, not the destination.
Snooping Agent
A snooping agent at the base station intercepts traffic transparently. The end-to-end TCP connection between sender and mobile host is preserved.
The agent:
- Caches segments destined for the mobile host for local retransmission on wireless loss.
- Retransmits locally when an ACK from the mobile host is missing.
- Removes duplicate ACKs to prevent the sender from misinterpreting wireless loss as congestion.
- Issues selective repeat requests for segments lost on the wireless link.
Transactional TCP
RPC needs a short request-reply exchange. Neither transport fits well:
- UDP: No reliability. Works only if request and reply each fit in one packet and the operation is idempotent.
- TCP: The 3-way handshake costs a round trip before any data is exchanged, and the teardown adds more round trips after the reply.
Transactional TCP combines connection setup with data transfer. The request is piggybacked onto the SYN. The reply is sent with the SYN-ACK. The connection tears down immediately after. TCP-level reliability without the overhead of a persistent connection.