Python Network Programming Learning Notes Chapter 3: TCP

December 28, 2019 9-minute read

Chapter 3 introduces the other major transport protocol in modern use, TCP (Transmission Control Protocol).

TCP is so commonly used because it guarantees that the data stream will arrive intact, with no information lost, duplicated, or out of order, unless the connection dies or freezes because of a network problem. UDP was used a lot when the Internet was younger. Today TCP is far more mainstream than UDP because of its ability to handle bursts of data: we don’t need to think carefully about the size and timing of each datagram as we do with UDP.

How TCP works

As we know, UDP applications always have to deal with dropped packets. The code has to worry about whether each datagram arrives and make a plan to recover, typically by retransmitting whatever went missing. In TCP, however, the packets themselves are hidden beneath the protocol. The application simply streams data to the destination, and TCP retransmits any lost information until it finally arrives. Let’s take a look at what allows TCP to provide a reliable connection.

Every TCP packet is given a sequence number so that the system on the receiving end can put the packets back together in the right order and notice which ones are missing and need to be retransmitted.
The sequence number does not count packets (0, 1, 2, 3…) as we might assume. Instead, TCP uses a counter of the bytes transmitted. For example, a 1024-byte packet with a sequence number of 7200 would be followed by a packet with sequence number 8224. The benefit is that a busy network stack doesn’t need to remember how it broke a data stream up into packets. When a retransmission is required, it can break up the stream in some other way, and the receiver can still put the packets back together.
The initial sequence number is chosen randomly in good TCP implementations so that attackers cannot assume every connection starts at byte zero. The unpredictable sequence numbers prevent attackers from using forged packets to interrupt a conversation.
TCP sends whole bursts of packets at a time before expecting a response. The amount of data that a sender is willing to have on the wire at any given moment is called the size of the TCP window.
The TCP implementation on the receiving end can regulate the window size of the transmitting end and thus slow or pause the connection. This is called flow control. It lets a receiver forbid the transmission of additional packets when its input buffer is full and it would have to discard any more data that arrived.
If TCP believes that packets are being dropped, it assumes that the network is becoming congested and reduces how much data it sends each second. This can ruin a connection that was running fine until, say, a router rebooted (think about what happens when we restart the Wi-Fi router because our Internet connection stopped responding). By the time the network comes back up, the two TCP peers may have concluded that it is badly overloaded, and at first they will refuse to send each other data at anything more than a trickle.
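
The byte-counting sequence numbers described above can be illustrated with a small sketch. This is only a toy model, not how a real TCP stack is written: the names next_seq and reassemble are made up for this example, and the 32-bit wrap-around is a detail added here that the notes above do not cover.

```python
SEQ_MODULUS = 2 ** 32  # assumption for this sketch: 32-bit sequence numbers that wrap around


def next_seq(seq, payload):
    """Return the sequence number of the segment that follows this one."""
    return (seq + len(payload)) % SEQ_MODULUS


def reassemble(segments, initial_seq):
    """Rebuild the byte stream from (seq, payload) pairs that may arrive out of order.

    Stops at the first gap, which is where a receiver would have to wait
    for a retransmission. Returns the bytes so far and the next expected
    sequence number.
    """
    by_seq = dict(segments)
    stream = b''
    expected = initial_seq
    while expected in by_seq:
        payload = by_seq.pop(expected)
        stream += payload
        expected = next_seq(expected, payload)
    return stream, expected


if __name__ == '__main__':
    # The 1024-byte packet at 7200 is followed by sequence number 8224.
    print(next_seq(7200, b'x' * 1024))
    # Segments arriving out of order are still put back in stream order.
    print(reassemble([(7205, b'world'), (7200, b'hello')], 7200))
```

Because the numbers count bytes rather than packets, the receiver never needs to know how the sender chose to cut the stream into segments.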

When TCP is inappropriate

TCP has nearly become the universal default when two Internet programs need to communicate. For some categories of applications, however, TCP is not optimal.

TCP is unwieldy when a client wants to send a single, small request to a server and then never talk to it again. It takes three packets for two hosts to set up a TCP connection and perhaps another three to shut it down, which makes a simple request expensive. For small requests and short-term relationships, UDP really shines.
TCP is also a poor fit when the application can do something much smarter than simply retransmitting data when a packet has been lost. In an audio chat, for example, it makes no sense to resend the same second of audio over and over until it arrives. Instead, the client fills the missing second with whatever audio it can piece together from the packets that did arrive.

TCP Sockets

TCP uses port numbers to distinguish different applications.

With UDP, it only takes a single socket to speak. A server can open a UDP port and then receive datagrams from thousands of different clients. It is possible to connect() a datagram socket to a particular peer so that the socket will always send() to that peer and only recv() packets sent back from it. But as I mentioned in the last post, this connect() only writes the address into the memory of the local operating system. No actual information crosses the network.

But with a stateful stream protocol like TCP, the connect() call becomes the opening step upon which all further network communication depends. It is the moment when your operating system’s network stack kicks off the handshake protocol that, if successful, makes both ends of the TCP stream ready for use. This also means that a TCP connect() can fail: the remote host might not answer, or it might refuse the connection.
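
The difference between the two connect() calls can be seen in a short sketch. It assumes the loopback interface is available, and the helper names (unused_tcp_port, udp_connect_succeeds, tcp_connect_succeeds) are invented for this example.

```python
import socket


def unused_tcp_port():
    """Ask the OS for a free port, then release it so that nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(('127.0.0.1', 0))
        return probe.getsockname()[1]


def udp_connect_succeeds(port):
    """UDP connect() only records the peer address locally, so it cannot be refused."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect(('127.0.0.1', port))
        return sock.getpeername() == ('127.0.0.1', port)


def tcp_connect_succeeds(port, timeout=2.0):
    """TCP connect() performs the handshake, so it fails if nobody is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(('127.0.0.1', port))
        except OSError:  # covers ConnectionRefusedError and timeouts
            return False
        return True
```

Calling both helpers with the same unused port should show the UDP connect() succeeding while the TCP connect() is refused, because only the TCP call actually talks to the other end.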

The server side does not make a connect() call. Instead, it receives the SYN (Synchronize) packet that the client’s connect() call initiates, and the incoming connection is handed from a passive socket to a new active socket. The standard POSIX (Portable Operating System Interface) interface to TCP actually involves two completely different kinds of sockets:

The passive or listening socket: it maintains the address and port number at which the server is ready to receive connections. No data can ever be received or sent by this kind of socket, and it does not represent any actual network conversation. Instead, it is how the server alerts the operating system to its willingness to receive incoming connections at a given TCP port number in the first place.
The active or connected socket: it is bound to one particular remote conversation partner with a particular IP address and port number. It can be used only for talking back and forth with that one partner, and it can be read from and written to without worrying about how the resulting data will be split up into packets. The stream behaves like a pipe or file.

What makes an active socket unique is a four-part coordinate: (local_ip, local_port, remote_ip, remote_port).
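
A minimal sketch of the two kinds of sockets, assuming a loopback connection inside a single process. The function name describe_connection and the dictionary keys are hypothetical, chosen only for this illustration.

```python
import socket


def describe_connection():
    """Open a loopback connection and return the coordinates of each socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))      # port 0: let the OS pick a free port
    listener.listen(1)                   # this is now a passive socket

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())

    connected, peer = listener.accept()  # accept() returns a new, active socket
    try:
        return {
            'listener': listener.getsockname(),
            # (local_ip, local_port, remote_ip, remote_port) as seen by each end
            'server_side': connected.getsockname() + connected.getpeername(),
            'client_side': client.getsockname() + client.getpeername(),
            'same_object': connected is listener,
        }
    finally:
        for sock in (connected, client, listener):
            sock.close()
```

The server-side active socket shares its local address and port with the listening socket. It is the remote half of the four-part coordinate that tells one conversation apart from another.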

Differences between UDP and TCP

The TCP connect() call is not a purely local operation as it is in UDP. It kicks off the three-way handshake between the client and server machines.
The TCP client is in one way much simpler than the UDP client because it does not need to make any provision for dropped packets. Because of the assurance TCP provides, it can send() data without even stopping to check whether the remote end receives it, and run recv() without having to consider the possibility of retransmitting its request.
TCP considers your outgoing and incoming data to be streams with no beginning or end, and it feels free to split them up into packets however it wants. In UDP, a datagram is atomic: it either arrives or it does not. There is no such thing as a half-sent or half-received UDP datagram, and only fully intact datagrams are ever delivered to a UDP application. TCP, by contrast, may split the data stream into packets of several different sizes during transmission and then gradually reassemble them on the receiving end.
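
The stream behavior can be observed with a short experiment. This is a sketch that assumes a loopback TCP connection, and the helper names tcp_pair and send_three_receive_stream are made up for the example.

```python
import socket


def tcp_pair():
    """Return two connected TCP sockets on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.connect(listener.getsockname())
    b, _ = listener.accept()
    listener.close()
    return a, b


def send_three_receive_stream():
    """Send three separate pieces and return the chunks that recv() hands back."""
    sender, receiver = tcp_pair()
    with sender, receiver:
        for piece in (b'Hi ', b'there, ', b'server'):
            sender.sendall(piece)            # three separate calls
        sender.shutdown(socket.SHUT_WR)      # signal the end of the stream

        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:                    # empty bytes: the peer closed its side
                break
            chunks.append(chunk)
    return chunks
```

The chunks always join up to b'Hi there, server', but how many chunks there are and where they are cut is up to the network stack, not the three sendall() calls. A receiver therefore cannot rely on one recv() matching one send().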

Three situations when a TCP send() call is performed

When a TCP send() is performed, the operating system’s networking stack will face one of three situations.

The data can be immediately accepted by the local system’s networking stack, either because the network card is immediately free to transmit or because the system has room to copy the data to a temporary outgoing buffer. In this case, send() returns immediately, and its return value is the length of your data string because the whole string is being transmitted.
The network card is busy, the outgoing data buffer for this socket is full, and the system cannot allocate any more space. The default behavior of send() is then simply to block, pausing the program until the data can be accepted for transmission.
The outgoing buffers are almost full but not quite, so part of the data that you are trying to send can be queued immediately but the rest will have to wait. In this case, send() completes immediately and returns the number of bytes accepted from the beginning of your data string, leaving the rest of the data unprocessed.

Because of the third situation, where data ends up being sent piece by piece, a bare send() has to be called in a loop. The Python standard library provides sendall() to do this for us. There is no equivalent wrapper for recv(), no recvall(), so we have to call recv() in a loop ourselves and collect the pieces until the whole message has arrived.
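
The loop that sendall() saves us from writing might look roughly like this. It is a sketch of the idea rather than the actual implementation of sendall(), and the name send_all_manually is hypothetical.

```python
import socket


def send_all_manually(sock, data):
    """Call send() until every byte has been accepted; return the number of calls made."""
    view = memoryview(data)   # lets us slice the remaining bytes without copying
    sent = 0
    calls = 0
    while sent < len(data):
        # send() may accept only part of what we offer (the third situation),
        # so we advance by its return value and offer the rest again.
        sent += sock.send(view[sent:])
        calls += 1
    return calls
```

For a short message on an idle connection the loop will normally run once. It only goes around again when the outgoing buffer could not take everything in one call.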

Address Already in Use

When we shut down a server whose listening socket has accepted a connection from a client, the listening socket shuts down immediately, but the connected TCP socket may linger. If we then restart the server without the SO_REUSEADDR option, we will see an “Address already in use” error.

Why is the connected TCP socket not shut down immediately? The reason is that after the network stack sends the last packet shutting the socket down, it has no way to be sure that the packet was received. If it happens to be dropped, the remote end will wonder what is taking the last packet so long and retransmit its FIN packet (a packet used to terminate a connection) in the hope of receiving an answer. The operating system therefore keeps the closed socket around for a while so that it can answer such a retransmission. To be able to restart the server during that wait, we should set the SO_REUSEADDR socket option.
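
As a small sketch, the option can be wrapped in a helper that builds the listening socket. The name make_listener and its reuse parameter are made up for this example.

```python
import socket


def make_listener(interface, port, reuse=True):
    """Create a listening TCP socket, optionally with SO_REUSEADDR enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse:
        # Set the option before bind(), since bind() is the call that would
        # otherwise fail with "Address already in use".
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((interface, port))
    sock.listen(1)
    return sock
```

The option has to be set on the new socket before bind() is called, because bind() is where the conflict with the lingering connection is detected.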

A Code Example

import argparse
import socket


def recvall(sock, length):
    data = b''
    while len(data) < length:
        more = sock.recv(length - len(data))
        if not more:
            raise EOFError(
                'was expecting %d bytes but only received %d bytes before the socket closed' %(length, len(data))
            )
        data += more
    return data


def server(interface, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((interface, port))
    sock.listen(1)
    print('Listening at', sock.getsockname())
    while True:
        sc, sockname = sock.accept()
        print('We have accepted a connection from', sockname)
        print("Socket name:", sc.getsockname())
        print("Socket peer:", sc.getpeername())
        message = recvall(sc, 16)
        print("Incoming sixteen-octet message:", repr(message))
        sc.sendall(b'Farewell, client')
        sc.close()
        print("Reply sent, socket closed")


def client(host, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))
    print("Client has been assigned socket name", sock.getsockname())
    sock.sendall(b'Hi there, server')
    reply = recvall(sock, 16)
    print("The server said", repr(reply))
    sock.close()


if __name__ == '__main__':
    choices = {'client': client, 'server': server}
    parser = argparse.ArgumentParser(description='Send and receive over TCP')
    parser.add_argument('role', choices=choices, help='which role to play')
    parser.add_argument('host', help='interface the server listens at; host the client sends to')
    parser.add_argument('-p', metavar='PORT', type=int, default=1060, help="TCP port (default 1060)")
    args = parser.parse_args()
    function = choices[args.role]
    function(args.host, args.p)

This is a simple example of a TCP client and server. We can run python examples/tcp_example.py server 0.0.0.0 to start a server that listens on all available IP addresses. Once the server is running, we can run python examples/tcp_example.py client 127.0.0.1 to watch the client chat with the server through the localhost IP address. The interaction between the server and the client is shown below.

The server:

tcp_server

The client:

udp_client

Summary

The TCP stream socket does whatever is necessary to support the transmission and reception of streams of data over the network between two sockets, including retransmitting lost packets, reordering the ones that arrive out of sequence, and splitting large data streams into optimally sized packets for your network.