Python Network Programming Learning Notes Chapter 2: UDP

December 20, 2019 9-minute read

Chapter 2 introduces UDP (User Datagram Protocol), one of the core protocols of the Internet.

In the last chapter, we learned that the unit of transmission between services or machines is the packet. This chapter explains how packets are combined to form the conversations that happen between a web browser and a server, or between an email client and an ISP’s mail server.

Background

As we know, the IP protocol is responsible only for attempting to deliver each packet to the correct machine. Two additional features are usually needed on top of IP so that separate applications can maintain their own conversations:

Packets need to be labeled so that web packets can be distinguished from email packets (for example, when a browser and an email client are running on the same machine at the same time)
Damaged or lost packets should be retransmitted (consider how a video loads on a video website)

UDP solves only the first problem (labeling). It provides port numbers so that packets destined for different services on a single machine can be properly demultiplexed. However, network programs using UDP still have to deal with packet loss, duplication, and ordering on their own.
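To see port-based demultiplexing in action, here is a minimal sketch that runs entirely on the loopback interface. The function name demultiplex_demo is hypothetical, and the ports the OS picks (port 0 in bind()) stand in for the fixed, well-known ports that real services would use.

```python
import socket


def demultiplex_demo():
    """Send one datagram to each of two services on the same host and
    return what each service received."""
    # Two "services" on one machine. Port 0 asks the OS for any free port;
    # a real service would bind a fixed, agreed-upon port instead.
    service_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    service_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    service_a.bind(('127.0.0.1', 0))
    service_b.bind(('127.0.0.1', 0))
    service_a.settimeout(2)
    service_b.settimeout(2)

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Same destination IP address, different destination ports: the OS
    # uses the port number to hand each datagram to the right socket.
    client.sendto(b'for service A', service_a.getsockname())
    client.sendto(b'for service B', service_b.getsockname())

    received = {
        'A': service_a.recvfrom(1024)[0],
        'B': service_b.recvfrom(1024)[0],
    }
    for sock in (service_a, service_b, client):
        sock.close()
    return received


if __name__ == '__main__':
    print(demultiplex_demo())
```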

Port Numbers

At the IP network layer, all that is visible are packets winging their way toward a particular host. But the network stacks of the two communicating machines see the conversation much more specifically, as being between an IP address and port number pair on each machine: Source (IP:port) ==> Destination (IP:port).

For example, a DNS server listens on UDP port 53. Suppose there is a DNS server with IP address 192.168.1.9, and a client machine with IP address 192.168.1.30 wants to issue a query to it. The client crafts a request in memory and then asks its operating system to send that block of data as a UDP packet. Since there needs to be some way to identify the client when the reply returns, and since the client has not explicitly requested a port number, the operating system assigns it a random one, say port 44137. The packet is therefore sent as Source (192.168.1.30:44137) ==> Destination (192.168.1.9:53). Once it has formulated a response, the DNS server asks its own operating system to send a UDP packet back with the addresses flipped: Source (192.168.1.9:53) ==> Destination (192.168.1.30:44137).
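The same exchange can be reproduced on one machine with a short sketch, assuming both ends run on 127.0.0.1 and an OS-chosen server port stands in for port 53. The helper name implicit_bind_demo is hypothetical; it records the client's port before and after its first send, and the addresses each side sees.

```python
import socket


def implicit_bind_demo():
    """Walk through one request/reply exchange on the loopback interface."""
    # Stand-in for the DNS server: bound to a fixed address and port.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(('127.0.0.1', 0))
    server.settimeout(2)

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2)
    port_before = client.getsockname()[1]   # 0: no port assigned yet

    # The first sendto() makes the OS pick an ephemeral source port.
    client.sendto(b'query', server.getsockname())
    port_after = client.getsockname()[1]

    # The server sees (client IP, ephemeral port) as the source...
    data, client_address = server.recvfrom(1024)
    # ...and replies by flipping source and destination.
    server.sendto(b'answer', client_address)
    reply, server_address = client.recvfrom(1024)

    result = {
        'port_before': port_before,
        'port_after': port_after,
        'client_address_seen_by_server': client_address,
        'server_address_seen_by_client': server_address,
        'reply': reply,
    }
    server.close()
    client.close()
    return result


if __name__ == '__main__':
    print(implicit_bind_demo())
```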

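There are three ways for a client program to learn the port number to which it should connect:

Convention: the service listens on a well-known port, such as port 53 for DNS
Automatic configuration: the addresses of critical services such as DNS are learned when the computer first joins a network, for example through DHCP
Manual configuration: a user or administrator enters the address and port by hand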

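Port numbers also fall into three ranges:

Well-known ports (0-1023): reserved for the most important and widely used services, such as DNS (53) and HTTP (80). On many Unix-like operating systems, ordinary user programs cannot listen on these ports without administrator privileges
Registered ports (1024-49151): not usually treated specially by the operating system, but registered for specific services, such as 5432 for PostgreSQL
The remaining port numbers (49152-65535): free for any use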
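The three ranges amount to two comparisons. Here is a small sketch of that classification; port_range is a hypothetical helper, and the label 'dynamic/private' follows the name IANA uses for the top range.

```python
def port_range(port: int) -> str:
    """Classify a port number into one of the three ranges above."""
    if not 0 <= port <= 65535:
        raise ValueError('{} is not a valid port number'.format(port))
    if port <= 1023:
        return 'well-known'
    if port <= 49151:
        return 'registered'
    return 'dynamic/private'


if __name__ == '__main__':
    for p in (53, 5432, 50000):
        print(p, port_range(p))
```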

Promiscuous Clients and Unwelcome Replies

It is dangerous for a client not to check whether an incoming datagram actually comes from the server it sent its request to. A client that accepts any datagram arriving at its port (a promiscuous client) gives attackers an opening during the request-reply exchange: an attacker can forge a response from the server by jumping in and sending a datagram to your port before the server has had a chance to send its own reply.

A well-designed protocol would use encryption to make sure the code is talking to the right server. Short of that, there are two quick checks:

Design or use protocols that include a unique identifier, or request ID, in each request and repeat that ID in the reply
Check the address of the reply packet, or use connect() to forbid other addresses from sending you packets
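The second check can be sketched as a receive loop that keeps reading until a datagram arrives from the expected address. The helpers receive_from and demo are hypothetical. Note that the "intruder" in this demo simply sends from its own address; a real attacker who forges the server's source address would get past this check, which is why the request ID is the stronger defense.

```python
import socket


def receive_from(sock, expected_address, bufsize=65535):
    """Return the first datagram that comes from expected_address,
    silently dropping datagrams from any other sender."""
    while True:
        data, address = sock.recvfrom(bufsize)
        # A careful client ignores (or logs) replies it did not ask for.
        if address == expected_address:
            return data


def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(('127.0.0.1', 0))
    server.settimeout(2)
    intruder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2)

    client.sendto(b'request', server.getsockname())
    _, client_address = server.recvfrom(1024)
    # The intruder races the server with an unsolicited "reply"...
    intruder.sendto(b'forged', client_address)
    server.sendto(b'genuine', client_address)
    # ...but the client only accepts data from the server's address.
    reply = receive_from(client, server.getsockname())
    for sock in (server, intruder, client):
        sock.close()
    return reply


if __name__ == '__main__':
    print(demo())
```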

Unreliability, Backoff and Timeouts

Under UDP, it’s normal for packets to be dropped in transit. This unreliability means the client has to perform its request inside a loop. It either has to be prepared to wait forever for a reply, or else be somewhat arbitrary in deciding when it has waited long enough and should send the request again. This is hard because the client usually cannot distinguish between the following events:

The reply is taking a long time to come back, but it will arrive soon
The reply will never arrive because it, or the request, was lost
The service is down, and it is not replying to anyone

We can imagine a similar scenario in a data engineer’s daily work, such as scraping data from a data API. When a request fails, we need a backoff strategy, and the most typical one in industry is exponential backoff. For example, after sending the first request we wait 1 s for a reply. If no reply arrives within 1 s, we send the request again and wait 2 s, then 4 s, and so on: the waiting time grows exponentially with each retry. In Python, I recommend wrapping such requests in a backoff decorator.

After waiting long enough, we may conclude that something is wrong, stop retrying, and return an error message. This threshold, called the timeout, is the longest we are willing to wait once backoff is applied. The UDP case is essentially the same as the data API case: it also needs backoff and a timeout to cope with its unreliability.
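Applied to UDP, the idea can be sketched as a retry loop around sendto() and recvfrom() that doubles the socket timeout after each silent attempt. This is a minimal sketch, not the book's code: request_with_backoff is a hypothetical name, the 0.1 s starting delay is an arbitrary choice, and max_delay plays the role of the give-up threshold described above.

```python
import socket


def request_with_backoff(sock, data, address, initial_delay=0.1, max_delay=2.0):
    """Send data to address and return the first reply, resending with
    exponentially growing waits. Raise RuntimeError once the next wait
    would exceed max_delay (our overall give-up threshold)."""
    delay = initial_delay
    while True:
        sock.sendto(data, address)
        sock.settimeout(delay)          # how long to wait for this attempt
        try:
            reply, _source = sock.recvfrom(65535)
        except socket.timeout:
            delay *= 2                  # exponential backoff: 0.1, 0.2, 0.4, ...
            if delay > max_delay:
                raise RuntimeError(
                    'no reply from {}; the server may be down'.format(address))
        else:
            # For brevity this sketch does not check _source; a careful
            # client would, as discussed under Promiscuous Clients.
            return reply
```

Because every resend is identical, a slow server may answer more than once; that duplicate-reply problem is what request IDs address.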

Connecting UDP Sockets

Let’s introduce the main socket calls used with UDP:

bind(): A server calls bind() to grab the address it wants to use. A client usually skips it and relies on implicit binding instead: when the client first uses its socket, the operating system assigns it a random ephemeral port number
sendto(): Specifies the destination explicitly with each datagram. It is usually combined with recvfrom() to receive the replies, carefully checking each return address against the list of servers to which you have made requests
connect(): After calling connect(), communication is done with send() and recv(), and the operating system filters out packets that do not come from the connected address

More details about the connect() call:

connect() does not send any information across the network. It simply writes the address into the operating system’s memory for use when send() and recv() are called later
Using connect() to filter unwanted packets is not a form of security. An attacker can forge packets with the server’s return address and pass the address filter
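Here is a minimal sketch of a connected UDP client on the loopback interface, assuming the OS discards datagrams from any address other than the connected one (the standard behavior on Linux and other Unix-like systems). connected_client_demo is a hypothetical name; the stranger sends from its own address, so this only shows filtering, not protection against a forged source address.

```python
import socket


def connected_client_demo():
    """Show the OS filtering datagrams for a connected UDP socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(('127.0.0.1', 0))
    server.settimeout(2)
    intruder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2)
    # connect() puts nothing on the wire: it records the server's address
    # in the OS (and gives the socket a local port if it has none yet).
    client.connect(server.getsockname())
    client.send(b'request')             # no address needed after connect()

    _, client_address = server.recvfrom(1024)
    # A datagram from any other address is discarded by the OS...
    intruder.sendto(b'from a stranger', client_address)
    server.sendto(b'genuine', client_address)
    # ...so recv() only ever returns data sent by the connected address.
    reply = client.recv(1024)

    for sock in (server, intruder, client):
        sock.close()
    return reply


if __name__ == '__main__':
    print(connected_client_demo())
```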

Request ID

We can make the request-reply exchange more robust by including a request ID in each request.

A request ID has two advantages:

It protects the client from being confused by duplicate answers to requests that were repeated several times
It provides a deterrent against spoofing, as long as the IDs are hard for an attacker to guess (for example, because they are chosen at random)
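A request ID can be as simple as a few random bytes prefixed to each datagram and echoed by the server. The wire format below (8 random bytes, then the payload) and the helpers make_request, make_reply, and match_reply are hypothetical, chosen only for illustration; real protocols such as DNS define their own ID fields.

```python
import secrets

ID_LENGTH = 8  # hypothetical wire format: 8 random ID bytes, then the payload


def make_request(payload):
    """Return (request_id, datagram) with a fresh random ID prefixed."""
    request_id = secrets.token_bytes(ID_LENGTH)
    return request_id, request_id + payload


def make_reply(request_datagram, answer):
    """Server side: copy the request's ID in front of the answer."""
    return request_datagram[:ID_LENGTH] + answer


def match_reply(request_id, reply_datagram):
    """Client side: return the answer if the ID matches, else None."""
    if reply_datagram[:ID_LENGTH] != request_id:
        return None
    return reply_datagram[ID_LENGTH:]
```

If the client uses a fresh ID for each retransmission and stops accepting an ID once it has its answer, a late duplicate reply fails match_reply and is dropped. Because the IDs come from the secrets module, an attacker who cannot see the traffic has only about a 1 in 2**64 chance of guessing one.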

UDP Fragmentation

A UDP datagram can be up to about 64 kB (65,535 bytes including headers), whereas Ethernet and wireless networks typically carry packets of only around 1,500 bytes. Therefore a larger UDP datagram has to be split into several smaller IP packets (fragments) so that it can traverse the network. This also makes large datagrams more likely to be dropped: if any one of the fragments fails to reach the destination, the whole datagram can never be reassembled and delivered to the listening operating system. UDP itself never retransmits lost data, so fragmentation reduces both the efficiency and the reliability of UDP.
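The arithmetic behind this can be sketched in a few lines. This is a back-of-the-envelope model, not a description of any particular network: it assumes IPv4 with a 20-byte header and no options, an 8-byte UDP header, a 1,500-byte MTU, and that each fragment is lost independently with the same probability. The function names are hypothetical.

```python
import math

IPV4_HEADER = 20   # bytes, assuming no IP options
UDP_HEADER = 8     # bytes
MTU = 1500         # typical Ethernet MTU (assumption)


def fragment_count(payload_bytes, mtu=MTU):
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # Each fragment carries at most mtu - IPV4_HEADER bytes of IP payload,
    # rounded down to a multiple of 8 as fragment offsets require.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    ip_payload = UDP_HEADER + payload_bytes
    return math.ceil(ip_payload / per_fragment)


def delivery_probability(payload_bytes, fragment_loss, mtu=MTU):
    """Chance the datagram is reassembled: every fragment must arrive."""
    return (1 - fragment_loss) ** fragment_count(payload_bytes, mtu)
```

Under these assumptions, the largest possible IPv4 UDP payload (65,507 bytes) needs 45 fragments, so even a 1% loss rate per fragment means the whole datagram arrives only about 64% of the time (0.99 ** 45), while a 1,472-byte payload still fits in a single packet.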

Socket Options

SO_BROADCAST: Allows broadcast UDP packets to be sent and received
SO_DONTROUTE: Only send packets that are addressed to hosts on subnets to which this computer is directly connected
SO_TYPE: When passed to getsockopt(), returns whether a socket is of type SOCK_DGRAM and can be used for UDP, or of type SOCK_STREAM and instead supports the semantics of TCP
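Reading an option back takes a single getsockopt() call. This minimal sketch uses SO_TYPE to tell a UDP socket from a TCP socket; describe_socket_type is a hypothetical helper name.

```python
import socket


def describe_socket_type(sock):
    """Report whether a socket has datagram (UDP) or stream (TCP) semantics."""
    kind = sock.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
    if kind == socket.SOCK_DGRAM:
        return 'SOCK_DGRAM: usable for UDP'
    if kind == socket.SOCK_STREAM:
        return 'SOCK_STREAM: TCP semantics'
    return 'other socket type {}'.format(kind)


if __name__ == '__main__':
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        print(describe_socket_type(udp))
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp:
        print(describe_socket_type(tcp))
```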

Broadcast

UDP supports broadcast: instead of sending a datagram to one specific host, you can address it to an entire subnet to which your machine is attached and have the physical network card broadcast the datagram, so that all attached hosts see it without it having to be copied separately to each of them.

However, UDP broadcast is an old-school way to reach many hosts at once. A more sophisticated technique called multicast has been developed that lets modern operating systems take better advantage of the intelligence built into many networks and network interface devices. But if you need an easy way to keep something like gaming clients or automated scoreboards up to date on the local LAN (local area network), UDP broadcast is a simple solution.
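A minimal sketch of the two halves of a UDP broadcast, assuming IPv4 and a port agreed on by sender and listeners. make_broadcast_sender, make_broadcast_listener, and announce are hypothetical helper names. The string '<broadcast>' is Python’s special host name for the limited-broadcast address 255.255.255.255; a subnet broadcast address such as 192.168.1.255 (for a 192.168.1.0/24 network) can be used instead.

```python
import socket


def make_broadcast_sender():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Without SO_BROADCAST, sending to a broadcast address is refused
    # by the OS (a PermissionError on Linux).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def make_broadcast_listener(port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Bind to all interfaces ('') so datagrams addressed to the
    # broadcast address reach this socket, not just unicast ones.
    sock.bind(('', port))
    return sock


def announce(sender, message, port, address='<broadcast>'):
    """Send one datagram to every host on the subnet listening on port."""
    sender.sendto(message, (address, port))
```

On each listening machine you would call make_broadcast_listener(1060) and loop over recvfrom(); on the sending machine, announce(make_broadcast_sender(), b'score 3-1', 1060). Whether broadcasts actually reach other hosts depends on the local network and firewall configuration.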

When to Use UDP

UDP is efficient for small messages, when a client sends only one message at a time and then waits for a response. If messages come in bursts, TCP is a better choice.

There are a few good reasons to use UDP:

If you are implementing a protocol that already exists and uses UDP
If you are designing a time-critical media stream whose redundancy allows for occasional packet loss, and you never want this second’s data to get hung up waiting for old data from several seconds ago that has not yet been delivered (as happens with TCP)
If unreliable LAN subnet multicast is a great pattern for your application, since UDP supports it perfectly

A Code Example

import argparse
import socket
from datetime import datetime

# Large enough to hold any UDP payload
MAX_BYTES = 65535


def server(port):
    # AF_INET selects IPv4; SOCK_DGRAM selects UDP
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Bind explicitly so that clients know where to send their requests
    sock.bind(('127.0.0.1', port))
    print('Listening at {}'.format(sock.getsockname()))
    while True:
        # recvfrom() returns the payload and the sender's (IP, port) pair
        data, address = sock.recvfrom(MAX_BYTES)
        text = data.decode('ascii')
        print('The client at {} says {!r}'.format(address, text))
        text = 'Your data was {} bytes long'.format(len(data))
        data = text.encode('ascii')
        # Reply to whichever address the request came from
        sock.sendto(data, address)


def client(port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    text = 'The time is {}'.format(datetime.now())
    data = text.encode('ascii')
    # No bind() here: the OS assigns an ephemeral port on this first sendto()
    sock.sendto(data, ('127.0.0.1', port))
    print('The OS assigned me the address {}'.format(sock.getsockname()))
    # This simple client is promiscuous: it accepts a reply from any address
    data, address = sock.recvfrom(MAX_BYTES)
    text = data.decode('ascii')
    print('The server {} replied {!r}'.format(address, text))


if __name__ == '__main__':
    choices = {'client': client, 'server': server}
    parser = argparse.ArgumentParser(description='Send and receive UDP locally')
    parser.add_argument('role', choices=choices, help='which role to play')
    parser.add_argument('-p', metavar='PORT', type=int, default=1060,
                        help='UDP port (default 1060)')
    args = parser.parse_args()
    function = choices[args.role]
    function(args.p)

This example implements UDP communication on the local machine. The line sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) shows that we are using UDP, because the socket type is SOCK_DGRAM.

When we run python filename.py server, we start a server listening for requests at localhost (127.0.0.1), port 1060. With the server up, we can run python filename.py client in a second terminal. This command sends the server a message containing the current date and time. The server-side output will look like the following image.

udp_server

The client side output will look like the following image.

udp_client

We can see that the client’s port is different every time, because we don’t bind a port number on the client side. The operating system assigns a random ephemeral port when the client first sends a datagram with sendto().

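We could extend the functions to accept a remote IP address instead of the hard-coded 127.0.0.1, so that the program works across a network, if we have separate machines or a cloud server to test with. The server would then also need to bind to an externally reachable interface rather than 127.0.0.1.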

Summary

In this chapter we learned about UDP: it is efficient for sending small messages, but larger datagrams are easily lost in transit. In the next post in this series, we will introduce TCP, which resolves the dropped-packet issue and is good at handling bursts of data.