Chapter 2 introduces one of the modern protocols, UDP (User Datagram Protocol).
In the last chapter, we learned that the unit of transmission between services or machines is the packet. This chapter explains how packets are combined to form the conversations that happen between a web browser and a server, or between an email client and an ISP’s mail server.
Background
As we know, the IP protocol is responsible only for attempting to deliver each packet to the correct machine. Two additional features are needed on top of IP to maintain separate conversations between applications.
- Packets need to be labeled so that, for example, web packets can be distinguished from email packets
- Damaged or lost packets should be retransmitted (consider how a video is loaded on a video website)
UDP provides only the first feature, labeling. It adds port numbers so that packets destined for different services on a single machine can be properly demultiplexed. Network programs that use UDP still have to cope with packet loss, duplication, and reordering on their own.
Port Numbers
At the IP network layer, all that is visible are packets winging their way toward a particular host. The network stacks of the two communicating machines, however, see the conversation more specifically as taking place between an IP address and port number pair on each machine: Source(IP: port number) ==> Destination(IP: port number).
By convention, a DNS server listens on UDP port 53. For example, suppose there is a DNS server with IP address 192.168.1.9, and a client machine with IP address 192.168.1.30 wants to issue a query to it. The client crafts a request in memory and then asks its operating system to send that block of data as a UDP packet. Since there needs to be some way to identify the client when the reply returns, and since the client has not explicitly requested a port number, the client’s operating system assigns it a random one, say port 44137. The packet is therefore sent as Source(192.168.1.30:44137) ==> Destination(192.168.1.9:53). Once it has formulated a response, the DNS server asks its own operating system to send a UDP packet with the two addresses flipped: Source(192.168.1.9:53) ==> Destination(192.168.1.30:44137).
There are three approaches for a client program to learn the port number to which it should connect.
- Convention, such as port 53 for DNS
- Automatic configuration
- Manual configuration
Port numbers are also divided into three ranges.
- Well-known ports (0-1023) are reserved for the most important and widely used services. On Unix-like operating systems, ordinary user programs cannot listen on these ports
- Registered ports (1024-49151), such as 5432 for PostgreSQL
- The remaining port numbers (49152-65535) are free for any use
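The three ranges above can be expressed as a small lookup. This is only an illustrative sketch; the function name port_range and the labels it returns are made up for this post and are not part of any library.

```python
def port_range(port):
    """Return the name of the range that a port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError('port must be between 0 and 65535')
    if port <= 1023:
        return 'well-known'
    if port <= 49151:
        return 'registered'
    return 'free for any use'


# The DNS port, the PostgreSQL port, the ephemeral port from the DNS
# example above, and a port from the last range.
for port in (53, 5432, 44137, 60000):
    print(port, port_range(port))
```

Note that 44137, the port the operating system picked in the DNS example, lands in the registered range. Operating systems do not necessarily limit the random ports they hand out to the last range.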
Promiscuous Clients and Unwelcome Replies
It is dangerous not to check whether the source address of a datagram is really the server’s. A client that accepts replies from anyone gives attackers an opening during the request-and-reply exchange: an attacker can forge a response by jumping in and sending you a datagram before the server has had a chance to send its own reply.
Well-designed encryption is the best way to convince your code that it is talking to the right server. Short of that, there are two quick checks.
- Design or use protocols that include a unique identifier or request ID in each request and echo that ID in the reply
- Check the source address of each reply packet, or use connect() to forbid other addresses from sending you packets
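The address check in the second item could look roughly like the sketch below. The helper name recv_from_expected is hypothetical, and the sketch assumes an IPv4 socket, where the address returned by recvfrom() is a plain (host, port) tuple.

```python
import socket


def recv_from_expected(sock, expected_address, max_bytes=65535):
    """Return the first datagram whose source equals expected_address.

    Datagrams from any other address are discarded. If the socket has a
    timeout set and nothing suitable arrives in time, socket.timeout
    propagates to the caller.
    """
    while True:
        data, address = sock.recvfrom(max_bytes)
        if address == expected_address:
            return data
        # Unexpected sender: ignore the datagram and keep waiting.
```

A client would call sock.sendto(request, server_address) and then recv_from_expected(sock, server_address). This only guards against stray or careless senders. An attacker who can forge the source address still passes the check.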
Unreliability, Backoff and Timeouts
It is normal for packets to be dropped in transit under UDP. This unreliability means the client has to perform its request inside a loop. It either has to be prepared to wait forever for a reply, or else be somewhat arbitrary in deciding when to send the request again. It is normally hard for the client to distinguish among the following events:
- The reply is taking a long time to come back, but it will arrive soon
- The reply will never arrive because it, or the request, was lost
- The service is down, and it is not replying to anyone
A data engineer sees a similar scenario in daily work, for example when pulling data from an API. When a request fails, we need a backoff strategy, and the most typical one in industry is exponential backoff. For example, we wait 1s after the first request. If there is no reply within 1s, we send the request again and wait 2s. Under this strategy, the waiting time keeps doubling with each retry. I recommend using a backoff decorator in Python when making requests in code. After waiting long enough, we can assume the program is stuck, terminate it, and return an error message. The threshold we set for this, the longest we are willing to wait once the backoff strategy is in place, is called the timeout.
The UDP case is essentially the same as the data request case. It also needs backoff and a timeout to handle its unreliability.
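Here is a minimal sketch of exponential backoff with a give-up threshold for a UDP client, written with the standard library only rather than a backoff decorator. The function name request_with_backoff and the default delays are assumptions chosen for illustration, and the socket is assumed to have already been pointed at the server with connect().

```python
import socket


def request_with_backoff(sock, data, initial_delay=0.1, max_delay=2.0,
                         max_bytes=65535):
    """Send data and wait for a reply, doubling the wait after each timeout.

    Raises RuntimeError once the next wait would exceed max_delay.
    """
    delay = initial_delay
    while True:
        sock.send(data)              # (re)send the request
        sock.settimeout(delay)       # wait at most `delay` seconds for a reply
        try:
            return sock.recv(max_bytes)
        except socket.timeout:
            delay *= 2               # back off: wait twice as long next time
            if delay > max_delay:
                raise RuntimeError('no reply received; giving up')
```

With the defaults, the client waits 0.1s, 0.2s, 0.4s, 0.8s, and 1.6s before giving up. Because the request may be sent several times, the server may also answer several times, so a real protocol would need a way to recognize duplicate replies.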
Connecting UDP Sockets
Let’s introduce some of the socket calls used with UDP.
- bind(): A server uses it to claim the address that it wants to use. A client is bound implicitly: when it first tries to use a socket, the operating system assigns it a random ephemeral port number
- sendto(): Specifies the destination directly on each call. It is usually combined with recvfrom() to receive the replies, and you must carefully check each return address against the list of servers to which you have made requests
- connect(): After calling connect(), communication is done with send() and recv(). The operating system filters out unwanted packets for you
More details about the connect() call:
- connect() does not send any information across the network. It simply writes the address into the operating system’s memory for use by later send() and recv() calls
- connect() and its filtering of unwanted packets is not a form of security. An attacker can forge packets that carry the server’s return address and so pass the address filter
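The connect() behavior described above can be tried out with a short sketch. The helper name connected_client and its timeout parameter are made up for this example.

```python
import socket


def connected_client(server_address, timeout=1.0):
    """Return a UDP socket that only talks to server_address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Records the peer address in the operating system; nothing is sent yet.
    sock.connect(server_address)
    sock.settimeout(timeout)
    return sock
```

After this, sock.send(data) needs no destination argument, sock.getpeername() returns the server address, and sock.recv() only returns datagrams whose source is that address. Datagrams from other senders are dropped by the operating system before the program sees them.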
Request ID
We can make the exchange more robust, and somewhat harder to spoof, by adding a request ID to each request.
Two advantages of a request ID:
- It protects the client from being confused by duplicate answers to requests that were repeated several times
- It provides a deterrent against spoofing, as long as the IDs are chosen in a way that makes them hard for attackers to guess
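One possible way to add a request ID is to put a few random bytes in front of every request and have the server echo them back. The sketch below assumes an 8-byte ID prefix; the layout and the function names are invented for illustration and do not come from any existing protocol.

```python
import secrets

ID_BYTES = 8  # assumed length of the request ID prefix


def make_request(payload):
    """Client side: return (request_id, datagram) with a random ID prepended."""
    request_id = secrets.token_bytes(ID_BYTES)
    return request_id, request_id + payload


def make_reply(request_datagram, payload):
    """Server side: echo the ID of the request in front of the reply payload."""
    return request_datagram[:ID_BYTES] + payload


def match_reply(request_id, reply_datagram):
    """Client side: return the reply payload, or None if the ID does not match."""
    if reply_datagram[:ID_BYTES] != request_id:
        return None
    return reply_datagram[ID_BYTES:]
```

The client remembers the ID of the request it is waiting on and discards any reply for which match_reply() returns None, which covers both late duplicates of earlier requests and forged replies that failed to guess the ID.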
UDP Fragmentation
A UDP datagram can be up to 64kB in size, whereas an Ethernet or wireless card can normally only handle packets of around 1500 bytes. A larger UDP datagram therefore has to be split into several smaller IP packets so that it can traverse the network. This also means that larger datagrams are more likely to be lost: if any one of the pieces fails to reach the destination, the whole datagram can never be reassembled and delivered by the listening operating system. Since UDP does not retransmit lost packets, fragmentation reduces its efficiency.
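A rough calculation shows how quickly the odds worsen as a datagram grows. The sketch below assumes IPv4 with a 20-byte header and no options, an 8-byte UDP header, a 1500-byte MTU, and fragments that are lost independently of each other. The function names are made up for this example.

```python
import math

IP_HEADER = 20   # bytes, assuming IPv4 without options
UDP_HEADER = 8   # bytes
MTU = 1500       # bytes, the typical Ethernet figure


def fragments_needed(payload_size, mtu=MTU):
    """Number of IP packets needed to carry one UDP datagram."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((UDP_HEADER + payload_size) / per_fragment)


def delivery_probability(payload_size, fragment_loss_rate, mtu=MTU):
    """Chance that every fragment arrives, so the datagram can be reassembled."""
    return (1 - fragment_loss_rate) ** fragments_needed(payload_size, mtu)
```

Under these assumptions a payload of up to 1472 bytes fits into a single packet, while the largest possible payload of 65507 bytes needs 45 fragments. With a 1% loss rate per fragment, that large datagram would arrive intact only about 64% of the time.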
Socket Options
- SO_BROADCAST: Allows broadcast UDP packets to be sent and received
- SO_DONTROUTE: Only be willing to send packets that are addressed to hosts on subnets to which this computer is connected directly
- SO_TYPE: When passed to getsockopt(), this tells you whether a socket is of type SOCK_DGRAM and can be used for UDP, or of type SOCK_STREAM and instead supports the semantics of TCP
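As a small illustration of the last option, SO_TYPE can be read back with getsockopt() at the SOL_SOCKET level. The helper name socket_kind and the strings it returns are made up for this sketch.

```python
import socket


def socket_kind(sock):
    """Describe a socket by asking the operating system for its SO_TYPE."""
    kind = sock.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
    if kind == socket.SOCK_DGRAM:
        return 'datagram socket (usable for UDP)'
    if kind == socket.SOCK_STREAM:
        return 'stream socket (TCP semantics)'
    return 'some other socket type'
```

This is mostly useful when a socket is handed to your code by another program or library and you need to find out what kind it is.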
Broadcast
UDP supports broadcast: instead of sending a datagram to one specific host, you can address it to an entire subnet to which your machine is attached and have the physical network card broadcast the datagram, so that all attached hosts see it without its having to be copied separately to each one of them.
However, broadcast is an old-school technique. A more sophisticated technique called multicast has been developed that lets modern operating systems take better advantage of the intelligence built into many networks and network interface devices. But if you need an easy way to keep something like gaming clients or automated scoreboards up to date on the local LAN (local area network), UDP broadcast is an easy solution.
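A sender for this kind of LAN announcement might be sketched as follows. The function names are hypothetical, and whether a broadcast actually reaches other hosts depends on the local network setup, so treat this as a starting point rather than a tested recipe.

```python
import socket


def make_broadcast_socket():
    """Create a UDP socket that is allowed to send broadcast datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Without SO_BROADCAST the operating system refuses broadcast sends.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def announce(sock, message, port):
    """Send message to every host on the local network listening on port."""
    # '<broadcast>' is the name Python's socket module accepts for the
    # IPv4 broadcast address.
    sock.sendto(message, ('<broadcast>', port))
```

A scoreboard program could call announce(sock, b'score update', 1060) whenever the score changes, and every client on the LAN that has bound a UDP socket to port 1060 on all interfaces would be able to receive it. The port number here is just the default used elsewhere in this post.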
When to Use UDP
UDP is efficient for sending small messages, where the client sends only one message at a time and then waits for a response. If messages come in bursts, TCP is a better choice.
There are a few good reasons to use UDP:
- If you are implementing a protocol that already exists and it uses UDP
- If you are designing a time-critical media stream whose redundancy allows for occasional packet loss, and you never want this second’s data getting hung up waiting for old data from several seconds ago that has not yet been delivered (as happens with TCP)
- If unreliable LAN subnet multicast is a great pattern for your application, since UDP supports it perfectly
A Code Example
import argparse, socket
from datetime import datetime

MAX_BYTES = 65535

def server(port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('127.0.0.1', port))
    print('Listening at {}'.format(sock.getsockname()))
    while True:
        data, address = sock.recvfrom(MAX_BYTES)
        text = data.decode('ascii')
        print('The client at {} says {!r}'.format(address, text))
        text = 'Your data was {} bytes long'.format(len(data))
        data = text.encode('ascii')
        sock.sendto(data, address)

def client(port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    text = 'The time is {}'.format(datetime.now())
    data = text.encode('ascii')
    sock.sendto(data, ('127.0.0.1', port))
    print('The OS assigned me the address {}'.format(sock.getsockname()))
    data, address = sock.recvfrom(MAX_BYTES)
    text = data.decode('ascii')
    print('The server {} replied {!r}'.format(address, text))

if __name__ == '__main__':
    choices = {'client': client, 'server': server}
    parser = argparse.ArgumentParser(description='Send and receive UDP locally')
    parser.add_argument('role', choices=choices, help='which role to play')
    parser.add_argument('-p', metavar='PORT', type=int, default=1060,
                        help='UDP port (default 1060)')
    args = parser.parse_args()
    function = choices[args.role]
    function(args.p)
This is an example of UDP communication on the local machine. The call sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) shows that we are using UDP, since the socket type is SOCK_DGRAM.
When we run python filename.py server, we spawn a server listening for requests at localhost (127.0.0.1) on port 1060.
With the server up, we can run python filename.py client. This command sends a message containing the current date and time to the server. The server-side output will look like the following image.
The client-side output will look like the following image.
We can see that the client’s port is different on every run, since we don’t bind a port number on the client side. The operating system assigns a random port the first time the client sends on the socket.
We can extend the program to accept a remote IP address so that it works across the network, provided we have separate machines or a cloud service to test with.
Summary
We learned about UDP in this chapter. It is efficient for sending small messages, but larger datagrams are more easily lost in transit. In the next post of this series, we will introduce TCP, which resolves the dropped-packet issue and is good at handling bursts of data.