Once upon a time I was given the task of improving the TPS (transactions per second) of a UDP server. The server was a very simple Java program built on a datagram socket: it received a UDP message and performed some replacement in it. Each message was small (around 1024 bytes).
But huge packet loss was being reported on the client side.
The analysis for finding the root cause went as follows:
- Being a UDP server, a minor percentage of packet loss is acceptable, and if the network is heavily loaded it is unavoidable. But in this case around 80%-90% of packets were being dropped. The sysadmin also checked for network congestion and it was within normal limits. So something else was definitely going wrong
- The server machine configuration was also quite good (64 GB RAM, a 4-core CPU at 2.x GHz, and a 10 Gbps network card), so the hardware did not seem to be the issue
- Another area of suspicion was threading. The program was single-threaded: it read a message and processed it. The first thing that comes to every programmer's mind is to try multi-threading, and I went down that path too, looking at how to make the code multi-threaded
- But when I looked at the code, I realized it was just doing a simple find-and-replace after reading the message, which should not take more than a few processor instructions. Around 80% packet loss due to processing time was very unlikely
- In fact, making this code multi-threaded might have increased the overall time instead of decreasing it, due to the synchronization, thread switching and other overhead that multi-threading introduces
- At this point I was sure the packet loss was not caused by network congestion, hardware configuration, or long message processing time. So what was going wrong? Why the packet loss?
- So I started looking at all the steps involved in UDP packet processing. There are mainly 2 steps
- Packets are first received by the OS and buffered
- The program reads packets from this queue; in this case the Java program's datagram socket reads from the OS packet buffer
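Seen from the Java side, the second step is just a blocking receive loop. A minimal self-contained sketch (the class name and the find/replace target are illustrative, not the original code; it sends one packet to itself over loopback to exercise the receive path):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpSketch {
    public static void main(String[] args) throws Exception {
        // Step 1 happens in the kernel: received packets wait in the
        // socket's OS receive buffer until the program asks for them.
        DatagramSocket server = new DatagramSocket(0); // ephemeral port

        // Send one packet to ourselves so the example is self-contained.
        try (DatagramSocket client = new DatagramSocket()) {
            byte[] out = "hello foo".getBytes(StandardCharsets.UTF_8);
            client.send(new DatagramPacket(out, out.length,
                    InetAddress.getLoopbackAddress(), server.getLocalPort()));
        }

        // Step 2: the program drains the kernel queue one datagram at a time.
        byte[] buf = new byte[1024]; // messages were ~1024 bytes
        DatagramPacket in = new DatagramPacket(buf, buf.length);
        server.receive(in); // blocks until a packet is available
        String msg = new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);

        // The server's actual work: a simple find/replace on the payload.
        System.out.println(msg.replace("foo", "bar"));
        server.close();
    }
}
```

Note that while the program is inside the processing code, new packets pile up in the kernel buffer of step 1 — which is exactly where the investigation heads next.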
- Since the program was a very simple one, I started exploring the first step, trying to find an issue there
- I found that on Linux systems, received UDP and TCP packets are queued
- There are separate buffers for reading packets and writing packets
- Each buffer has some default memory allocated, but we can configure that memory as per our requirements
- Now this could be the problem: if not enough memory is allocated to the buffer, packet loss can occur. Consider the following scenarios
- Packets are arriving faster than the program can process them. In that case packets are buffered for a while, and once the buffer fills up, packets start to drop
- Packets are arriving in bursts so large that, even though processing can keep up with the average rate, there is not enough memory to store a burst. I will elaborate this point with an example
- Suppose packets arrive at a rate of 100 packets/second
- The program is able to process 100 packets/second
- But the OS has allocated only enough memory to the read buffer to store 10 packets
- Now say in the 1st second 10 packets arrive; the program processes them and everything works fine
- In the 2nd second 100 packets arrive simultaneously, but the OS buffer can hold only 10 packets at a time, so it stores 10 and drops the remaining 90. The program processes those 10 packets
- So even though the program is capable of processing 100 packets, packet loss occurs because too little memory was allocated to the OS read buffer. This situation is out of scope for any program and needs to be handled at the OS level
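The arithmetic of that example can be checked with a tiny simulation (the numbers are the ones from the example above; the drop rule mirrors a full kernel buffer discarding whatever does not fit):

```java
public class BufferDropDemo {
    // Packets that survive when `arriving` packets hit a queue that can
    // hold only `capacity` packets at once: the rest are dropped.
    static int accepted(int arriving, int capacity) {
        return Math.min(arriving, capacity);
    }

    public static void main(String[] args) {
        int capacity = 10; // OS read buffer holds only 10 packets

        // Second 1: only 10 packets arrive; nothing is lost.
        int s1 = accepted(10, capacity);  // 10 accepted
        // Second 2: 100 packets arrive in one burst; the buffer overflows.
        int s2 = accepted(100, capacity); // 10 accepted
        int dropped = 100 - s2;           // 90 dropped

        System.out.println("second 1 accepted: " + s1);
        System.out.println("second 2 accepted: " + s2 + ", dropped: " + dropped);
        // Even though the program could process 100 packets/second, 90% of
        // the burst is lost before the program ever sees it.
    }
}
```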
- Linux has configuration parameters for fine-tuning the UDP and TCP read/write buffers. The following parameters were changed:
- net.core.rmem_default : The default size (in bytes) of the socket receive buffer
- net.core.wmem_default : The default size (in bytes) of the socket send buffer
- net.core.rmem_max : The maximum receive socket buffer size in bytes
- net.core.wmem_max : The maximum send socket buffer size in bytes
- net.core.netdev_max_backlog : The maximum number of packets queued on the input side when the interface receives packets faster than the kernel can process them
Following are the commands to change these parameters on RHEL:
- sysctl -w net.core.rmem_default=73400320
- sysctl -w net.core.wmem_default=73400320
- sysctl -w net.core.rmem_max=73400320
- sysctl -w net.core.wmem_max=73400320
- sysctl -w net.core.netdev_max_backlog=3000
The above values should also be specified in the /etc/sysctl.conf file so that they persist across machine restarts.
Follow the steps below to add these values to the /etc/sysctl.conf file:
- Open the /etc/sysctl.conf file
- vi /etc/sysctl.conf
- Now add the following properties to the file if they are not already present; if a property is already there, change its value
- net.core.rmem_default = 73400320
- net.core.wmem_default = 73400320
- net.core.rmem_max = 73400320
- net.core.wmem_max = 73400320
- net.core.netdev_max_backlog = 3000
- Save the /etc/sysctl.conf file
- Reload the settings without a reboot by running sysctl -p
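One related knob sits on the application side: the kernel silently caps each socket's receive buffer at net.core.rmem_max, and Java's default request may be well below the new limit, so the server should also ask for a bigger per-socket buffer explicitly. A sketch (the 8 MB figure is an assumption for illustration, not a value from the original program):

```java
import java.net.DatagramSocket;

public class RcvbufCheck {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(0)) {
            // Ask the OS for a larger per-socket receive buffer. The kernel
            // clamps this request to net.core.rmem_max, so raising rmem_max
            // first (as above) is what makes the request effective.
            socket.setReceiveBufferSize(8 * 1024 * 1024); // hypothetical 8 MB

            // Read back what the OS actually granted; if this is much smaller
            // than requested, rmem_max is still the limiting factor.
            int granted = socket.getReceiveBufferSize();
            System.out.println("granted receive buffer: " + granted + " bytes");
        }
    }
}
```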
In my case I configured a 70 MB buffer and a max_backlog of 3000, which turned out to be sufficient; depending on traffic requirements, these might need to be adjusted.
Before these changes the UDP server's TPS was around 1000; after them, TPS went up to 9000, with headroom to spare if traffic increases.