Is TCP really that slow?

I’m still working on Vrui’s second-generation collaboration / tele-presence infrastructure (which is coming along nicely, thankyouverymuch), and I also recently started working with another group of researchers who are trying to achieve similar goals, but have some issues with their own home-grown network system, which is based on Open Sound Control (OSC). I did some background research on OSC this morning, and ran into several instances of an old pet peeve of mine: the relative performance of UDP vs TCP. Actually, I was trying to find out whether OSC communicates over UDP or TCP, and whether there is a way to choose between those at run-time, but most sources that turned up were about performance (it turns out OSC simply doesn’t do TCP).

Here are some quotes from one article I found: “I was initially hoping to use UDP because latency is important…” “I haven’t been able to fully test using TCP yet, but I’m hopeful that the trade-off in latency won’t be too bad.”

Here are quotes from another article: “UDP has it’s [sic] uses. It’s relatively fast (compared with TCP/IP).” “TCP/IP would be a poor substitute [for UDP], with it’s [sic] latency and error-checking and resend-on-fail…” “[UDP] can be broadcast across an entire network easily.” “Repeat that for multiple players sharing a game, and you’ve got a pretty slow, unresponsive game. Compared to TCP/IP then UDP is fast.” “For UDP’s strengths as a high-volume, high-speed transport layer…” “Sending data via TCP/IP has an ‘overhead’ but at least you know your data has reached its destination.” “… if the response time [over TCP] was as much as a few hundred milliseconds, the end result would be no different!”

First things first: Yes, UDP can send broadcast or multicast IP packets. But that’s not relevant for 99.9% of applications: IP broadcast only works on a single local network segment, and IP multicast does not work on the public Internet — there is currently no mechanism to assign multicast addresses dynamically, and therefore multicast packets that do not use well-known reserved addresses are ignored by Internet routers. So no points there.

In summary, according to these articles (which reflect common wisdom; I do not intend to pick on these specific authors or articles), TCP is slow. Specifically — allegedly — it has high latency (a few hundred milliseconds over UDP, according to the second article), and low bandwidth compared to UDP.

Now let’s put that common wisdom to the test. Fortunately, my collaboration framework has some functionality built in that allows a direct comparison. For example, the base protocol can send echo requests (akin to ICMP ping) at regular intervals, to keep a running estimate of transmission delay between the server and all clients, and to synchronize the server’s and client’s real-time clocks. These ping packets are typically sent over UDP, but since not all clients can always use UDP, the protocol can fall back to using TCP. The echo protocol is simple: the client sends an echo request over TCP or UDP, the server receives the request, and immediately sends an echo reply to the client, over the same channel on which it received the request. This allows us to compare the latency of sending data over TCP vs. UDP.
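
To make this concrete, here is roughly what the UDP side of such an echo timer boils down to. This is a minimal sketch, not Vrui’s actual protocol code; the server address, port, and payload layout are placeholders:

    // Minimal UDP round-trip timer: send a small datagram, wait for the echo, measure the delay.
    // Not the actual Vrui collaboration protocol; server address, port, and payload are placeholders.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <chrono>
    #include <cstdio>
    #include <cstring>

    int main() {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);

        sockaddr_in server{};
        server.sin_family = AF_INET;
        server.sin_port = htons(26000);                     // placeholder port
        inet_pton(AF_INET, "192.0.2.1", &server.sin_addr);  // placeholder server address

        const int numPings = 200;
        double totalMs = 0.0;
        for (int seq = 0; seq < numPings; ++seq) {
            char request[16] = {};
            std::memcpy(request, &seq, sizeof(seq));        // sequence number to match replies to requests

            auto t0 = std::chrono::steady_clock::now();
            sendto(sock, request, sizeof(request), 0, (sockaddr*)&server, sizeof(server));

            char reply[16];
            recvfrom(sock, reply, sizeof(reply), 0, nullptr, nullptr);  // blocks until the echo arrives
            auto t1 = std::chrono::steady_clock::now();

            totalMs += std::chrono::duration<double, std::milli>(t1 - t0).count();
        }

        std::printf("mean round-trip time: %.3f ms\n", totalMs / numPings);
        close(sock);
        return 0;
    }

A real implementation needs a receive timeout and has to match reply sequence numbers to requests, because a lost datagram would otherwise block this loop forever; the TCP variant is the same loop over a connected stream socket.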

I ran the first experiment between my home PC and my server at UC Davis. Here are the results from 200 echo packet round-trips (I also list timings using ICMP, i.e., the “real” ping protocol, as a baseline):

Ping method   Mean round-trip time [ms]   Std. deviation [ms]
TCP           48.043                      4.171
UDP           48.352                      4.280
ICMP          47.273                      4.721

Oookay, that’s not exactly what common wisdom would predict. TCP and UDP have the same latency (the minor numerical difference is safely within the margin of error), and are less than 2% slower than bare-metal ICMP. Let’s try that again, but between a client and server running on the same computer:

Ping method   Mean round-trip time [ms]   Std. deviation [ms]
TCP           0.3100                      0.0524
UDP           0.2896                      0.0280
ICMP          0.0710                      0.0120

In this test, UDP is indeed faster than TCP, by a whopping 0.02 ms (again, within the margin of error). Notably, ICMP is now faster than TCP and UDP by a factor of more than four, which is explained by ICMP running entirely in kernel space, while my collaboration infrastructure sends and receives packets from user space.

So what gives? Why does everybody know that TCP sucks for low latency? The issue is failure recovery. TCP is — clearly — just as “fast” (in terms of latency) as UDP as long as no IP packets get lost. So what happens if IP packets do get lost? In UDP’s case, nothing happens. The receiver doesn’t receive the packet, and the loss does not affect latency. In TCP’s case, the error recovery algorithm will notice that a packet was lost, duplicated, or sent out-of-order, and the receiver will request re-transmission of the bad packet (actually, TCP uses positive acknowledgment, but whatever). And because re-sending a bad packet takes at least a full round-trip between receiver and sender, it does indeed add to worst-case latency.

So that’s bad. Under failure conditions, TCP can have higher latency. But what’s the alternative? The sender (usually) does not send packets for funsies, but to communicate. And if packets get dropped or otherwise mangled, communication does not happen. Meaning, if some UDP sender needs to make sure that some bit of data actually arrives at the receiver, it has to implement some mechanism to deal with packet loss. And that will increase worst-case latency, just as it does for TCP. So the bottom line is: UDP has lower worst-case latency than TCP if and only if any individual piece of sent data does not matter. In other words: UDP has lower worst-case latency than TCP only when sending idempotent data, meaning data where it doesn’t matter if not everything arrives, or some data arrives multiple times, or packets arrive out-of-order, as long as a certain fraction of data arrives. Typical examples for this type of data are simple state updates in online games (an example discussed in the second article I linked), or audio packets for real-time voice chat. In most other cases using UDP does not actually help, or even hurts (the main point of the second article I linked). Even in online games, it is generally only okay if one of a player’s many mouse movement packets is lost, because the next one will update to the correct state (idempotent!), but if a button click packet gets lost and the player’s gun doesn’t shoot, there’ll be hell to pay.

So the common wisdom should actually be: If you want to send a stream of packets at low latency, and each subsequent packet will contain the full end state of your system, i.e., updates are idempotent as opposed to incremental, then use UDP. In all other cases, use TCP. And, generally speaking, don’t attempt a custom implementation of TCP’s failure correction in your UDP code, because it’s highly likely that the TCP developers did it better (and TCP runs in kernel space, to boot).

That’s for latency. What about sending high-volume data, in other words, what about bandwidth? Time for another experiment, again between my home PC and my UC Davis server. First, I sent a medium amount of data (100MB) over TCP, using a simple sender and receiver. This took 158 seconds, for an average bandwidth of 0.634MB/s. (Yes, I know. I am appropriately embarrassed by my Internet speed.) Next, I sent the same data over UDP, simply blasting a sequence of 74,899 datagrams of 1400 data bytes each over my home PC’s outgoing network interface. That took about 2.3s, for an average bandwidth of 43.28MB/s. Success! But oh wait. How many of those datagrams actually arrived at my server? It turns out that 95.88% of the datagrams I sent were lost en route. Oh well, I guess those data weren’t important anyway. 🙁
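
For reference, a UDP “blaster” of this kind amounts to little more than the following sketch (the destination address and port are placeholders, and the sequence number stamped into each datagram is just one way for a receiver to count losses, not necessarily what my actual test did):

    // Bulk UDP sender sketch: blast 1400-byte datagrams as fast as sendto() will take them.
    // Destination address and port are placeholders; each datagram starts with a sequence
    // number so a receiver could count how many (and which) datagrams actually arrived.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    int main() {
        const std::size_t datagramSize = 1400;                        // payload bytes per datagram
        const std::size_t totalBytes = 100*1024*1024;                 // 100MB of test data
        const std::uint32_t numDatagrams =
            (totalBytes + datagramSize - 1) / datagramSize;           // 74,899 datagrams

        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        sockaddr_in dest{};
        dest.sin_family = AF_INET;
        dest.sin_port = htons(26001);                                 // placeholder port
        inet_pton(AF_INET, "192.0.2.1", &dest.sin_addr);              // placeholder receiver address

        char buffer[1400] = {};
        for (std::uint32_t seq = 0; seq < numDatagrams; ++seq) {
            std::memcpy(buffer, &seq, sizeof(seq));                   // sequence number in the first 4 bytes
            sendto(sock, buffer, datagramSize, 0, (sockaddr*)&dest, sizeof(dest));
        }

        close(sock);
        return 0;
    }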

Seriously, though, the problem is that UDP, by design, does not do any congestion control. If the rate of sent datagrams at any time overwhelms any of the network links between sender and receiver, datagrams will be discarded silently. So we need to implement some form of traffic shaping ourselves. That’s not easy (there’s a reason TCP is the complex protocol that it is), and as a first approach, I simply calculated the average number of packets that were sent by my simple TCP sender per second, and set up a timer on the sender side to spread datagrams out to the same average rate. This ended up taking 160s (duh!), for a bandwidth of 0.623MB/s. At this rate, only 0.015% of datagrams were lost en route. Clearly not better than TCP, but then that’s expected if set up this way.
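
That crude pacing is nothing more than sleeping between datagrams. As a sketch (sendPaced and targetBytesPerSec are made-up names, with the target set to whatever the TCP transfer averaged), the send loop from above becomes:

    // Crude sender-side pacing: hold an average send rate by sleeping between datagrams.
    // targetBytesPerSec is an assumed parameter (here, the average rate of the TCP transfer).
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <chrono>
    #include <cstddef>
    #include <thread>

    void sendPaced(int sock, const sockaddr_in& dest, const char* buffer,
                   std::size_t datagramSize, std::size_t numDatagrams,
                   double targetBytesPerSec) {
        using clock = std::chrono::steady_clock;
        const std::chrono::duration<double> interval(datagramSize / targetBytesPerSec);

        auto next = clock::now();
        for (std::size_t i = 0; i < numDatagrams; ++i) {
            sendto(sock, buffer, datagramSize, 0, (const sockaddr*)&dest, sizeof(dest));
            next += std::chrono::duration_cast<clock::duration>(interval);
            std::this_thread::sleep_until(next);  // spread datagrams out to the target average rate
        }
    }

At roughly 0.633MB/s and 1400 bytes per datagram, that works out to one datagram a little over every 2 ms.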

Next, I tried pushing the effective bandwidth up, by sending datagrams at increasingly higher rates. At 0.812MB/s on the sender side, 20.93% of datagrams were lost, for an effective bandwidth on the receiver side of 0.642MB/s, or 1.3% more than TCP’s. In a real bulk data protocol, the sender would have had to re-send those missing packets, so this is an upper limit on the bandwidth that could have been achieved. Trying even faster, with a sender-side bandwidth of 1.181MB/s, 44.29% of datagrams were lost, for a receiver-side bandwidth of 0.658MB/s, or 3.8% above TCP. And again, this is a loose upper limit. Any mechanism to re-send those dropped packets would have lowered effective end-to-end bandwidth.

From these numbers we can extrapolate that in the best case, where no datagrams are lost whatsoever, UDP can maybe transmit bulk data a few percent faster than TCP. (I said maybe, because in a situation with no packet loss, TCP wouldn’t have to spend time re-transmitting data, either). In any real situation, where IP packets are invariably lost, the necessary re-transmission overhead would have brought UDP back to about the same level as TCP. The price we pay for this tiny potential improvement (if it’s even there) is that we have to implement TCP’s failure correction and traffic shaping algorithms ourselves, in user space. Again, that’s generally not a good idea. The bottom line is the same as for latency: if you want to send data that doesn’t all have to arrive at the receiver, like real-time audio chat data, UDP is a good choice. In all other cases, use TCP.

Finally, let’s compare TCP and UDP bandwidth in the local case, where sender and receiver are on the same computer. Here we have a somewhat counter-intuitive result: UDP transmitted 100MB at a bandwidth of 450MB/s, with 0% packet loss as expected, while TCP transmitted at 890MB/s, almost twice as fast. Huh? The answer here is that TCP between two processes on a single host runs over the loopback interface, where it behaves much like a UNIX pipe: no packets are lost, so no traffic shaping or failure recovery ever kicks in, and my test program was able to pass data to TCP in larger chunks, because it didn’t have to send individual datagrams (concretely, I sent 4096 bytes per system call for TCP, vs. 1400 bytes per system call for UDP). Fewer system calls, less time, higher bandwidth.

In summary: Is TCP really that slow? Answer: no, not at all. Under very specific circumstances, data transmitted over UDP can have lower worst-case latency, or potentially higher bandwidth, than the same data transmitted over TCP. If your data falls within those circumstances, i.e., if re-sending lost, mangled, or mis-ordered packets would not be helpful, like in idempotent state updates or in data streams that have built-in forward error correction or loss masking like real-time audio chat data, use UDP. In the general case, or if you don’t know for sure, use TCP.

7 thoughts on “Is TCP really that slow?”

  1. It looks like there’s a problem with your bandwidth tests and it’s in the upload speed of your home connection. Trying to stuff more bytes in the pipe won’t work if the pipe isn’t wide enough. The bytes will just spill next to the pipe. That seems to be happening when your home router accepts more than 40 MB/s. It can only release 0.6 MB/s into your upload pipe.
    What happens when you do the bandwidth test the other way round, sending from UC Davis to your home PC? I’m assuming the university has high upload bandwidth and your home connection has high download bandwidth. In that scenario the home router will be busy sending acknowledgement packets, and that may decrease TCP performance compared to UDP.

    • “Trying to stuff more bytes in the pipe won’t work if the pipe isn’t wide enough.”

      Well, yeah. That’s the problem addressed by TCP traffic shaping / congestion control.

      “What happens when you do the bandwidth test the other way round, sending from UC Davis to your Home PC?”

      My home download bandwidth is around 10MB/s under the best circumstances. When uploading at 0.633MB/s, the bandwidth of the returning stream of TCP ACKs is 14KB/s. Meaning, with an available bandwidth of 0.633MB/s for ACKs from my home PC, the download bandwidth supported by TCP would be 46.3MB/s, or 4-5 times as much as I actually get. Meaning, I’m not download-limited by the TCP protocol. Turning the test around, I get 10.03MB/s over TCP (with an ACK stream bandwidth of 187KB/s).

      In principle it’s true: if the bandwidth asymmetry between upload and download is very large (more than about 55:1 based on my results), then the ACK stream itself could limit TCP’s transmission bandwidth.

  2. This UDP vs TCP thing is pretty well-trodden territory (for decades) in the games industry.

    I’d be suspicious of myself if I broke new ground by choosing TCP for the job I think you want to do in your framework?

    I didn’t really see you address the points presented here for example –

    https://www.gafferongames.com/post/udp_vs_tcp/

    He also has a pretty interesting piece on some work he did with Oculus in 2018 –

    https://www.gafferongames.com/post/networked_physics_in_virtual_reality/

    • First thing, I don’t think I’m breaking any new ground here, especially not when it comes to network game development. I ran across recurring unsupported assumptions about networking while researching OSC, and figured I’d try to gather some supporting (or not) data.

      The first article you linked makes good points, but there isn’t much data in it, in the sense of experiments and results. Its main line of thought is that for network games, most traffic does not have to arrive 100%, or in order, and the protocol will still work. I did address that here, and fully agree with it.

      Where we differ is what to do with the fraction of traffic that does have to arrive 100%. The article’s main point on that is that you shouldn’t mix TCP and UDP, but there’s no data to back that up. There is a link to an article “for more information,” but that link doesn’t actually back up the author’s claims. He says (paraphrased): “you shouldn’t mix TCP and UDP from the same application because TCP affects UDP,” but the link is about congestion on intermediate WAN nodes, where UDP and/or TCP traffic from your application has to contend with TCP traffic from other applications, and even other hosts. The link states that 80% of Internet traffic is TCP, meaning that even if your application uses only UDP, it still has to contend with overwhelming TCP traffic on the Internet. Meaning, the citation is a red herring. The entire article reads a little bit like “you should be using UDP for everything, but because you’ll have to implement all this failure correction stuff yourself, and it’s super hard, you should buy our product, which will do it for you.” That obviously does not mean the advice is wrong, and I assume the author knows much more about networking than I ever did, but without actual measurements it’s still just a claim.

      My VR toolkit contains a module to synchronize nodes on a local cluster, to run multi-screen VR environments such as CAVEs or tiled display walls. I use broadcast/multicast over UDP for that, due to the unique communication pattern (1 master sends N identical data streams to N slaves), and am getting almost N times the bandwidth of N independent point-to-point TCP connections (no surprise, as there’s a single shared network trunk). The main issue is “almost.” Between a master and one slave, I am getting less bandwidth over my custom UDP protocol than over TCP, and the main reason — supported by timing analysis — is that all my protocol processing is done in user space, by necessity. This means the latency of my code’s responses is limited by the granularity of the operating system’s time slices. In-kernel TCP can and does reply to an incoming network packet immediately in an interrupt handler, while user-space code has to wake up from sleep first and can only react during the next time slice — unless the code uses busy-waiting, which I think is not a good idea. Hence my warning against re-implementing TCP’s mechanisms in user space.
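
      For what it’s worth, the master side of that pattern boils down to sending each datagram once to a multicast group that all slaves have joined. A minimal sketch, with a placeholder group address and port (the actual cluster protocol layers its own sequencing and reliability on top of this):

        // Minimal sketch of the one-master-to-N-slaves pattern: send a datagram once to a
        // multicast group; every slave that has joined the group receives a copy.
        // Group address and port are placeholders; the real cluster protocol layers its own
        // sequencing and reliability on top of this.
        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <sys/socket.h>
        #include <unistd.h>

        int main() {
            int sock = socket(AF_INET, SOCK_DGRAM, 0);

            sockaddr_in group{};
            group.sin_family = AF_INET;
            group.sin_port = htons(26002);                       // placeholder port
            inet_pton(AF_INET, "239.255.0.1", &group.sin_addr);  // administratively-scoped multicast address

            unsigned char ttl = 1;                               // keep the traffic on the local segment
            setsockopt(sock, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

            const char msg[] = "state update";
            sendto(sock, msg, sizeof(msg), 0, (sockaddr*)&group, sizeof(group));

            close(sock);
            return 0;
        }

      The slaves simply join the group with an IP_ADD_MEMBERSHIP socket option and read datagrams as usual; the hard part is everything that has to be built on top of this to make the stream reliable from user space.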

      I am not a networking wizard, and there are most probably ways to do it better than I did, but I would like to see some actual data comparing performance of a standard TCP connection and a UDP connection with user-space reliability bolted on before I come around on that point. The data I collected and showed here indicate that there isn’t much headroom.

      There’s one other point the article glosses over a bit, and that is that TCP’s re-transmission will halt all communication until a failure is resolved, which would in fact be bad. But that’s only true if all traffic goes over a single connection. For streams that need to be in order, halting until a failure is resolved is the correct (and only) approach. And if there is other traffic that does not need to be synched with a serial stream, then it should be sent over a different connection (say UDP), in which case it won’t be delayed by the interruption in the serial stream. Now it is true that sometimes there may be multiple independent logical streams, which all have to be serial in themselves, but don’t need to be synchronized with each other, where multiplexing them into a single TCP connection could stall all of them if there is a failure in one of them. But that seems like another very special circumstance to me.

  3. A very good and detailed article on your part; the two articles you refer to are also very illustrative of what can be optimized in a network system designed for real-time interaction, and of which protocol to use, with its advantages and disadvantages depending on the case. I wanted to know your opinion regarding the latency of the avatars’ graphical information, since the first versions of Vrui took into account details such as transmitting the 3D model resulting from merging the two point clouds generated by the two Kinects you used in those tests:
    http://idav.ucdavis.edu/~okreylos/ResDev/Kinect

    But in the case of using several depth cameras attached to different “capture nodes” on a network, the usual approach is multi-sensor temporal alignment via the LAN-based Precision Time Protocol: a PTP / IEEE 1588 node arrangement in which camera frame synchronization is performed in software (ptp4l, ptpd), as in this project:

    https://github.com/VCL3D/VolumetricCapture

    Could a node array with the necessary hardware (AVB / TSN switches and network interfaces) give a competitive advantage in terms of latency and quality of the generated point cloud/Avatar Model?

  4. There is a lot to unpack in your response here but let me give it a go –

    One thing I note is that it seems like you have some apples-and-oranges issues here? Local LAN wall-display sync is not a multiplayer game over a flaky Internet?

    “””So from this point on and for the rest of this article series, I assume you want to network an action game. You know, games like Halo, Battlefield 1942, Quake, Unreal, CounterStrike and Team Fortress.”””

    As he says in the article.

    If you aren’t worried about contention across crowded intermediary routers you have a different set of tradeoffs?

    Also remember the bit about Nagle’s algorithm he mentions. Small packets don’t leave the host unless the TCP socket has enough data queued to build a full packet, relative to what’s in flight. If you turn that off, the aggregate throughput you get from using TCP goes away.
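
    Turning that off is just a per-socket option; a minimal sketch (the function name is made up):

      // Disable Nagle's algorithm on a connected TCP socket: small writes go out immediately
      // instead of being coalesced while earlier data is still in flight.
      #include <netinet/in.h>
      #include <netinet/tcp.h>
      #include <sys/socket.h>

      void disableNagle(int tcpSock) {
          int one = 1;
          setsockopt(tcpSock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
      }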

    I suspect that’s exactly why OSC uses UDP in the first place? “Hey, play this note” doesn’t need to wait for anything. That being said, I don’t think people have much success playing music together over the Internet, because the timing is too tight.

    What are your goals? Is collaboration in your framework 3 people in a lab or 3 people at home in Hamburg, Vancouver and Los Angeles?

    Is there a central server that syncs or is it P2P? Is it a mix?

    Just as a note, there is currently an effort to replace TCP (for some uses) with HTTP/3 (QUIC), but yes, they have congestion control algorithms added:
    https://blog.cloudflare.com/http-3-vs-http-2/

    Not a slam dunk for replacing HTTP/TCP with QUIC there, but that’s a misleading way to think of it. TCP works really well, but to improve on it much/at all you need to use UDP, because the Internet doesn’t speak anything else.

    I think you are right to suspect that mixing TCP and UDP for different purposes might be fine if you are tasteful and don’t have interleaved timing/syncing needs. Saturating the link with a TCP file transfer while you try to sync hand motions might not work out well either. I agree that there is room for exploration, and my very quick scan of the router paper doesn’t preclude it for VR collaboration.

    Just spitballing, I think a common VR collaboration space is only kind of a game-like environment.

    It’s also the case that you probably just want to make a list of the things that need to be synchronized over time and their relative priorities in latency.

    glTFs and JPEGs for scene construction? OK, yeah, TCP, probably HTTP even.

    As you said, voice? Yeah, UDP: low latency and a solved problem.

    Head and hand sync? Hmm, yeah, I think UDP, because you want that low latency for the communicative content between humans.

    Drawing on one of those JPEGs you are looking at together? Again, probably low-latency UDP, as you want that pencil point to lag as little as possible.

    Maybe one way to explore this would be implementing https://www.gafferongames.com/post/state_synchronization/ in both UDP and TCP (probably best by finding someone with a library to build out) and testing with packet mangling intermediaries https://wiki.linuxfoundation.org/networking/netem
    to simulate packet loss and reordering.

    What looks best? What kind of latencies in syncing state? Run it for a week from Germany to Los Angeles and see its highs and lows?

    Oh and do this all in Rust so I can follow along, reuse and play with it easily 🙂

    https://github.com/amethyst/laminar

    https://github.com/skywind3000/kcp/blob/master/README.en.md

    Oh and you’ll want this at some point https://github.com/coturn/coturn
