#VelocityConf: Bits on the Wire

Talk link

This was a really cool talk that walked down the network stack to show how abstractions make our work possible.

The first thing to understand is that there are tons of abstractions that we rely on every day. Unfortunately, abstractions are usually not perfect: they leak concepts of their implementations into the interfaces they expose. This is a common problem in most software. The other problem is that abstractions, by adding more layers of code, are typically slower and less efficient.

He also made the statement that abstractions create problems you can't see. I don't completely agree with that. This is a rather long argument, so I may write that post later tonight.

Now, when you start dealing with abstractions and distributed computing, it is very important to understand how that changes the problem. One of the teams near me used to have a sign on their door with SOA fallacies. They included things like: "The network is reliable." He recommended a paper that I haven't read called "A Note on Distributed Computing". I will have to look that up. In any case, the general upshot is that the partial failure modes and concurrency in distributed systems can cause headaches for anyone used to a localized world. I cringe to remember the days of working with WebSphere and getting the "success: maybe" log messages.
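
To make the "success: maybe" problem concrete, here is a minimal Python sketch of my own (not something from the talk). `Outcome`, `FlakyServer`, and `call_deposit` are hypothetical names, and the "network" is simulated by raising `TimeoutError` either before or after the server applies the change. The point is that the caller sees the same timeout in both cases.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    UNKNOWN = "success: maybe"


class FlakyServer:
    """Hypothetical in-process stand-in for a remote service."""

    def __init__(self, drop_request=False, drop_response=False):
        self.drop_request = drop_request
        self.drop_response = drop_response
        self.balance = 0

    def deposit(self, amount):
        if self.drop_request:
            # The request never arrived; nothing changes on the server.
            raise TimeoutError("request lost")
        self.balance += amount
        if self.drop_response:
            # The change was applied, but the reply never made it back.
            raise TimeoutError("response lost")
        return self.balance


def call_deposit(server, amount):
    """Return what the caller can actually know about the call."""
    try:
        server.deposit(amount)
        return Outcome.SUCCESS
    except TimeoutError:
        return Outcome.UNKNOWN


lost_request = FlakyServer(drop_request=True)
lost_response = FlakyServer(drop_response=True)

print(call_deposit(lost_request, 10), lost_request.balance)
print(call_deposit(lost_response, 10), lost_response.balance)
```

Both calls report the same unknown outcome, yet one server's balance is still 0 and the other's is 10. A purely local function call has no equivalent of that state.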


As we've learned, especially when trying to write a REST service, HTTP is hard. It's a really complex spec that was apparently rushed in the first place. The new version of the HTTP/1.1 spec is supposed to be better constructed and split into six parts rather than a single 170-something-page document.

From the wire perspective, the key to HTTP is the headers. With headers, I have my second new tool of the conference: REDbot. This thing is cool. It will tell you everything you're doing wrong with your headers. I think we need one of these at my job. There is also htracr, which will show you the actual network impact of your HTTP requests. As a colleague reminded me this morning, the waterfalls produced in most of our tools show how the browser perceives things, not necessarily how the network behaves.
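
As a rough illustration of what checking headers can look like, here is a small Python sketch. It is not REDbot's logic: the sample response, the function names `parse_headers` and `lint_headers`, and the three checks are all my own assumptions, chosen only because they are common caching and content-type oversights.

```python
# Hypothetical raw response, as it might appear on the wire.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 1024\r\n"
    "\r\n"
)


def parse_headers(raw):
    """Split a raw response head into its status line and a dict of headers."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize to lowercase.
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def lint_headers(headers):
    """Return a list of warnings for a few illustrative header checks."""
    warnings = []
    if "cache-control" not in headers and "expires" not in headers:
        warnings.append("no explicit freshness information (Cache-Control or Expires)")
    if "etag" not in headers and "last-modified" not in headers:
        warnings.append("no validator (ETag or Last-Modified)")
    content_type = headers.get("content-type", "").lower()
    if content_type.startswith("text/") and "charset=" not in content_type:
        warnings.append("text content type without a charset")
    return warnings


status_line, headers = parse_headers(RAW_RESPONSE)
for warning in lint_headers(headers):
    print("warning:", warning)
```

The sample response trips all three checks. A real tool covers far more of the spec than this.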

Problems with HTTP

Other than its paralyzing complexity, HTTP has problems created by some of its "solutions". The best example is pipelining. Pipelining has been part of the spec for a long time, but no one uses it because the client cannot know the best way to use the connection. It is therefore absurd to expect the client to direct the pipelining, which is what the spec requires.
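
One way to see why the client is badly placed to direct pipelining: responses on a pipelined connection have to come back in the order the requests were sent, so one slow response holds up everything queued behind it. The Python sketch below is a toy model under my own assumptions, not anything shown in the talk. It assumes the server works on all requests at once, the numbers are made-up service times in seconds, and `pipelined_delivery` is a hypothetical name.

```python
from itertools import accumulate


def pipelined_delivery(service_times):
    """Delivery time of each response when responses must arrive in request order.

    Response i cannot be delivered before every earlier response is ready,
    so its delivery time is the running maximum of the service times.
    """
    return list(accumulate(service_times, max))


# The client happens to ask for the slow resource first...
slow_first = pipelined_delivery([2.0, 0.1, 0.1])
# ...versus asking for it last.
slow_last = pipelined_delivery([0.1, 0.1, 2.0])

print(slow_first)
print(slow_last)
```

With the slow request first, all three responses arrive at 2.0 seconds. With it last, the two fast ones arrive at 0.1 seconds. The client picks the order, but only the server knows which response will be slow.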


Obviously, there are good and bad intermediaries when dealing with HTTP. The good ones are proxies and gateways. These are configured by the client or the server for very specific reasons understood by the party configuring them.

On the other hand, the bad ones are prevalent and occasionally malicious. They include virus scanners and content modification systems. The fact that there are prevalent specs around content modification scares me. Now, I'm really starting to think everything should be done over TLS.


He also talked at some length about DNS, which was interesting but not necessarily good material. The cool things I found were a tool called dig, alternative DNS services (OpenDNS, Google, and Comodo), and how terrible our ISP routers and modems are. I also learned how easy it is to spoof DNS responses and requests, both of which can be used for very malicious attacks.
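
To get a feel for what is actually on the wire, here is a Python sketch that builds a bare-bones DNS query by hand, following the RFC 1035 message layout. It is a stripped-down version of the kind of query a tool like dig sends, not a reproduction of dig's exact packet. The function name `build_dns_query` and the ID value are my own choices, and nothing is sent over the network.

```python
import struct


def build_dns_query(hostname, query_id):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT, and ARCOUNT, all zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


packet = build_dns_query("example.com", query_id=0x1234)
print(len(packet), packet.hex())
```

The whole query is 29 bytes, and the only thing tying a response back to it at this layer is a 16-bit ID, so 65,536 possible values. A plain UDP query like this carries no authentication, which seems to be a big part of why forged responses are feasible.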

TCP and IP

The lower-level stuff was all interesting, but I think the slides are pretty good on those. The one thing that I found really interesting was the discussion of TCP congestion windows. Using packet loss as the signal that a network is congested must be a nightmare for mobile devices. Even Wi-Fi must suffer from this at times. It seems like we need a better mechanism for determining when the network is congested.
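
Here is a toy Python model of why loss that has nothing to do with congestion hurts so much. It is a heavily simplified sketch of additive increase, multiplicative decrease: the window grows by one segment per round trip and is halved whenever a loss is seen. Slow start, timeouts, and fast recovery are all left out; `simulate_aimd` and the loss pattern are my own assumptions.

```python
def simulate_aimd(rounds, loss_rounds, initial_cwnd=1):
    """Return the congestion window (in segments) after each round trip."""
    cwnd = initial_cwnd
    history = []
    for rnd in range(rounds):
        if rnd in loss_rounds:
            # Any loss is treated as congestion: halve the window.
            cwnd = max(1, cwnd // 2)
        else:
            # No loss: grow the window by one segment per round trip.
            cwnd += 1
        history.append(cwnd)
    return history


clean = simulate_aimd(10, loss_rounds=set())
lossy = simulate_aimd(10, loss_rounds={4, 9})  # e.g. radio interference

print(clean)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(lossy)  # [2, 3, 4, 5, 2, 3, 4, 5, 6, 3]
```

Two stray losses in ten round trips leave the window at 3 segments instead of 11, even though in this scenario the network was never congested.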