After the keynotes today, I decided to just dive deep and go to a talk about how DNS resolution affects website performance. There are some really interesting things that I learned about how the different operating systems actually deal with DNS resolution. It explains a lot about how slowly things fail when your network goes completely down.
First, I want to say how absolutely terrible it is that Windows resolves DNS entries at half the speed of Mac and Unix. We're talking 600ms vs. 200-300ms. That's pretty insane. Part of that seems to be due to the way that failures are handled and how IPv6 integrates into the equation.
It seems that when doing standard DNS lookups in an IPv4-only or IPv6-only world (that is, not dual stack), a complete resolution failure is going to take at least 20 seconds, no matter what platform you are using. This is largely down to the default timeouts, which allow the operating system to hit 3 DNS servers and then try the first one again. All of this is done serially, so it can be pretty painful with the various backoffs.
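To make the arithmetic concrete, here is a minimal sketch of why serial attempts add up so quickly. The server names and the 5-second per-attempt timeout are my own assumptions, picked only so the total lands on the 20 seconds quoted in the talk; real resolvers use their own timeout and backoff values.

```python
def worst_case_failure_seconds(attempt_timeouts):
    """Total wait when every attempt times out and attempts run one after another."""
    return sum(attempt_timeouts)

# Hypothetical schedule: three DNS servers, then the first one again.
# The 5.0 second timeouts are illustrative, not measured values.
schedule = [("dns1", 5.0), ("dns2", 5.0), ("dns3", 5.0), ("dns1", 5.0)]

total = worst_case_failure_seconds(timeout for _, timeout in schedule)
print(f"worst case before giving up: {total:.0f} s")
```

Because nothing overlaps, every extra server or retry adds its full timeout to the wait.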
When you add a dual stack with both IPv4 and IPv6 into the mix, different operating systems behave differently. Except on Mac, IPv6 will be preferred. Windows requests the IPv6 record first and asks for the IPv4 record only if the IPv6 lookup fails. Mac at least requests both in parallel and will use whichever it thinks is fastest. On the other hand, that doesn't encourage people to use IPv6 very much, does it?
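For a feel of what the parallel approach looks like, here is a rough Python sketch that asks for both record types at once and takes whichever answers first. This is not how any operating system actually implements it; `lookup` and `resolve_parallel` are names I made up, and "first to answer" is only a stand-in for "whichever it thinks is fastest".

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed

def lookup(host, family):
    """Resolve host for one address family using the system resolver."""
    infos = socket.getaddrinfo(host, None, family=family, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]

def resolve_parallel(host, lookup_fn=lookup):
    """Query IPv6 and IPv4 concurrently; return (family, addresses) for the first success."""
    families = (socket.AF_INET6, socket.AF_INET)
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        futures = {pool.submit(lookup_fn, host, fam): fam for fam in families}
        for future in as_completed(futures):
            try:
                addresses = future.result()
            except OSError:
                continue  # this family failed; wait for the other one
            if addresses:
                return futures[future], addresses
        raise OSError(f"no IPv4 or IPv6 address found for {host}")
    finally:
        # Do not block on the slower lookup once we have an answer.
        pool.shutdown(wait=False)
```

Calling `resolve_parallel("example.com")` would return either `socket.AF_INET` or `socket.AF_INET6` along with the addresses, depending on which lookup finishes first. The serial alternative pays for the failed lookup in full before the second one even starts.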
Round robin can add entirely new problems once the connection is being established. Determining which host to use can be tricky, as any of the returned hosts may be down. Failover to the next option can take a very long time.
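Here is a small sketch of what that failover looks like from the client side: try each address the round robin handed back, in order, and move on only after a connection attempt fails. The function name, the 3-second timeout, and the addresses in the usage note are all assumptions for illustration.

```python
import socket

def connect_with_failover(addresses, port, timeout=3.0, connect=socket.create_connection):
    """Try each address in order; return (address, connection) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout=timeout)
        except OSError as exc:
            # A dead host typically costs the full timeout before we move on.
            last_error = exc
    raise OSError(f"all {len(addresses)} addresses failed") from last_error
```

With three addresses and two dead hosts at the front of the list, something like `connect_with_failover(["192.0.2.1", "192.0.2.2", "192.0.2.3"], 80)` could spend around six seconds waiting before the third host is even tried, which is the "very long time" part.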
The short takeaway is that DNS resolution is not a trivial part of the request. If you're going for sub-1-second latencies, this can eat a significant portion of your perceived latency before you even know what the user wants.
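If you want to see how much of your own budget goes to DNS, a quick timing sketch like the one below is a reasonable starting point. The helper names are mine, the 1000ms default budget just mirrors the sub-1-second goal above, and OS or resolver caching will make repeat lookups look much faster than a cold one.

```python
import socket
import time

def time_lookup(host, resolve=socket.getaddrinfo):
    """Return how long one lookup of host takes, in milliseconds."""
    start = time.perf_counter()
    try:
        resolve(host, None)
    except OSError:
        pass  # a failed lookup still costs the user the time spent waiting
    return (time.perf_counter() - start) * 1000.0

def share_of_budget(elapsed_ms, budget_ms=1000.0):
    """Fraction of the latency budget consumed by the lookup."""
    return elapsed_ms / budget_ms
```

Plugging in the 600ms Windows figure from the talk, `share_of_budget(600.0)` comes out to 0.6, so more than half of a one-second budget is gone before the request has left the machine.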