Two months ago Google announced Brotli, a new compression format:
Brotli is a whole new data format. This new format allows us to get 20–26% higher compression ratios over Zopfli. In our study ‘Comparison of Brotli, Deflate, Zopfli, LZMA, LZHAM and Bzip2 Compression Algorithms’ we show that Brotli is roughly as fast as zlib’s Deflate implementation. At the same time, it compresses slightly more densely than LZMA and bzip2 on the Canterbury corpus. The higher data density is achieved by a 2nd order context modeling, re-use of entropy codes, larger memory window of past data and joint distribution codes. Just like Zopfli, the new algorithm is named after Swiss bakery products. Brötli means ‘small bread’ in Swiss German.
Compression is a big deal for web performance, being able to send the same file with fewer bytes is a big win.
information about the client TCP connection; available on systems that support the TCP_INFO socket option
Easy to log this information. From there: periodically parse Nginx logs looking for client IP and RTT data, GEO IP lookup on the client, compute the median RTT value for each country/state, color code countries from highest to lowest RTT, then use Google Maps for the display.
You’ll have yourself a nice page showing the RTT values across the world for visitors to your site.
Back in September I spun up a private instance of webpagetest ( WPT ). One of the first things I did was run near optimal condition tests. I wanted to see what the lower bound were for a few performance tests. The test system was very close to one of our data centers ( less than 1ms ping times ), so I configured tests to use Chrome as the browser, with no traffic shaping.
Like a kid with a new toy on Christmas morning I started running tests against this private WPT instance. Quickly something odd came up. In some cases we were seeing requests for very small ( sometimes less than 1kb ) resources take much longer than they should have. And by “much longer”, I mean these were 3 to 4 times slower than requests for larger resources.
Long story short, these slower small requests were seeing ~200ms delays in time to first byte ( TTFB ). But it only happened with SPDY, with compression enabled, for small files ( my tests showed 1,602 bytes or smaller ), when the small file was the first resource requested in the SPDY connection. Once I was able to list all of the variables that needed to be in place it was very simple to reproduce the problem using WPT.
Some of you will look at the mention of a ~200ms delay and immediately recognize this as a delayed ack issue:
When a Microsoft TCP stack receives a data packet, a 200-ms delay timer goes off. When an ACK is eventually sent, the delay timer is reset and will initiate another 200-ms delay when the next data packet is received.
( Note that the WPT tester system in our instance uses Windows, so that we can test Internet Explorer )
But that is why Nginx has a tcp_nodelay option. Unfortunately it wasn’t being applied when these variables came together in a SPDY connection. I started a thread in the WPT forums about this and we all basically agreed that this was the issue.
The systems team at Automattic reached out the Nginx team, describing what we were observing and how to reproduce it. They sent back a patch, which I ran through my tests and confirmed that it fixed the problem. The patch led to the commit I mentioned above. And now that change is part of the Nginx 1.7.8 release.
Here is what this looks like in action. First, a WPT waterfall graph using Nginx 1.7.7:
That TTFB of 212ms is significantly slower than it should be. Compare that with the same test conditions using Nginx 1.7.8:
A TTFB of 15ms is inline with what I expected to see. Going from 212ms to 15ms is a 14x improvement! The total request time dropped from 266ms to 69ms, a 3.8X improvement. I’ll take gains like that anytime.
Reproducing This Yourself
I ran numerous tests during this process. To make running new tests easier I put together a simple script to take care of the Nginx build and configuration:
For each test I’d spin up a new DigitalOcean VM with Ubuntu 14.04 LTS. The build script would complete in a few minutes and then I’d run new WPT tests.
Conveniently it turns out that the default Welcome to nginx! page is small enough to trigger the ~200ms delay.
With all of that in place you can run a test at webpagetest.org to see this in action. I’ve been using the following test config:
– Test Location: Dulles, VA – Browser: Chrome – Connection: Native Connection ( No Traffic Shaping ) – “Ignore SSL Certificate Errors” ( under the Advanced tab ) — this is need because I’ve been using a self signed cert
The “Dulles, VA” location has a fast enough route to DigitalOcean “New York 3” that you can still observe the ~200ms TTFB difference between Nginx 1.7.7 and 1.7.8.
A big thank you to the Nginx team for fixing this.
Zack Tollman suggested I try out SPDY with my updated Nginx install. While I’m sad at the idea of giving up a plain text HTTP API, I was curious to see what SPDY looked like on this site.
I was disappointed with the results. The fastest page load time out of 5 runs without SPDY was 1.039 s. With SPDY the fastest result was 1.273 s. I then did several more runs of the same test with SPDY enabled to see if any of them could get close to the 1.0 s base line. None of them did, most came in close to 2 seconds. I had honestly expected to see SPDY perform better. That said this type of testing is not particularly rigorous, so take these numbers with a sufficiently large grain of salt.
Given the initial poor showing of SPDY in these tests I’m going to leave it turned off for now.
After digging through the nginx source code, one stumbles onto this gem. Turns out, any nginx version prior to 1.5.6 has this issue: certificates over 4KB in size incur an extra roundtrip, turning a two roundtrip handshake into a three roundtrip affair – yikes. Worse, in this particular case we trigger another unfortunate edge case in Windows TCP stack: the client ACKs the first few packets from the server, but then waits ~200ms before it triggers a delayed ACK for the last segment. In total, that results in extra 580ms of latency that we did not expect.
I’ve been using Nginx 1.4.x from the Ubuntu package collection on this site. A few webpagetest.org runs showed that HTTPS negotiation was taking more than 300ms on the initial request. After updating to Nginx 1.5.13 more tests showed HTTPS negotiation was down around 250ms.
The 50ms savings isn’t nearly as dramatic as the worst case scenario described in the quote above, but I’ll take it.
let’s now turn to the practical matter of picking and tuning the server to deliver the best results. One would hope that the default “out of the box” experience for most servers would do a good job… unfortunately, that is not the case. Let’s take a closer look nginx
In the simplest terms, TLS involves more work. The current realities of securing communications means we don’t have a good way to avoid doing that additional work, indeed we will be doing it more often than we ever have before. The end result is that we need to spend more time thinking about how to optimize the HTTPS experience for all users.