iOS6 Safari Caching POST Responses

With the release of iOS 6, mobile Safari started caching POST responses. Mark Nottingham walks through the related RFCs to see how this lines up with the HTTP specs. The whole post is worth a read for the details; here is the conclusion:

even without the benefit of this context, they’re still clearly violating the spec; the original permission to cache in 2616 was contingent upon there being explicit freshness information (basically, Expires or Cache-Control: max-age).

So, it’s a bug. Unfortunately, it’s one that will make people trust caches even less, which is bad for the Web. Hopefully, they’ll do a quick fix before developers feel they need to work around this for the next five years.

Over the years I’ve run across a handful of services and applications that claim to be able to cache HTTP POST responses. In every case that turned out to be a bad decision.
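
For anyone who does feel forced into a workaround, the commonly reported fix was to be explicit on the server side. Here is a minimal sketch (mine, not from either post) of a handler that marks its POST responses as uncacheable, using only the Python standard library; the URL, port, and payload are placeholders:

# A minimal sketch of the obvious workaround: send explicit caching
# directives on POST responses instead of relying on default behavior.
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoCachePostHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Consume the request body so the connection stays well behaved.
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)

        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        # Explicit "do not cache this" directives; adding these was the
        # widely reported way to keep iOS 6 Safari from caching POSTs.
        self.send_header('Cache-Control', 'no-cache, no-store, must-revalidate')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    HTTPServer(('127.0.0.1', 8080), NoCachePostHandler).serve_forever()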

Google’s plusone.js Doesn’t Support HTTP Compression

I was surprised to see that Google’s plusone.js doesn’t support HTTP compression. Here is a quick test:
curl -v --compressed https://apis.google.com/js/plusone.js > /dev/null

Request Headers:

> GET /js/plusone.js HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8r zlib/1.2.3
> Host: apis.google.com
> Accept: */*
> Accept-Encoding: deflate, gzip

Response Headers:

< HTTP/1.1 200 OK
< Content-Type: text/javascript; charset=utf-8
< Expires: Fri, 18 Nov 2011 02:35:20 GMT
< Date: Fri, 18 Nov 2011 02:35:20 GMT
< Cache-Control: private, max-age=3600
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Server: GSE
< Transfer-Encoding: chunked

You'll notice there is no Content-Encoding: gzip header in the response.

We'll have to get Steve Souders to pester them about that.
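
If you want to script this kind of check instead of eyeballing curl output, here is a rough sketch (my own, not from the post) that advertises compression support and reports whether the response actually came back compressed:

import urllib.request

def supports_compression(url):
    # Ask for a compressed response and inspect the response headers.
    req = urllib.request.Request(url, headers={'Accept-Encoding': 'gzip, deflate'})
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get('Content-Encoding', '').lower() in ('gzip', 'deflate')

if __name__ == '__main__':
    # The plusone.js URL from the curl test above.
    print(supports_compression('https://apis.google.com/js/plusone.js'))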

Timing Details With cURL

Jon’s recent Find the Time to First Byte Using Curl post reminded me about the additional timing details that cURL can provide.

cURL supports formatted output for the details of the request (see the cURL manpage for details, under “-w, --write-out <format>”). For our purposes we’ll focus just on the timing details that are provided.

Step one: create a new file, curl-format.txt, and paste in:

\n
            time_namelookup:  %{time_namelookup}\n
               time_connect:  %{time_connect}\n
            time_appconnect:  %{time_appconnect}\n
           time_pretransfer:  %{time_pretransfer}\n
              time_redirect:  %{time_redirect}\n
         time_starttransfer:  %{time_starttransfer}\n
                            ----------\n
                 time_total:  %{time_total}\n
\n

Step two: make a request:

curl -w "@curl-format.txt" -o /dev/null -s http://wordpress.com/

What this does:

  • -w "@curl-format.txt" tells cURL to use our format file
  • -o /dev/null redirects the output of the request to /dev/null
  • -s tells cURL not to show a progress meter
  • http://wordpress.com/ is the URL we are requesting

And here is what you get back:

            time_namelookup:  0.001
               time_connect:  0.037
            time_appconnect:  0.000
           time_pretransfer:  0.037
              time_redirect:  0.000
         time_starttransfer:  0.092
                            ----------
                 time_total:  0.164

Jon was looking specifically at time to first byte, which is the time_starttransfer line. The other timing details include DNS lookup, TCP connect, pre-transfer negotiations, redirects (in this case there were none), and of course the total time.

The format file for this output provides a reasonable level of flexibility; for instance, you could make it CSV formatted for easy parsing. You might want to do that if you were running this as a cron job to track the timing details of a specific URL.
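
Here is a rough sketch of that cron-job idea (my own, not from the post): hand curl a CSV-style --write-out format and append one line per run to a log file. The URL and log path are placeholders:

import subprocess
import time

URL = 'http://wordpress.com/'      # URL to track (placeholder)
LOG = '/tmp/curl-timing.csv'       # where to append results (placeholder)

# Same timing variables as the format file above, just comma separated.
CSV_FORMAT = ('%{time_namelookup},%{time_connect},%{time_appconnect},'
              '%{time_pretransfer},%{time_redirect},%{time_starttransfer},'
              '%{time_total}')

def record_timing():
    # -w output goes to stdout while the body itself goes to /dev/null.
    out = subprocess.check_output(
        ['curl', '-w', CSV_FORMAT, '-o', '/dev/null', '-s', URL]
    ).decode().strip()
    with open(LOG, 'a') as fh:
        fh.write('%d,%s\n' % (int(time.time()), out))

if __name__ == '__main__':
    record_timing()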

For details on the other information that cURL can provide using -w check out the cURL manpage.

ETag Survey

In the last few weeks I’ve had conversations with a couple of different people about their sites not using ETags correctly. This led me to wonder how many of the top sites on the web have a similar problem.

I downloaded the list of top U.S. sites from Quantcast and wrote a simple PHP script to see which of them included an ETag header in their HTTP response. I ran checks on the top 1,000 sites from that list; of those, 136 included an ETag header in the response. Here are some of the interesting points from those 136:

– 9 of them indicated they were weak validators ( W/ )
– 2 had values of “”
– 1 had a completely empty value
– most used double quotes around the entire value, 6 didn’t use quotes at all
– 1 used a date value of “Sun, 17 Jul 2011 17:14:09 -0400”

Each response was checked for an ETag header; if it had one, another request was sent with the If-None-Match header set to the ETag value. Sites that are using ETags correctly will detect this and send back a “304 Not Modified” status. Ultimately, however, I settled on four possible results for sites using ETags:

WORKS ( ETAG_WORKS ) : does exactly what it should, returning “304 Not Modified” when appropriate
WORKS, sort of, web server farm with different ETag values ( ETAG_WORKS_FARM ) : only does the right thing if you happen to hit the same backend web server repeatedly, which you can’t really control
FAILS, the ETag value changes ( ETAG_FAILS_CHANGE ) : this is a failure where the site returns a different ETag value on every request, making it impossible to ever get a match
FAILS, ignored If-None-Match ( ETAG_FAILS_IGNORE ) : the site consistently returns the same ETag value, but always forces a re-download of the resource even when a correct If-None-Match value is provided

The server farm situation is an interesting one. To test for it, each time an ETag check request fails for a site I send another dozen requests to see if any of those succeed. That isn’t a perfect solution: all of the requests come from the same IP in a short period of time, so it is reasonable that some sites will send all of them to the same back-end server in their farm. That said, this technique did get a few hits and was very easy to implement.
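
Here is a rough Python sketch of the core check described above; the actual survey script was PHP, and this leaves out the farm retries and the changing-ETag bookkeeping:

import urllib.error
import urllib.request

def check_etag(url):
    # First request: does the response carry an ETag at all?
    with urllib.request.urlopen(url) as resp:
        etag = resp.headers.get('ETag')
    if etag is None:
        return 'NO_ETAG'

    # Second request: replay the ETag via If-None-Match and hope for a 304.
    req = urllib.request.Request(url, headers={'If-None-Match': etag})
    try:
        with urllib.request.urlopen(req):
            # A 200 response means the conditional request was ignored.
            return 'ETAG_FAILS'
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return 'ETAG_WORKS'
        raise

if __name__ == '__main__':
    print(check_etag('http://example.com/'))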

Here are the numbers for each of the possible categories (remember, this is out of a total of 136):

ETAG_WORKS : 54 ( 39.7% )
ETAG_WORKS_FARM : 11 ( 8% )
ETAG_FAILS_CHANGE : 24 ( 17.6% )
ETAG_FAILS_IGNORE : 47 ( 34.5% )

Not exactly stellar results. More than half of the sites using ETags completely fail at using them correctly. To make matters worse, the first site to use ETags correctly was ranked #62 on the Quantcast list. There were 8 other ETag-using sites ranked higher than that ( #5, #17, #35, #38, #48, #49, #51, and #55 ), and all of them failed. The good news in all of this: there is plenty of room for improvement.

The code (which is very basic) for running the survey is available at https://github.com/josephscott/etag-survey. That also contains the Quantcast list I used (downloaded 6 Sep 2011) and the results of the run (also dated 6 Sep 2011).

I need to look at the code for httparchive.org and see if this is something that could be easily added to their test suite. I’m hoping that the number of sites correctly using ETags will go up over time.

Performance Trends For Top Sites On The Web

Steve Souders posted an update on the HTTP performance trends for top sites, based on data gathered via http://httparchive.org/. Here are the bottom line numbers:

Here’s a recap of the performance indicators from Nov 15 2010 to Aug 15 2011 for the top ~13K websites:

  • total transfer size grew from 640 kB to 735 kB
  • requests per page increased from 69 to 76
  • sites with redirects went up from 58% to 64%
  • sites with errors is up from 14% to 25%
  • the use of Google Libraries API increased from 10% to 14%
  • Flash usage dropped from 47% to 45%
  • resources that are cached grew from 39% to 42%

I was surprised by the total transfer size increase. That is an extra 95 kB over nine months; if you followed that trend on a weekly basis, every Friday for those nine months you added another ~2.6 kB to the total transfer size of your site. Not much for any given week, but it adds up fast.

Cookies That Won’t Die: evercookie

User tracking on the web is an interesting field. Projects like evercookie provide insight into the different techniques that are available given today’s web client technologies.

The methods listed by evercookie that I thought were particularly curious are:

– Storing cookies in RGB values of auto-generated, force-cached PNGs using HTML5 Canvas tag to read pixels (cookies) back out

– Storing cookies in HTTP ETags

And of course the various methods used by evercookie all cause the original HTTP cookie to re-spawn.
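
To make the ETag method concrete, here is a minimal sketch of the server side of that trick (my own illustration, not code from evercookie): the “cookie” is whatever identifier the server hands out as an ETag, and the browser echoes it back in If-None-Match on later conditional requests.

import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

class ETagTrackerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        returned = self.headers.get('If-None-Match')
        if returned:
            # Returning visitor: the conditional header carries the ID back.
            self.log_message('saw existing id %s', returned)
            self.send_response(304)
            self.send_header('ETag', returned)
            self.end_headers()
            return
        # New visitor: mint an ID and plant it as the ETag.
        new_id = '"%s"' % uuid.uuid4().hex
        body = b'tracking resource'
        self.send_response(200)
        self.send_header('ETag', new_id)
        self.send_header('Cache-Control', 'private, max-age=31536000')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    HTTPServer(('127.0.0.1', 8080), ETagTrackerHandler).serve_forever()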

The code is available at https://github.com/samyk/evercookie and is worth a look if you are interested in this sort of thing. The evercookie page has descriptions of how some of the techniques work, along with a sample piece of code to get started with.

HTTP Basic Auth with httplib2

While working on pressfs I ran into an issue with how httplib2 handles HTTP Basic Authentication.

Here is some example code:

import httplib2

if __name__ == '__main__' :
    httplib2.debuglevel = 1

    h = httplib2.Http()
    h.add_credentials( 'username', 'password' )

    resp, content = h.request( 'http://www.google.com/', 'GET' )

If you run this you’ll notice that httplib2 doesn’t actually include the HTTP Basic Auth details in the request, even though the code specifically asks it to do so. By design it will always make one request with no authentication details and then check to see if it gets an HTTP 401 Unauthorized response back. If and only if it gets a 401 response back will it then make a second request that includes the authentication data.

Bottom line: I didn’t want to make two HTTP requests when only one was needed (a huge performance hit). There is no option to force the authentication header to be sent on the first request, so you have to add it manually:

import base64
import httplib2

if __name__ == '__main__' :
    httplib2.debuglevel = 1

    h = httplib2.Http()
    # base64.encodestring() appends a trailing newline, which does not
    # belong inside a header value, so strip it off
    auth = base64.encodestring( 'username' + ':' + 'password' ).strip()

    resp, content = h.request(
        'http://www.google.com/',
        'GET',
        headers = { 'Authorization' : 'Basic ' + auth }
    )

Watching the output of this you’ll see the authentication header in the request.

Someone else already opened an issue about this ( Issue 130 ); unfortunately, Joe Gregorio has indicated that he has no intention of ever fixing it :-(

On the up side, working around this deficiency only takes a little bit of extra code.

On HTTP Load Testing – The Hello World Test

I came across On HTTP Load Testing via Simon Willison this morning. It makes some good points, but I want to pick on just one, number 7, “Do More than Hello World”:

Finding out how quickly your implementation can serve a 4-byte response body is an interesting but extremely limited look at how it performs. What happens when the response body is 4k — or 100k — is often much more interesting, and more representative of how it’ll handle real-life load.

Another thing to look at is how it handles load with a large number — say, 10,000 — of outstanding idle persistent connections (opened with a separate tool). A decent, modern server shouldn’t be bothered by this, but it causes issues more often than you’d think.

I both disagree and agree with this. The part I disagree with is the suggestion that testing your implementation against a 4-byte response body is not helpful; I contend that it is. If you know that you need to get X out of the new server you are testing, then the first thing to test is maximum performance, which means doing the least amount of work. For a web server that may mean serving a static file that contains only ‘Hello World!’ (13 bytes).

If I can’t get a web server to reach the performance level of X using the static hello world file, then there is no way it is magically going to reach it after adding several layers of additional work. That is why measuring the peak possible performance is important: you immediately find out whether your target of X is even achievable.

If your test results are over X, great: start adding on more and larger workloads, as suggested in the post. If your tests are under X then you need to consider some server-level changes. That might mean hardware changes, operating system and software tuning, or all of the above.

I had originally intended to leave this as a comment on On HTTP Load Testing, but it requires me to create an account on the site, which I have no interest in doing.

IE9 Preconnect

I was playing around on WebSiteTest today, trying out its new IE9 test feature, and I noticed something new that IE9 does: preconnect.

What is preconnect?  Preconnect is making a connection to a site before you have a request to use that connection for.  The browser may have an inkling that it will need the connection, but if you don’t have a request in hand yet, it is a speculative request, and therefore a preconnect.

via Mike’s Lookout » Blog Archive » The Era of Browser Preconnect

Interesting behavior from IE9: making connections to sites in the hope that there will be additional resources to download. If IE9 becomes widely adopted (which seems likely), then taking this behavior into consideration when building a site may be useful.