ETag Survey

In the last few weeks I’ve had conversations with a couple of different people about their sites not using ETags correctly. This led me to wonder how many of the top sites on the web have a similar problem.

I downloaded the list of top U.S. sites from Quantcast and wrote a simple PHP script to see which of them included an ETag header in their HTTP response. I ran checks on the top 1,000 sites from that list. Of those, 136 included an ETag header in the response. Here are some of the interesting points from those 136:

- 9 of them indicated they were weak validators ( W/ )
- 2 had values of “”
- 1 had a completely empty value
- most used double quotes around the entire value; 6 didn’t use quotes at all
- 1 used a date value of “Sun, 17 Jul 2011 17:14:09 -0400”
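
The value formats in the list above can be told apart mechanically. Here is a minimal sketch in Python (not the PHP script used for the survey). The function name `describe_etag` and its labels are made up for illustration, and it assumes the HTTP grammar for entity tags: an optional, case-sensitive `W/` prefix followed by a double-quoted string.

```python
import re

# Assumed grammar: optional "W/" prefix, then a double-quoted opaque string
# that contains no further double quotes.
ETAG_RE = re.compile(r'^(W/)?"([^"]*)"$')


def describe_etag(raw):
    """Return a coarse, illustrative label for a raw ETag header value."""
    value = raw.strip()
    if value == "":
        return "empty"                  # header present, nothing in it
    match = ETAG_RE.match(value)
    if match is None:
        return "unquoted-or-malformed"  # e.g. no surrounding double quotes
    weak_prefix, opaque = match.groups()
    if opaque == "":
        return "empty-quoted"           # the value is just ""
    return "weak" if weak_prefix else "strong"
```

Note that a quoted date string is syntactically a valid strong ETag under this check, even though it looks like it belongs in a Last-Modified header.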

Each response was checked for an ETag header. If it had one, another request was sent with the If-None-Match header set to the value of the ETag. Sites that use ETags correctly will detect the match and send back a “304 Not Modified” status. Ultimately, however, I settled on four possible results for sites using ETags:

- WORKS ( ETAG_WORKS ) : does exactly what it should, returning “304 Not Modified” when appropriate
- WORKS, sort of, web server farm with different ETag values ( ETAG_WORKS_FARM ) : only does the right thing if you happen to hit the same backend web server repeatedly, which you can’t really control
- FAILS, the ETag value changes ( ETAG_FAILS_CHANGE ) : this is a failure where the site returns a different ETag value on every request, making it impossible to ever get a match
- FAILS, ignored If-None-Match ( ETAG_FAILS_IGNORE ) : the site consistently returns the same ETag value, but always forces a re-download of the resource even when a correct If-None-Match value is provided
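
The basic check behind these results (fetch once, then ask again with If-None-Match) can be sketched roughly as follows. This is Python using only the standard library, not the survey's PHP script; `fetch`, `revalidates`, and the `fetcher` parameter are hypothetical names chosen for this example.

```python
import urllib.error
import urllib.request


def fetch(url, if_none_match=None):
    """GET url and return (status_code, etag_header_or_None)."""
    request = urllib.request.Request(url)
    if if_none_match is not None:
        request.add_header("If-None-Match", if_none_match)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 304 and other non-2xx statuses,
        # so the 304 we are hoping for arrives here.
        return error.code, error.headers.get("ETag")


def revalidates(url, fetcher=fetch):
    """True if the site answers 304 to its own ETag, False if it does not,
    None if the first response carried no ETag header at all."""
    _, etag = fetcher(url)
    if etag is None:
        return None
    status, _ = fetcher(url, etag)
    return status == 304
```

Calling `revalidates("https://example.com/")` would perform two real requests. Passing a different `fetcher` makes it possible to exercise the logic without touching the network.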

The server farm situation is an interesting one. To test for it, each time an ETag check request fails for a site I send another dozen requests to see if any of those succeed. That isn’t a perfect solution: all of the requests come from the same IP in a short period of time, so it is reasonable to expect that some sites will send all of them to the same backend server in their farm. That said, this technique did get a few hits and was very easy to implement.
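
The retry approach described above might look something like this sketch (Python, not the original PHP). The `classify` function and its `probe` callback are hypothetical: `probe(etag)` is assumed to send one conditional request and return a `(status, etag)` pair. How to tell the two failure cases apart is also an assumption here: if every response repeated the original ETag the site is treated as ignoring If-None-Match, otherwise as changing its ETag.

```python
def classify(first_etag, probe, retries=12):
    """Classify a site given the ETag from its first response.

    probe(etag) must send one request with If-None-Match set to etag
    and return (status_code, etag_in_that_response).
    """
    status, etag = probe(first_etag)
    if status == 304:
        return "ETAG_WORKS"

    # The first conditional request failed: try another dozen times in
    # case a different backend server recognizes the ETag.
    etag_stable = (etag == first_etag)
    for _ in range(retries):
        status, etag = probe(first_etag)
        if status == 304:
            return "ETAG_WORKS_FARM"
        etag_stable = etag_stable and etag == first_etag

    # Never got a 304. Assumed rule: same ETag every time means the
    # server ignored If-None-Match; anything else means the ETag changes.
    return "ETAG_FAILS_IGNORE" if etag_stable else "ETAG_FAILS_CHANGE"
```

Keeping the network call behind `probe` means the classification rules can be checked with canned responses before pointing the code at real sites.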

Here are the numbers for each of the possible categories. Remember, this is out of a total of 136:

- ETAG_WORKS : 54 ( 39.7% )
- ETAG_WORKS_FARM : 11 ( 8.1% )
- ETAG_FAILS_CHANGE : 24 ( 17.6% )
- ETAG_FAILS_IGNORE : 47 ( 34.6% )
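
For anyone who wants to recompute the shares from the raw counts, here is a small Python sketch. The variable names are illustrative; the counts are the ones listed above, and percentages are rounded to one decimal place.

```python
# Counts per category, as reported in the survey results.
counts = {
    "ETAG_WORKS": 54,
    "ETAG_WORKS_FARM": 11,
    "ETAG_FAILS_CHANGE": 24,
    "ETAG_FAILS_IGNORE": 47,
}

total = sum(counts.values())

# Share of each category, as a percentage of all sites that sent an ETag.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Combined share of the two failure categories.
failing = counts["ETAG_FAILS_CHANGE"] + counts["ETAG_FAILS_IGNORE"]
failing_share = round(100 * failing / total, 1)

for name, share in shares.items():
    print(f"{name}: {counts[name]} ({share}%)")
print(f"failing: {failing} of {total} ({failing_share}%)")
```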

Not exactly stellar results. More than half of the sites using ETags (71 of 136) completely fail at using them correctly. To make matters even worse, the first site to use ETags correctly was ranked #62 on the Quantcast list. There were 8 other sites using ETags that ranked higher than that ( #5, #17, #35, #38, #48, #49, #51, and #55 ), and all of them failed. The good news in all of this: there is plenty of room for improvement.

The code (which is very basic) for running the survey is available at https://github.com/josephscott/etag-survey. That also contains the Quantcast list I used (downloaded 6 Sep 2011) and the results of the run (also dated 6 Sep 2011).

I need to look at the code for httparchive.org and see if this is something that could be easily added to their test suite. I’m hoping that the number of sites correctly using ETags will go up over time.

6 Comments

  1. This was one of my favorite pieces of your presentation today – thanks for the great conference!

  2. Hi Joseph,
    as I have read in many places, the use of ETags is not really recommended these days; instead the If-Modified-Since approach is more in use. I would like to know your views on it.

    - Thanks

  3. If your server admins are incapable of configuring Etags correctly then it is better to turn them off entirely. No Etags are better than broken Etags.

    But the best solution is still to use Etags correctly.

  4. So, what is the proper way to do an Etag? I can’t seem to find this information in an “Etag for Dummies” format. Where does it go? How do you ensure you have done it correctly?

    Thanks!

  5. The Etag has no required definition as far as how to generate one. In order for it to be useful for caching though it does need to be consistent.

  6. Is it a piece of code that goes in the .htaccess file or is it set up in the httpd.conf file? I’m at the very beginning of knowledge in this area and don’t see some easy “turn e-tags on” button. I’m hosted through GoDaddy and they tell me that it’s a piece of code that goes in the .htaccess but that’s all they could tell me. I have read that they are valuable to have – when done right – but can’t seem to find out how to actually get it to be on the pages of my site.


© 2014 Joseph Scott
