Tag: http (page 2 of 3)

Cookies That Won’t Die: evercookie

User tracking on the web is an interesting field. Projects like evercookie provide insight into the different techniques that are available with today’s web client technologies.

The methods listed by evercookie that I thought were particularly curious are:

- Storing cookies in RGB values of auto-generated, force-cached PNGs using HTML5 Canvas tag to read pixels (cookies) back out

- Storing cookies in HTTP ETags

And of course the various methods used by evercookie all cause the original HTTP cookie to re-spawn.

The code is available at https://github.com/samyk/evercookie and is worth a look if you are interested in this sort of thing. The evercookie page has descriptions of how some of the techniques work, along with a sample piece of code to get started with.
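
The ETag method in particular is easy to sketch out on its own. Here is a rough sketch of the server side (my own toy code, not evercookie’s, using Python’s standard library): hand the browser a unique ID inside the ETag, and on later visits read it back out of the If-None-Match header that the browser sends when it revalidates its cached copy.

import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

class ETagTracker(BaseHTTPRequestHandler):
    def do_GET(self):
        returned_id = self.headers.get('If-None-Match')
        if returned_id:
            # The browser echoed the cached ETag back, so we "recognize" the visitor
            self.log_message('returning visitor: %s', returned_id)
            self.send_response(304)  # Not Modified: keep the cached copy, and the ID with it
            self.end_headers()
            return

        # First visit: hand out a fresh identifier inside the ETag
        visitor_id = uuid.uuid4().hex
        body = b'tracking resource'
        self.send_response(200)
        self.send_header('ETag', '"%s"' % visitor_id)
        # no-cache still lets the browser store the response, it just has to
        # revalidate (sending If-None-Match) every time it wants to reuse it
        self.send_header('Cache-Control', 'private, no-cache')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    HTTPServer(('', 8000), ETagTracker).serve_forever()

No cookie is ever set, yet the server gets a stable identifier back on every visit until the browser cache is cleared.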

HTTP Basic Auth with httplib2

While working on pressfs I ran into an issue with how httplib2 handles HTTP Basic Authentication.

Here is some example code:

import httplib2

if __name__ == '__main__':
    # Turn on verbose output so the raw request and response are printed
    httplib2.debuglevel = 1

    h = httplib2.Http()
    h.add_credentials('username', 'password')

    resp, content = h.request('http://www.google.com/', 'GET')

If you run this you’ll notice that httplib2 doesn’t actually include the HTTP Basic Auth details in the request, even though the code specifically asks it to do so. By design it will always make one request with no authentication details and then check to see if it gets an HTTP 401 Unauthorized response back. If and only if it gets a 401 response back will it then make a second request that includes the authentication data.

Bottom line: I didn’t want to make two HTTP requests when only one was needed (a huge performance hit). There is no option to force the authentication header to be sent on the first request, so you have to do it manually:

import base64
import httplib2

if __name__ == '__main__':
    httplib2.debuglevel = 1

    h = httplib2.Http()
    # Build the Basic auth value by hand; b64encode, unlike encodestring,
    # does not tack a trailing newline onto the encoded string
    auth = base64.b64encode('username' + ':' + 'password')

    resp, content = h.request(
        'http://www.google.com/',
        'GET',
        headers={'Authorization': 'Basic ' + auth}
    )

Watching the output of this, you’ll see the authentication header included in the very first request.

Someone else already opened an issue about this (Issue 130); unfortunately, Joe Gregorio has indicated that he has no intention of ever fixing it. :-(

On the up side, working around this deficiency only takes a little bit of extra code.

On HTTP Load Testing – The Hello World Test

I came across On HTTP Load Testing via Simon Willison this morning. It makes some good points, but I want to pick on just one, "7. Do More than Hello World":

Finding out how quickly your implementation can serve a 4-byte response body is an interesting but extremely limited look at how it performs. What happens when the response body is 4k — or 100k — is often much more interesting, and more representative of how it’ll handle real-life load.

Another thing to look at is how it handles load with a large number — say, 10,000 — of outstanding idle persistent connections (opened with a separate tool). A decent, modern server shouldn’t be bothered by this, but it causes issues more often than you’d think.

I both disagree and agree with this. The part I disagree with is the suggestion that testing your implementation against a 4-byte response body isn’t helpful. I contend that it is. If you know that you need to get X out of the new server you are testing, then the first thing I’d test is maximum performance, which means doing the least amount of work. For a web server that may mean serving a static file that only contains ‘Hello World!’ (13 bytes).

If I can’t get a web server to reach the performance level of X using the static hello world file, then there is no way it is magically going to reach it after adding several layers of additional work. That is why measuring the peak possible performance is important: you immediately find out whether your target of X is even possible.

If your test results are over X, great, then start adding on more/larger workloads, as suggested in the post. If your tests are under X then you need to consider some server-level changes. That might mean hardware changes, operating system and software tuning, or all of the above.
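
To make the baseline idea concrete, here is a rough sketch of the kind of quick check I mean (the URL, request count, and concurrency are placeholders, and plain Python threads won’t push a fast server anywhere near as hard as a dedicated load generator): fetch the 13-byte static file over and over and see what the ceiling looks like before layering on any real work.

import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = 'http://localhost/hello.txt'   # static file containing 'Hello World!'
REQUESTS = 1000
CONCURRENCY = 10

def fetch(_):
    # One complete request/response cycle against the static file
    with urllib.request.urlopen(URL) as resp:
        resp.read()

start = time.time()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    list(pool.map(fetch, range(REQUESTS)))
elapsed = time.time() - start

print('%d requests in %.2fs, about %.0f req/s' % (REQUESTS, elapsed, REQUESTS / elapsed))

If even this number is below X, no amount of application-level work is going to save you.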

I had originally intended to leave this as a comment on On HTTP Load Testing, but it requires me to create an account on the site, which I have no interest in doing.

HTTP Response Flow Chart

HTTP response flow chart – just in case you were wondering how to reply.

IE9 Preconnect

I was playing around on WebSiteTest today, trying out its new IE9 test feature, and I noticed something new that IE9 does: preconnect.

What is preconnect?  Preconnect is making a connection to a site before you have a request to use that connection for.  The browser may have an inkling that it will need the connection, but if you don’t have a request in hand yet, it is a speculative request, and therefore a preconnect.

via Mike’s Lookout: The Era of Browser Preconnect

Interesting behavior from IE9: making connections to sites in the hope that there will be additional resources to download. If IE9 becomes widely adopted (which seems likely), then taking this behavior into consideration when building a site may be useful.

gzip support for Amazon Web Services CloudFront

With the recent announcement of Custom Origin support in CloudFront, it is now possible to use the standard HTTP Accept-Encoding method for serving gzipped content if you are using a Custom Origin. Although not specifically mentioned in the release announcement, you can verify this in the Custom Origins Appendix of the CloudFront Developer Guide. CloudFront will now forward the Accept-Encoding HTTP header to your origin server, where you can ensure the appropriate content is served based on the supported encodings. CloudFront will then cache multiple versions of this content, the uncompressed version and the gzipped version, and serve these to clients depending on the value of their Accept-Encoding header for all future requests.

via gzip support for Amazon Web Services CloudFront – nomitor.
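
On the origin side there isn’t much to it. Here is a rough sketch (a toy WSGI origin of my own, not anything AWS provides) of the behavior described above: check the forwarded Accept-Encoding header, gzip the body when the client supports it, and always send Vary: Accept-Encoding so the compressed and uncompressed variants get cached separately.

import gzip

def application(environ, start_response):
    body = b'Hello from the origin server. ' * 100
    headers = [('Content-Type', 'text/plain'), ('Vary', 'Accept-Encoding')]

    # CloudFront forwards the client's Accept-Encoding header to the custom origin
    if 'gzip' in environ.get('HTTP_ACCEPT_ENCODING', ''):
        body = gzip.compress(body)
        headers.append(('Content-Encoding', 'gzip'))

    headers.append(('Content-Length', str(len(body))))
    start_response('200 OK', headers)
    return [body]

if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    make_server('', 8000, application).serve_forever()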

Charles Web Debugging Proxy

Charles – a cross-platform HTTP/HTTPS proxy for debugging web requests on your own system. I wonder how it compares to Fiddler.

User Agent Sniffing at Google Libraries CDN

I recently took a closer look at Google Libraries, their content delivery network (CDN) for various JavaScript libraries, and HTTP compression. I started with a simple test:

curl -O -v --compressed http://ajax.googleapis.com/ajax/libs/jquery/1.4.3/jquery.min.js

This downloads a minified version of jQuery 1.4.3. The --compressed option tells curl to request a compressed response (and to decompress it on the way back). The HTTP request looked like:

> GET /ajax/libs/jquery/1.4.3/jquery.min.js HTTP/1.1
> User-Agent: curl/7.16.4 (i386-apple-darwin9.0) libcurl/7.16.4 OpenSSL/0.9.7l zlib/1.2.3
> Host: ajax.googleapis.com
> Accept: */*
> Accept-Encoding: deflate, gzip
> 

The response from Google was:

< HTTP/1.1 200 OK
< Content-Type: text/javascript; charset=UTF-8
< Last-Modified: Fri, 15 Oct 2010 18:25:24 GMT
< Date: Fri, 29 Oct 2010 03:27:16 GMT
< Expires: Sat, 29 Oct 2011 03:27:16 GMT
< Vary: Accept-Encoding
< X-Content-Type-Options: nosniff
< Server: sffe
< Cache-Control: public, max-age=31536000
< Age: 145355
< Transfer-Encoding: chunked
< 

I was surprised that there was no Content-Encoding: gzip header in the response, meaning the response was NOT compressed. I wasn’t quite sure what to make of this at first. No way would Google forget to turn on HTTP compression; I must have missed something. I stared at the HTTP response for some time, trying to figure out what I was missing. Nothing came to mind, so I ran another test.

This time I made a request for http://ajax.googleapis.com/ajax/libs/jquery/1.4.3/jquery.min.js in Firefox 3.6.12 on Mac OS X and used Firebug to inspect the HTTP transaction. The request:

GET /ajax/libs/jquery/1.4.3/jquery.min.js HTTP/1.1
Host: ajax.googleapis.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

and the response:

HTTP/1.1 200 OK
Content-Type: text/javascript; charset=UTF-8
Last-Modified: Fri, 15 Oct 2010 18:25:24 GMT
Date: Fri, 29 Oct 2010 03:12:35 GMT
Expires: Sat, 29 Oct 2011 03:12:35 GMT
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
Server: sffe
Content-Encoding: gzip
Cache-Control: public, max-age=31536000
Content-Length: 26769
Age: 147128

This time the content was compressed. There were several differences in the request headers between curl and Firefox; I decided to start with just one, the User-Agent. I modified my initial curl request to include the User-Agent string from Firefox:

curl -O -v --compressed --user-agent "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12" http://ajax.googleapis.com/ajax/libs/jquery/1.4.3/jquery.min.js

The request:

> GET /ajax/libs/jquery/1.4.3/jquery.min.js HTTP/1.1
> User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
> Host: ajax.googleapis.com
> Accept: */*
> Accept-Encoding: deflate, gzip
> 

and the response:

< HTTP/1.1 200 OK
< Content-Type: text/javascript; charset=UTF-8
< Last-Modified: Fri, 15 Oct 2010 18:25:24 GMT
< Date: Fri, 29 Oct 2010 03:33:09 GMT
< Expires: Sat, 29 Oct 2011 03:33:09 GMT
< Vary: Accept-Encoding
< X-Content-Type-Options: nosniff
< Server: sffe
< Content-Encoding: gzip
< Cache-Control: public, max-age=31536000
< Content-Length: 26769
< Age: 147018
< 

Sure enough, I got back a compressed response. Google was sniffing the User-Agent string to decide whether a compressed response should be sent; it didn’t matter whether the client asked for one (Accept-Encoding: deflate, gzip) or not. What still wasn’t clear was whether this was a blacklist approach (singling out curl) or a whitelist approach (Firefox is okay). So I tried a few other requests with various User-Agent strings. First up, no User-Agent set at all:

curl -O -v --compressed --user-agent "" http://ajax.googleapis.com/ajax/libs/jquery/1.4.3/jquery.min.js

Not compressed. Next, a made-up string:

curl -O -v --compressed --user-agent "JosephScott/1.0 test/2.0" http://ajax.googleapis.com/ajax/libs/jquery/1.4.3/jquery.min.js

Not compressed. At this point I think Google is using a whitelist approach: if you aren’t on the list of approved User-Agent strings, you won’t get a compressed response, no matter how nicely you ask.

I collected a few more browser samples as well, just to be sure:

  • Safari 5.0.2 on Mac OS X – compressed
  • IE 8 on Windows XP – compressed
  • Firefox 3.6.12 on Windows XP – compressed
  • Chrome 7.0.517.41 beta on Windows XP – compressed
  • Opera 10.63 on Windows XP – NOT compressed
  • Safari 5.0.2 on Windows XP – compressed

One more time, curl using the IE 8 User-Agent string:

curl -O -v --compressed --user-agent "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET4.0C;" http://ajax.googleapis.com/ajax/libs/jquery/1.4.3/jquery.min.js

Compressed.

Since I can change the response simply by manipulating the User-Agent value, I’m left to conclude that the Google Libraries CDN sniffs the User-Agent string to determine whether it will respond with a compressed result. From what I’ve seen so far, it keeps a whitelist of approved User-Agent patterns and only honors the compression request for those.

If you are on a current version of one of the popular browsers you will get a compressed response. If you are using anything else, you’ll have to test to confirm whether Google Libraries will honor your request for compressed content. Opera users are just plain out of luck; even the most recent version gets an uncompressed response.
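
If you want to repeat the experiment without pasting curl commands around, a quick sketch along these lines loops over a few User-Agent strings and reports whether the response carries Content-Encoding: gzip (these are the same User-Agent values used above; what Google serves will no doubt change over time).

import urllib.request

URL = 'http://ajax.googleapis.com/ajax/libs/jquery/1.4.3/jquery.min.js'
USER_AGENTS = {
    'curl': 'curl/7.16.4 (i386-apple-darwin9.0) libcurl/7.16.4 OpenSSL/0.9.7l zlib/1.2.3',
    'Firefox 3.6': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12',
    'made up': 'JosephScott/1.0 test/2.0',
}

for name, user_agent in USER_AGENTS.items():
    request = urllib.request.Request(URL, headers={
        'Accept-Encoding': 'deflate, gzip',
        'User-Agent': user_agent,
    })
    with urllib.request.urlopen(request) as response:
        # Only the header matters here; urllib does not decompress the body for us
        encoding = response.headers.get('Content-Encoding', 'none')
    print('%-12s -> Content-Encoding: %s' % (name, encoding))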

The Loosely Coupled Web

There are so many terms floating around for "the web". One that has been overly abused is "web 2.0". Someone mentioned "web 2.0" during a conversation the other day and it reminded me that I had never written about my favorite alternative "web" term: "the loosely coupled web". I’ve been tempted by "the open web", but the term open has been smashed beyond recognition.

What do I mean by loosely coupled? It means I can start a new site that provides specific services, and people can easily build off it without my intervention. A great general example of this is the core of the web itself: HTTP. When you put up a new site, browsers just work with it, right out of the box.

Another example is WordPress and its XML-RPC APIs. If you want to write a new blog client that works with WordPress, you don’t need to create an account anywhere or sign up for anything extra. You make use of the common APIs that WordPress provides and go to town. RSS and Atom feeds fall into the same category.
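
To show how little is involved, a rough sketch like this (the blog URL and credentials are placeholders) is a complete client for the one API call it makes, with nothing to sign up for and no keys to request.

import xmlrpc.client

# Every WordPress install exposes its XML-RPC API at /xmlrpc.php
client = xmlrpc.client.ServerProxy('https://example.com/xmlrpc.php')

# wp.getUsersBlogs is one of the standard WordPress XML-RPC methods
for blog in client.wp.getUsersBlogs('username', 'password'):
    print(blog['blogName'], blog['url'])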

I think it is also possible to be loosely coupled in a more specific scope. For example, I’d consider parts of Facebook’s Graph API loosely coupled. You need to know a little bit about it to understand how to make requests and how to read the responses, but that’s it. And I like the low barrier that loosely coupled implies. As you get deeper there are additional requirements, but at least to get started there is very little friction.

I’m nowhere near the first to use this term (a quick Google search turns up plenty of hits), but I don’t think it has received as much respect as it deserves.

XMLHttpRequest (XHR) Uses Multiple Packets for HTTP POST?

A recent Think Vitamin article, The Definitive Guide to GET vs POST, mentioned something that I hadn’t seen before about XMLHttpRequest (XHR). Their Rule #4 states:

When using XMLHttpRequest, browsers implement POST as a two-step process (sending the headers first and then the data). This means that GET requests are more responsive – something you need in AJAX environments.

The claim is that even the smallest XHR will be sent using two packets if the request is done over HTTP POST instead of HTTP GET. I don’t remember ever having heard this claim before.

Let me first say that performance issues for POST vs. GET probably shouldn’t be your top factor for deciding which one to use. Make sure that you understand the implications of each and pick the right method for your request. For most people I suspect the biggest factor will involve caching, not performance. I was going to leave a comment on the article about this, but Simon beat me to it.

I wasn’t the only one who wanted to find out more about XHR POST using multiple packets. Fortunately someone else already asked that question and the author replied:

2. My claim is based on research done by Iain Lamb, cofounder of the Oddpost webmail startup that was acquired by Yahoo! and eventually became the basis for the all-new Yahoo! Mail.

His research showed “rather baffling finding: POST requests, made via the XMLHTTP object, send header and body data in separate tcp/ip packets [and therefore,] xmlhttp GET performs better when sending small amounts of data than an xmlhttp POST.”

That is why Yahoo includes the use of GET instead of POST as one of their high performance speed optimisation rules.

Simon Willison did some looking around and found more links for this. It was mentioned here and here, so it looks like Iain Lamb did do this research, even though I couldn’t find a first person account of it. This was enough information to make me curious, but not enough to answer all of my questions. It was time to run some tests of my own.

So I updated my install of Wireshark on Windows XP, turned off all of the packet reassembly options for HTTP decoding and started testing browsers. My very simple XHR POST test page looked like this:

<button type="button" onclick="$.post('hello.txt', {name: 'Joseph'})">XHR POST</button>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>

When the button is clicked, an XHR POST request is made to hello.txt with name=Joseph as a tiny amount of POST data. The requests on the domain I tested included some cookies as well, but that still left enough room for the headers and the tiny POST payload to fit in a single TCP packet.

Here are the results of the tests that I ran:

  • IE 6 – 2 packets
  • IE 7 – 2 packets
  • IE 8 – 2 packets
  • Firefox 3.0.13 – 1 packet
  • Firefox 3.5.2 – 1 packet
  • Opera 9.27 – 2 packets
  • Safari 4.0.3 – 2 packets
  • Chrome 2.0.172.43 – 2 packets

The short version of this is pretty easy to see: all of the browsers except for Firefox used at least 2 packets for an XHR done over HTTP POST. When I saw that Safari sent 2 packets I figured that Chrome would as well, but I tested it anyway just to make sure.

I looked at the data size of each packet in IE 6; the first packet had 575 bytes of data and the second packet had 11 bytes of data. This lined up with the POST request, which indicated that the content length was 11 bytes. The second packet consisted only of the POST data. Because Firefox sent less data in the user-agent string, I increased the POST data so that it would exceed the combined total of the two IE packets, to make sure I wasn’t running into any odd packet fragmentation. The second packet in Opera, Safari, and Chrome was also only the 11 bytes of POST data.

If this were MythBusters I’d call this myth confirmed. While it is true that not ALL browsers will always use two packets, it appears that the two-packet process is the rule, not the exception. And with IE still the most widely used browser, it’s very likely that a large portion of your users fall into the two-packet category. If, on the other hand, 95% of your users happen to be using Firefox, then sure, you can skip thinking about this.
