Tag: http (page 1 of 3)

HTTP/2 Expectations

Mark Nottingham on Nine Things to Expect from HTTP/2.

If you have any interest in the future of HTTP then mnot’s blog is well worth reading.

Listen for SSL and SSH on the Same Port

Many corporate firewalls will limit outgoing connections to ports 80 and 443 in a vain effort to restrict access to non-web services. You could run SSH on port 80 or 443 on a VPS or dedicated server, but if you have one of those you are probably already using it to host a small web site. Wouldn’t it be nice if your server could listen for both SSH and HTTP/S on ports 80 and 443? That is where sslh comes in:

sslh accepts connections on specified ports, and forwards them further based on tests performed on the first data packet sent by the remote client.

Probes for HTTP, SSL, SSH, OpenVPN, tinc, XMPP are implemented, and any other protocol that can be tested using a regular expression, can be recognised. A typical use case is to allow serving several services on port 443 (e.g. to connect to ssh from inside a corporate firewall, which almost never block port 443) while still serving HTTPS on that port.

Hence sslh acts as a protocol demultiplexer, or a switchboard. Its name comes from its original function to serve SSH and HTTPS on the same port.

Source code is available at https://github.com/yrutschle/sslh.
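
If you want to try this on a server, the usual arrangement is to move the real web server so it listens only on localhost and let sslh own the public port. A minimal sketch, assuming a version of sslh where the relevant options are --listen, --ssh, and --ssl ( check man sslh for your build, newer releases spell the last one --tls ):

# sslh listens on the public port 443, hands SSH traffic to the local
# sshd and anything that looks like TLS to the web server, which has
# been reconfigured to listen on 127.0.0.1:443 only.
sslh --user sslh --listen 0.0.0.0:443 --ssh 127.0.0.1:22 --ssl 127.0.0.1:443

From outside the firewall you would then connect with something like ssh -p 443 you@yourserver, while regular HTTPS visitors never notice the difference.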

For small use cases this may come in handy. If you constantly need to SSH over port 80 or 443, though, I’d recommend just spending a few dollars a month to get a VPS dedicated to that task.

If you are stuck in a limited corporate network another tool you may find useful is corkscrew, which tunnels SSH connections through HTTP proxies.
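
corkscrew normally gets wired in through an ssh ProxyCommand. A minimal ~/.ssh/config sketch, where proxy.example.com:3128 and home.example.org are placeholder values for your proxy and destination:

Host home-via-proxy
    HostName home.example.org
    Port 443
    # corkscrew opens a CONNECT tunnel through the HTTP proxy;
    # ssh fills in %h and %p from HostName and Port above.
    ProxyCommand corkscrew proxy.example.com 3128 %h %p

With that in place, ssh home-via-proxy works from inside the proxied network, and it pairs nicely with sslh listening on 443 at the far end.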

Fewer HTTP Verbs

Brett Slatkin suggests that we reduce the number of verbs in HTTP 2.0:

Practically speaking there are only two HTTP verbs: read and write, GET and POST. The semantics of the others (put, head, options, delete, trace, connect) are most commonly expressed in headers, URL parameters, and request bodies, not request methods. The unused verbs are a clear product of bike-shedding, an activity that specification writers love.

Interestingly, HTTP 1.0 only defined GET, POST, and HEAD back in 1996.

I could get behind the idea of just having GET, POST, and HEAD. In practice these tend to be the safest verbs to use. It would also put an end to having to talk about the semantics of PUT every six months.
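
The quoted point about verb semantics migrating into headers, URL parameters, and request bodies is easy to spot in the wild. A sketch of three common conventions for expressing a delete without the DELETE verb ( the URL and header name here are illustrative, not any particular API ):

# Method override header, used by several frameworks and APIs:
curl -X POST -H 'X-HTTP-Method-Override: DELETE' http://example.com/items/42
# Hidden form field convention ( the Rails style _method parameter ):
curl -X POST -d '_method=DELETE' http://example.com/items/42
# Or just bake the action into the URL:
curl -X POST http://example.com/items/42/delete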

Those who insist that all things must be REST or they are useless won’t like this. They could find a way to get over that.

TCP Over HTTP, A.K.A. HTTP 2.0

Skimming through the HTTP 2.0 draft RFC that was posted yesterday, I’m left with the distinct feeling that it implements TCP on top of HTTP:

[Image: HTTP 2.0 Framing diagram from the draft]

I’m in the camp that believes future versions of HTTP should continue to be a text-based protocol ( with compression support ).

Most weeks I look at several raw HTTP requests and responses. Yes, there will still be tools like cURL ( which I love ) to dig into HTTP transactions, so it isn’t the end of the world. Still, I am sad to see something that is currently fairly easy to follow turn into something significantly more complex.

reddit.com HTTP Response Headers

I found an old note to myself to look at the HTTP response headers for reddit.com. So I did this:

$ curl -v -s http://www.reddit.com/ > /dev/null
* About to connect() to www.reddit.com port 80 (#0)
* Trying 69.22.154.10...
* connected
* Connected to www.reddit.com (69.22.154.10) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
> Host: www.reddit.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Server: '; DROP TABLE servertypes; --
< Vary: accept-encoding
< Date: Wed, 22 May 2013 14:37:25 GMT
< Transfer-Encoding: chunked
< Connection: keep-alive
< Connection: Transfer-Encoding
<
{ [data not shown]
* Connection #0 to host www.reddit.com left intact
* Closing connection #0

Fun Server entry in there. Reminded me of Little Bobby Tables from xkcd.

I’m sure this has made the rounds in other places. Unfortunately my note didn’t indicate where I first saw this.

iOS6 Safari Caching POST Responses

With the release of iOS6, mobile Safari started caching POST responses. Mark Nottingham talks through the related RFCs to see how this lines up with the HTTP specs. It is worth a read for the details; here is the conclusion:

even without the benefit of this context, they’re still clearly violating the spec; the original permission to cache in 2616 was contingent upon there being explicit freshness information (basically, Expires or Cache-Control: max-age).

So, it’s a bug. Unfortunately, it’s one that will make people trust caches even less, which is bad for the Web. Hopefully, they’ll do a quick fix before developers feel they need to work around this for the next five years.
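
To make the “explicit freshness information” part of that concrete, these are the kinds of response headers RFC 2616 meant; only a POST response carrying something like them was ever even a candidate for caching ( the values here are just examples ):

Cache-Control: max-age=3600
Expires: Thu, 27 Sep 2012 18:00:00 GMT

The iOS6 behavior being described is caching POST responses that carry no freshness headers at all, which is why it is a clear spec violation rather than a judgment call.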

Over the years I’ve run across a handful of services and applications that claim to be able to cache HTTP POST responses. In every case that turned out to be a bad decision.

Google’s plusone.js Doesn’t Support HTTP Compression

I was surprised to see that Google’s plusone.js doesn’t support HTTP compression. Here is a quick test:

curl -v --compressed https://apis.google.com/js/plusone.js > /dev/null

Request Headers:

> GET /js/plusone.js HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8r zlib/1.2.3
> Host: apis.google.com
> Accept: */*
> Accept-Encoding: deflate, gzip

Response Headers:

< HTTP/1.1 200 OK
< Content-Type: text/javascript; charset=utf-8
< Expires: Fri, 18 Nov 2011 02:35:20 GMT
< Date: Fri, 18 Nov 2011 02:35:20 GMT
< Cache-Control: private, max-age=3600
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Server: GSE
< Transfer-Encoding: chunked

You'll notice there is no Content-Encoding: gzip header in the response.
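
If you want to gauge what that omission costs, a quick sketch is to check for a Content-Encoding header directly and then see how well the file would have compressed ( exact byte counts will vary with the version of plusone.js being served ):

# Ask for compression and look only at the response headers:
curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip, deflate' https://apis.google.com/js/plusone.js | grep -i '^content-encoding'
# Compare the raw size against what gzip would shrink it to:
curl -s https://apis.google.com/js/plusone.js | wc -c
curl -s https://apis.google.com/js/plusone.js | gzip -c | wc -c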

We'll have to get Steve Souders to pester them about that.

Timing Details With cURL

Jon’s recent Find the Time to First Byte Using Curl post reminded me about the additional timing details that cURL can provide.

cURL supports formatted output for the details of the request ( see the cURL manpage for details, under “-w, --write-out <format>” ). For our purposes we’ll focus just on the timing details that are provided.

Step one: create a new file, curl-format.txt, and paste in:

\n
            time_namelookup:  %{time_namelookup}\n
               time_connect:  %{time_connect}\n
            time_appconnect:  %{time_appconnect}\n
           time_pretransfer:  %{time_pretransfer}\n
              time_redirect:  %{time_redirect}\n
         time_starttransfer:  %{time_starttransfer}\n
                            ----------\n
                 time_total:  %{time_total}\n
\n

Step two: make a request:

curl -w "@curl-format.txt" -o /dev/null -s http://wordpress.com/

What this does:

  • -w "@curl-format.txt" tells cURL to use our format file
  • -o /dev/null redirects the output of the request to /dev/null
  • -s tells cURL not to show a progress meter
  • http://wordpress.com/ is the URL we are requesting

And here is what you get back:

            time_namelookup:  0.001
               time_connect:  0.037
            time_appconnect:  0.000
           time_pretransfer:  0.037
              time_redirect:  0.000
         time_starttransfer:  0.092
                            ----------
                 time_total:  0.164

Jon was looking specifically at time to first byte, which is the time_starttransfer line. The other timing details include DNS lookup, TCP connect, pre-transfer negotiations, redirects (in this case there were none), and of course the total time.

The format file for this output provides a reasonable level of flexibility; for instance, you could make it CSV formatted for easy parsing, as sketched below. You might want to do that if you were running this as a cron job to track timing details of a specific URL.
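
For example, a CSV friendly curl-format-csv.txt ( the file name and the choice to log every field are just one way to set it up ) could be as simple as:

%{time_namelookup},%{time_connect},%{time_appconnect},%{time_pretransfer},%{time_redirect},%{time_starttransfer},%{time_total}\n

Then a cron job could append one line per run with something like curl -w "@curl-format-csv.txt" -o /dev/null -s http://wordpress.com/ >> timing-log.csv.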

For details on the other information that cURL can provide using -w check out the cURL manpage.

ETag Survey

In the last few weeks I’ve had conversations with a couple of different people about their sites not using ETags correctly. This led me to wonder how many of the top sites on the web have a similar problem.

I downloaded the list of top U.S. sites from Quantcast and wrote a simple PHP script to see which of them included an ETag header in their HTTP response. I ran checks on the top 1,000 sites from that list. Of those, 136 included an ETag header in the response. Here are some of the interesting points from those 136:

- 9 of them indicated they were weak validators ( W/ )
- 2 had values of ""
- 1 had a completely empty value
- most used double quotes around the entire value; 6 didn't use quotes at all
- 1 used a date value of "Sun, 17 Jul 2011 17:14:09 -0400"

Each response was checked for an ETag header; if it had one, another request was sent with an If-None-Match header using the value of that ETag. Sites that use ETags correctly will detect this and send back a “304 Not Modified” status ( a by-hand version of this check with curl is sketched after the list below ). Ultimately, however, I settled on four possible results for sites using ETags:

- WORKS ( ETAG_WORKS ) : does exactly what it should, returning “304 Not Modified” when appropriate
- WORKS, sort of, web server farm with different ETag values ( ETAG_WORKS_FARM ) : only does the right thing if you happen to hit the same backend web server repeatedly, which you can’t really control
- FAILS, the ETag value changes ( ETAG_FAILS_CHANGE ) : this is a failure where the site returns a different ETag value on every request, making it impossible to ever get a match
- FAILS, ignored If-None-Match ( ETAG_FAILS_IGNORE ) : the site consistently returns the same ETag value, but always forces a re-download of the resource even when a correct If-None-Match value is provided
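
Here is roughly what that check looks like by hand with curl, which is essentially what the survey script automates ( www.example.com and the "abc123" value are placeholders ):

# First request: discard the body and pull out the ETag header.
curl -s -o /dev/null -D - http://www.example.com/ | grep -i '^etag'
# Suppose that printed: ETag: "abc123"
# Second request: send the value back; a correct setup answers 304.
curl -s -o /dev/null -w '%{http_code}\n' -H 'If-None-Match: "abc123"' http://www.example.com/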

The server farm situation is an interesting one. To test for that, each time an ETag check request fails for a site I send another dozen requests to see if any of those succeed. That isn’t a perfect solution: all of the requests come from the same IP in a short period of time, so it is reasonable that some sites will send all of those requests to the same backend web server in their farm. That said, this technique did get a few hits and was very easy to implement.

Here are the numbers for each of the possible categories, remember this is out of a total of 136:

- ETAG_WORKS : 54 ( 39.7% )
- ETAG_WORKS_FARM : 11 ( 8.1% )
- ETAG_FAILS_CHANGE : 24 ( 17.6% )
- ETAG_FAILS_IGNORE : 47 ( 34.6% )

Not exactly stellar results. More than half of the sites using ETags completely fail at using them correctly. To make matters even worse, the first site to use ETags correctly was ranked #62 on the Quantcast list. There were 8 other sites ranked higher than that ( #5, #17, #35, #38, #48, #49, #51, and #55 ) that all failed. The good news in all of this: there is plenty of room for improvement.

The code (which is very basic) for running the survey is available at https://github.com/josephscott/etag-survey. That also contains the Quantcast list I used (downloaded 6 Sep 2011) and the results of the run (also dated 6 Sep 2011).

I need to look at the code for httparchive.org and see if this is something that could be easily added to their test suite. I’m hoping that the number of sites correctly using ETags will go up over time.

Performance Trends For Top Sites On The Web

Steve Souders posted an update on the HTTP performance trends for top sites, based on data gathered via http://httparchive.org/. Here are the bottom line numbers:

Here’s a recap of the performance indicators from Nov 15 2010 to Aug 15 2011 for the top ~13K websites:

  • total transfer size grew from 640 kB to 735 kB
  • requests per page increased from 69 to 76
  • sites with redirects went up from 58% to 64%
  • sites with errors is up from 14% to 25%
  • the use of Google Libraries API increased from 10% to 14%
  • Flash usage dropped from 47% to 45%
  • resources that are cached grew from 39% to 42%

I was surprised by the total transfer size increase. Spread out on a weekly basis, every Friday for those nine months you added roughly another 2.4 kB to the total transfer size of your site ( 95 kB over about 39 weeks ). Not much for any given week, but it adds up fast.
