Tag: performance (page 2 of 3)

On HTTP Load Testing – The Hello World Test

I came across On HTTP Load Testing via Simon Willison this morning. It makes some good points, but I want to pick on just one: 7. Do More than Hello World:

Finding out how quickly your implementation can serve a 4-byte response body is an interesting but extremely limited look at how it performs. What happens when the response body is 4k — or 100k — is often much more interesting, and more representative of how it’ll handle real-life load.

Another thing to look at is how it handles load with a large number — say, 10,000 — of outstanding idle persistent connections (opened with a separate tool). A decent, modern server shouldn’t be bothered by this, but it causes issues more often than you’d think.

I both disagree and agree with this. The part I disagree with is that testing your implementation against a 4-byte response body is not helpful. I contend that it is. If you know that you need to get X from the new server that you are testing, then the first thing I’d test is the maximum performance, which means doing the least amount of work. For a web server that may mean serving a static file that only contains ‘Hello World!’ (13 bytes).

If I can’t get a web server to reach the performance level of X using the static hello world file, then there is no way it is magically going to reach it after adding on several layers of additional work. That is why measuring the peak possible performance is important: you immediately determine whether your need of X is even possible.

If your test results are over X, great; then start adding more and larger workloads, as suggested in the post. If your tests are under X then you need to consider some server-level changes. That might mean hardware changes, operating system and software tuning, or all of the above.
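The baseline-first workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fake_fetch` is a hypothetical stand-in for one request to the static hello world file, `measure_rps` runs requests one after another (a real load test would use a dedicated tool and concurrent clients), and the target of 500 requests per second is an arbitrary placeholder for X.

```python
import time

def measure_rps(fetch, requests=1000):
    """Call fetch() `requests` times in sequence and return requests per second."""
    start = time.perf_counter()
    for _ in range(requests):
        fetch()
    elapsed = time.perf_counter() - start
    return requests / elapsed if elapsed > 0 else float("inf")

def next_step(baseline_rps, target_rps):
    """Treat the hello world result as a ceiling for everything heavier."""
    if baseline_rps < target_rps:
        return "tune or upgrade the server"
    return "add larger, more realistic workloads"

def fake_fetch():
    # Hypothetical stand-in for one HTTP request to the 13-byte static file.
    return b"Hello World!\n"

baseline = measure_rps(fake_fetch, requests=10_000)
print(next_step(baseline, target_rps=500))  # 500 is a placeholder for X
```

The point of `next_step` is the ordering: heavier tests only make sense once the cheapest possible response clears the target.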

I had originally intended to leave this as a comment on On HTTP Load Testing, but it requires me to create an account on the site, which I have no interest in doing.

Closing PHP Tag – More Work

I was testing PHP code with VLD and noticed something odd. If I left off the final closing PHP tag then there were fewer ops. Here is an example:

<?php
$first_name = 'Joseph';
$last_name = 'Scott';

?>

has 6 ops:

line     # *  op                  operands
   2     0  >   EXT_STMT                                                 
         1      ASSIGN           !0, 'Joseph'
   3     2      EXT_STMT                                                 
         3      ASSIGN           !1, 'Scott'
   5     4      EXT_STMT                                                 
         5    > RETURN              1

The same file, minus the closing PHP tag:

<?php
$first_name = 'Joseph';
$last_name = 'Scott';

has only 5 ops:

line     # *  op                operands
   2     0  >   EXT_STMT                                                 
         1      ASSIGN        !0, 'Joseph'
   3     2      EXT_STMT                                                 
         3      ASSIGN        !1, 'Scott'
   4     4    > RETURN         1

I trimmed the VLD output to make it easier to read.

It boils down to an extra EXT_STMT op when the closing PHP tag is included.

Performance Analysis: lds.org

lds.org waterfall chart


Development of a new version of lds.org has been available on new.lds.org for some time. As of 30 November 2010 that new version is now live on lds.org. I’d taken a few quick looks at the performance of the new version previously, but with it now live this seemed like a good time to take a closer look.

First up, the webpagetest.org results for lds.org. Some numbers for reference:

  • Data transferred: 1,287 KB
  • Number of HTTP requests: 66
  • Fully loaded time: 2.3 seconds (taken with a grain of salt, this is just a single test)

The score for HTTP compression is an F, but that doesn’t really tell the whole story. Most of the resources on the page are loaded from lds.org and cdn.lds.org. Only cdn.lds.org (Apache web server) responds with compressed results; lds.org (MarkLogic web server) never provides a compressed response. This results in an extra 118 KB sent over the wire. Given the large initial size of the data transfer for the site this is not a huge dent, but it is still big enough to make a difference. With HTTP compression enabled the total data count drops to 1,169 KB.

Something that would make a bigger dent is image optimization. The image compression analysis indicates another 478 KB in data savings that could be achieved. Combined with HTTP compression support this brings the page down to 691 KB. Addressing these two issues brings us down to 54% of the original page weight. No doubt this would reduce the time it takes to fully load the page. (Exactly how much you can reasonably compress a JPEG may vary, so take these numbers as a ballpark estimate.)

There are three external CSS files; minifying those would save another 17 KB, bringing our running total down to 674 KB. If you were to combine them as well you could cut out 2 HTTP requests entirely, which would help reduce the page load time.

Minifying the HTML output would cut out another 7 KB. This gets us down to 667 KB.

I thought this was going to be it for data reduction, but then I took a closer look at the Javascript files. Some of them could still be reduced (which I tested with http://closure-compiler.appspot.com/home):

  • https://cdn.lds.org/ml/platform/scripts/platform-utilities.js – original 8 KB, compiled 6 KB
  • https://lds.org/resources/script/common/s_code_ldsall.js – original 12 KB, compiled 11 KB
  • https://lds.org/resources/script/common/content.js – original 1.6 KB, compiled 1.2 KB
  • https://lds.org/resources/script/pages/home.js – original 502 bytes, compiled 367 bytes

All of those numbers are the compressed versions of the files. They add up to an additional 3.5 KB of savings, which makes our new total 663.5 (664) KB.

All of those items together bring down the data transfer by nearly 50%. There are other things besides reducing the total data numbers that we can do to improve the performance of the site.
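To make the running total easier to follow, here is the same arithmetic as a short Python sketch. The figures are the ones quoted above; the labels are just descriptive names for each step.

```python
original_kb = 1287

# Savings quoted in the text, in KB, applied in the same order.
savings_kb = [
    ("HTTP compression on lds.org", 118),
    ("image optimization", 478),
    ("minify CSS", 17),
    ("minify HTML", 7),
    ("recompile Javascript", 3.5),
]

remaining = original_kb
for label, saved in savings_kb:
    remaining -= saved
    share = remaining / original_kb
    print(f"{label:30s} -> {remaining:7.1f} KB ({share:.0%} of original)")

total_reduction = 1 - remaining / original_kb
print(f"total reduction: {total_reduction:.1%}")  # total reduction: 48.4%
```

The final figure of 663.5 KB is a 48.4% reduction, which is where the "nearly 50%" comes from.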

There are 46 parallelizable (is that a word?) requests to lds.org. Browsers have been increasing the number of concurrent connections they’ll support for a single host, but 46 seems well beyond that. Splitting that up across two or three host names would allow browsers to download more of the resources in parallel. Same is true for cdn.lds.org, but on a smaller scale for this particular page, which loads 12 items from there. Breaking that up across two or three host names would help.

I found the use of cdn.lds.org interesting. This is a good first step: common files used across various church sites can be loaded from there and hopefully see increased cache hits. The ‘Expires’ header puts some of the resources out as far as 20 years. Others are as short as 4 hours, which should be increased. Another common reason for using a CDN is the lack of cookies needed for the request. Unfortunately using the same domain (lds.org and cdn.lds.org) means that requests to cdn.lds.org include the lds.org cookie.

I looked at this in more detail after I logged into the site. The cookie data added an extra 421 bytes to each request sent to cdn.lds.org, and it goes completely unused. That comes out to somewhere around 5 KB of data sent to cdn.lds.org servers that is completely useless. To add insult to injury, the cookie data is also sent when the browser checks to see if the file has been updated (If-Modified-Since and If-None-Match). The solution would be to move the CDN to a separate domain name that is only used by the CDN and nothing else. You still get the distributed site caching and additional parallel download benefits, and the browser will never have additional (useless) cookie data to send for each request.
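The 5 KB figure follows from the 421 bytes of cookie data and the 12 items this page loads from cdn.lds.org. A quick check in Python, assuming one cookie-bearing request per item:

```python
cookie_bytes_per_request = 421  # measured above while logged in
cdn_requests = 12               # items this page loads from cdn.lds.org

wasted = cookie_bytes_per_request * cdn_requests
print(f"{wasted} bytes (~{wasted / 1024:.1f} KB) of unused cookie data per page view")
# 5052 bytes (~4.9 KB) of unused cookie data per page view
```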

While on the topic of caching, many of the resources loaded from lds.org have a very short Expires setting, just one hour. These could be cached for much longer, saving on bandwidth for subsequent page requests.

I already mentioned combining CSS files; the same could be done for some of the Javascript files as well. Reducing the total number of HTTP requests cuts out the overhead involved with each request. Another option for reducing the number of requests is to combine some of the images into CSS sprites.

There are other things the site could do to improve performance (like moving more of the Javascript to the bottom of the page), but this list is a good start, with several options on how to improve the performance of the new lds.org site. Some are more involved than others, but in the end substantial reductions can be had in the amount of data transferred, the total number of resource requests on the page and the fully loaded time.

How To Kill IE Performance

Remember: IE (still) doesn’t have getElementsByClassName, so in IE, jQuery has to iterate the whole DOM and check whether each elements class attribute contains "lineitem". Considering that IE’s DOM isn’t really fast to start with, this is a HUGE no-no.

via How to kill IE performance « gnegg.

Doing something as simple as selecting nodes by class name (the quote above uses the class "lineitem") with jQuery in Javascript can be brutal in IE if it matches a large number of nodes in the DOM. The reason, as noted in the quote above, is that IE doesn’t have a native getElementsByClassName() Javascript function. As a result jQuery has to walk the DOM looking for nodes with that particular class. Expect this to be slow. If the DOM for your page is large, expect this to be painfully slow.

The good news is that modern versions of the other major browsers do implement getElementsByClassName() and that the IE9 previews also support it. Unfortunately this means killing off IE6 isn’t enough; we’ve got to find a way to convince people who stick with IE to move all the way up to IE9 (once it is finally released). Not an easy task since IE9 requires Windows Vista or Windows 7. If you are still on Windows XP (still the most popular OS for those browsing the web), IE9 is not an option for you.

What do we do in the meantime? First, be careful when using a class to select nodes. Second, convince people to download Chrome, Firefox, or Safari and stop using IE (unless they are already using the IE9 betas).

UPDATE: Want a better idea of how much faster a native implementation of getElementsByClassName() can be? Check out John Resig’s speed comparison from March 2007.

Performance Analysis: beta-newsroom.lds.org

A new version of newsroom.lds.org has recently been launched at http://beta-newsroom.lds.org/. This seemed like a good opportunity to give it a once-over, Steve Souders style, looking at the performance of the new site.

The first thing I did was run it through webpagetest.org with the Dulles, VA – IE 8 – FIOS settings. You can see the results at http://www.webpagetest.org/result/100924_5TGG/. A quick glance at the category scores (top right) shows that there is likely plenty of room for performance improvements.


The first thing that stood out was the lack of compression for resources. Starting with the very first request, the HTML for http://beta-newsroom.lds.org/, there is no compressed version made available. A simple way to confirm this is with curl:

curl -v --compressed http://beta-newsroom.lds.org/ > /dev/null

Which makes an HTTP request that looks like:

> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
> Host: beta-newsroom.lds.org
> Accept: */*
> Accept-Encoding: deflate, gzip

and gets back a response of:

< HTTP/1.1 200 OK
< Content-type: text/html; charset=utf-8
< Server: MarkLogic
< Content-Length: 33232
< Proxy-Connection: Keep-Alive
< Connection: Keep-Alive
< Age: 9064
< Date: Fri, 24 Sep 2010 05:07:27 GMT

The raw HTML is 33,232 bytes. The compressed version of the HTML is 8,549 bytes. That is an easy way to trim the page size by 24,683 bytes. I'd be surprised if their web server (MarkLogic) doesn't support compression, so this is likely a simple configuration change somewhere.
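For a feel of why HTML compresses so well, here is a small Python sketch using the standard gzip module. The markup is a made-up, highly repetitive stand-in, so it shrinks far more than a real page would; the last lines just redo the arithmetic for the sizes measured above.

```python
import gzip

# Hypothetical stand-in for a page: repetitive markup, like most HTML.
row = '<div class="story"><h2>Headline</h2><p>Some text here.</p></div>\n'
html = (row * 400).encode("utf-8")

compressed = gzip.compress(html)
saved = len(html) - len(compressed)
print(f"raw: {len(html)} bytes, gzip: {len(compressed)} bytes, saved: {saved} bytes")

# The sizes reported above for the beta-newsroom front page.
raw_bytes, gzip_bytes = 33232, 8549
print(raw_bytes - gzip_bytes)               # 24683
print(f"{1 - gzip_bytes / raw_bytes:.0%}")  # 74%
```

In other words, the front page HTML alone would go over the wire at roughly a quarter of its current size.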

Going down the list of easily compressed resources:

  • http://beta-newsroom.lds.org/assets/styles/screen.css - 105,478 bytes, 16,689 compressed, saves 88,789 bytes
  • http://beta-newsroom.lds.org/assets/scripts/jquery-1.4.min.js - 69,838 bytes, 23,666 compressed, saves 46,172 bytes
  • http://beta-newsroom.lds.org/assets/scripts/common.js - 19,574 bytes, 5,080 compressed, saves 14,494 bytes
  • http://beta-newsroom.lds.org/assets/scripts/jquery.cycle.all.min.js - 23,729 bytes, 7,122 compressed, saves 16,607 bytes
  • http://beta-newsroom.lds.org/assets/videos/uvp/scripts/swfobject.js - 21,543 bytes, 4,728 compressed, saves 16,815 bytes
  • https://secure.lds.org/stats/s_code_ldsall.js - 31,543 bytes, 12,415 compressed, saves 19,128 bytes

On a fully loaded page that weighs over 600KB, turning on compression support for these 7 items (the HTML page plus the six files listed above) would reduce it by 226,688 bytes.
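The total is the sum of the per-file savings plus the front page HTML. A short Python tally of the raw and compressed sizes listed above (file names shortened for readability):

```python
# (raw bytes, compressed bytes) as listed in the text
resources = {
    "/ (HTML)": (33232, 8549),
    "screen.css": (105478, 16689),
    "jquery-1.4.min.js": (69838, 23666),
    "common.js": (19574, 5080),
    "jquery.cycle.all.min.js": (23729, 7122),
    "swfobject.js": (21543, 4728),
    "s_code_ldsall.js": (31543, 12415),
}

total_saved = 0
for name, (raw, compressed) in resources.items():
    saved = raw - compressed
    total_saved += saved
    print(f"{name:26s} saves {saved:7,d} bytes")

print(f"total: {total_saved:,d} bytes")  # total: 226,688 bytes
```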

The https://secure.lds.org/stats/s_code_ldsall.js file is served by a different web server, Netscape-Enterprise/4.1, which is fairly old. I'm not sure if it even properly supports compression. If not, putting a caching server (nginx, varnish, squid, etc.) in front would do the trick.

Another method for reducing the file sizes is minifying the CSS and Javascript (in some cases this is being done already).


Loading all of that Javascript at the top of the page is causing other downloads to block. As a result the site doesn't start to render until we are more than a second into the page. This is a good place to remember the rule of thumb: load CSS as early as possible and Javascript as late as possible.

There is one block of inline Javascript in the page, but it is at the very bottom.

For the most part there don't appear to be too many places where parallel downloads are blocked. One thing that could be done to improve parallel downloads, though, is to spread more of the resources across different host names.


The page loads 51 images, totaling 248KB. The Page Speed results indicate that these could be optimized to reduce their size by 104KB.

Serving images (and other static content) from another domain would also cut down on cookie data sent back for each request.


Here is another area that really surprised me: the bulk of those images don't provide cache headers. As a result browsers will re-download those images for each page load throughout the site. The repeat view waterfall chart should be much smaller; there is no need to fetch all those images on every page view.

Throwing in some Last-Modified and ETag headers will clear that up.
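As a rough illustration of what those two headers buy you, here is a simplified Python sketch of the server side of a conditional request. The function names and the hash-based ETag are my own choices for the example, not anything the site uses, and real servers also handle weak ETags, lists of ETags and case-insensitive header names. Note that these validators make the repeat request cheap (a 304 with no body); skipping the request entirely also needs an Expires or Cache-Control header.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime

def validators(body: bytes, mtime: float) -> dict:
    """Build ETag and Last-Modified response headers for a static file."""
    return {
        "ETag": '"' + hashlib.sha256(body).hexdigest()[:16] + '"',
        "Last-Modified": formatdate(mtime, usegmt=True),
    }

def status_for(request_headers: dict, body: bytes, mtime: float) -> int:
    """Return 304 if the browser's cached copy is still valid, else 200."""
    current = validators(body, mtime)
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        return 304 if if_none_match == current["ETag"] else 200
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        if int(mtime) <= parsedate_to_datetime(since).timestamp():
            return 304
    return 200

image = b"placeholder image bytes"   # hypothetical file contents
mtime = 1285304847.0                 # hypothetical modification time (Unix seconds)
first = validators(image, mtime)

print(status_for({}, image, mtime))                                # 200
print(status_for({"If-None-Match": first["ETag"]}, image, mtime))  # 304
```

On a repeat view the browser sends back the validators it was given, and an unchanged image costs a handful of header bytes instead of a full download.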


There are some other techniques that could be employed to help speed things up, but they depend heavily on how the site is developed and deployed. There is already enough low hanging fruit to make a big difference.

I think you could conservatively target the total size for the page at less than 300KB. This would reduce the amount of data transmitted to browsers by more than 50%. Another benefit is the time it takes for the page to fully load. Currently it is right at 2.5 seconds. Something closer to 1.5 seconds seems reasonable. With proper caching headers return visitors could see times under 1 second.

All of this was just for the front page of the site. I haven't looked at any other pages on the site, but I suspect that they'd benefit from the same items listed above.

PHP Garbage Collection

Derick Rethans on garbage collection in PHP: Part 1 – Variables, Part 2 – Cleaning Up, Part 3 – Performance. PHP 5.3 has significant improvements in this area.

XHR / AJAX Performance – GET or POST?

During the summer of 2009 I posted about XMLHttpRequest (XHR) using multiple packets for HTTP POST, but HTTP GET requests only used one (in most cases). This led to several people recommending HTTP GET requests for AJAX when possible, to maximize performance.

Fortunately someone (Load Impact) took actual measurements to see what this looked like in the real world – AJAX GET or POST – which is best? For details check out their analysis report (PDF) (warning, contains gory geek details). The short version: they observed that not only was HTTP POST (multiple packets) not slower, it was faster. This is definitely contrary to the basic mental model of how this should work.

If you are into front end performance and/or TCP/IP go check it out; it would be great to get a solid explanation of why they are seeing these results. On the flip side, if there is a flaw in the testing it would be good to identify that and come up with a new test.

Slides: Anatomy of a PHP Request

Here are the slides from my presentation last week at UPHPU.

I added it to my slides page.

Efficient PHP: Don’t Abuse dirname( __FILE__ )

Every now and then I run across a chunk of PHP code at the top of a file that looks something like this:

require dirname( __FILE__ ) . '/path/to/something.php';
require dirname( __FILE__ ) . '/path/to/another.php';
require dirname( __FILE__ ) . '/path/to/me-too.php';
require dirname( __FILE__ ) . '/path/to/sure-why-not.php';
require dirname( __FILE__ ) . '/path/to/kitchen-sink.php';

and what jumps out at me is the repeated use of dirname( __FILE__ ) for each require statement (for now we’ll avoid asking why anyone would need to include the kitchen-sink in their code base). My gut instinct is to call dirname( __FILE__ ) once, save that in a variable and then reference the variable to build the path. Not wanting to go on instinct alone I put together a small test to see if it really would make any difference.

The contrived test code will compare approach A:

$var = dirname( __FILE__ ) . '/path/to/something.php';
$var = dirname( __FILE__ ) . '/path/to/another.php';
$var = dirname( __FILE__ ) . '/path/to/me-too.php';
$var = dirname( __FILE__ ) . '/path/to/sure-why-not.php';
$var = dirname( __FILE__ ) . '/path/to/kitchen-sink.php';

with approach B:

$path = dirname( __FILE__ );

$var = $path . '/path/to/something.php';
$var = $path . '/path/to/another.php';
$var = $path . '/path/to/me-too.php';
$var = $path . '/path/to/sure-why-not.php';
$var = $path . '/path/to/kitchen-sink.php';

I’m not testing with require in an effort to focus just on the difference repeated dirname( __FILE__ ) calls make, not how fast the filesystem can slurp in PHP libraries.


My first test was to pass each approach through VLD to see how much “work” PHP was doing. For that I pulled out the number of operations needed for each approach:

approach A: 37 ops
approach B: 23 ops

Calling dirname( __FILE__ ) once required about 38% fewer operations (23 instead of 37). This is a bit of a blunt measurement since it doesn’t attempt to give individual weights to the different operations, but it gives a good general view. The rule of thumb is that fewer ops is better than more ops.


I tried running each approach in a simple loop, but it always ran so quickly that I didn’t see any useful difference.
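The loop test itself isn't shown here. As a loose analog only, this is what the same comparison looks like in Python rather than PHP, using timeit. The `FILE` path is a hypothetical stand-in for `__FILE__`, and the timings say nothing about how PHP behaves; the sketch just shows the shape of the test.

```python
import os.path
import timeit

FILE = "/var/www/app/index.php"  # hypothetical stand-in for PHP's __FILE__
SUFFIXES = [
    "/path/to/something.php",
    "/path/to/another.php",
    "/path/to/me-too.php",
    "/path/to/sure-why-not.php",
    "/path/to/kitchen-sink.php",
]

def approach_a():
    # Recompute the directory for every path, like repeated dirname( __FILE__ ).
    return [os.path.dirname(FILE) + suffix for suffix in SUFFIXES]

def approach_b():
    # Compute the directory once and reuse it.
    path = os.path.dirname(FILE)
    return [path + suffix for suffix in SUFFIXES]

for fn in (approach_a, approach_b):
    seconds = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__}: {seconds:.3f}s for 100,000 runs")
```

Both functions build the same five paths, so any difference in the timings comes from the repeated directory lookup alone.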


Next up was a look at memory_get_peak_usage. Turns out there was a small difference:

approach A: 57,312 bytes
approach B: 56,992 bytes

Sure, 320 bytes isn’t a big deal in the world of servers with 16GB of memory, but it’s one more reason why approach B is just that tiny bit better.


If you see this pattern creeping into your code base and you can easily convert it then you’ll likely be better off for it. Remember that everything is a trade-off though: if it takes you several hours to go through and make this kind of change it may not be worth it.

I’d certainly file this away for new projects though so that you can avoid repetitive dirname( __FILE__ ) calls from the start. For that matter, if you can get away with running on 5.3 or higher then you’d probably want to skip this entirely and look at using the new __DIR__ constant. I haven’t tested how it compares to the two approaches listed above, but I’d expect it to be at least as good as approach B, perhaps even better.

For reference I tested this using PHP 5.2.10-2ubuntu6.4 with Suhosin-Patch 0.9.7 (cli) (built: Jan 6 2010 22:41:56). Your specific version of PHP may behave differently.

Thinking Clearly About Performance

Cary Millsap has a great paper out – Thinking Clearly About Performance. It has reminded me how critical details can be in measuring performance, throughput, response time, efficiency, skew and load.

The beauty of this paper isn’t that it has some magical new technique, quite the opposite actually. The methods talked about are solid, fundamental approaches that can be used to methodically figure out what’s going on in your system.

It’s only 13 pages long and is a pleasant, brisk read. I recommend it for anyone doing software development, database administration, or system administration.


© 2014 Joseph Scott
