Apache FallbackResource Directive

As of Apache HTTPD 2.2.16 there is a new FallbackResource directive:

It is frequently desirable to have a single file or resource handle all requests to a particular directory, except those requests that correspond to an existing file or script. This is often referred to as a ‘front controller.’

In earlier versions of httpd, this effect typically required mod_rewrite, and the use of the -f and -d tests for file and directory existence. This now requires only one line of configuration.

FallbackResource /index.php

Existing files, such as images, css files, and so on, will be served normally.

This is a really useful directive to have available; it reminds me of the try_files directive in Nginx.
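The front controller behavior described above can be sketched in a few lines of Python. This is only an illustration of the decision being made, not how Apache implements FallbackResource; the function name `resolve`, the `demo` helper and the sample paths are made up for the example, and treating existing directories like existing files is a simplifying assumption.

```python
import os
import tempfile


def resolve(docroot, url_path, fallback="/index.php"):
    """Pick the resource that handles url_path under front-controller rules."""
    root = os.path.realpath(docroot)
    target = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    # Refuse anything that escapes the document root (e.g. via "..").
    inside_root = target == root or target.startswith(root + os.sep)
    if inside_root and os.path.exists(target):
        return url_path   # existing file or directory: serve it normally
    return fallback       # everything else goes to the front controller


def demo():
    """Build a throwaway document root and resolve two sample requests."""
    with tempfile.TemporaryDirectory() as docroot:
        open(os.path.join(docroot, "style.css"), "w").close()
        return resolve(docroot, "/style.css"), resolve(docroot, "/blog/2010/hello")
```

Calling `demo()` resolves the stylesheet to itself and the non-existent blog URL to the fallback, which is the same split the one-line directive gives you.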

mod_pagespeed for Apache

mod_pagespeed is an open-source Apache module that automatically optimizes web pages and resources on them. It does this by rewriting the resources using filters that implement web performance best practices. Webmasters and web developers can use mod_pagespeed to improve the performance of their web pages when serving content with the Apache HTTP Server.

via mod_pagespeed Overview.

HTTP Basic Authentication, A Tale of AtomPub, WordPress, PHP, Apache, CGI and SSL/TLS

I’ve been really enjoying working with Tim Bray, Pete Lacey, Elias Torres and Sam Ruby on improving AtomPub in WordPress. This work is in WordPress 2.3, which will be released later this month. You can try it out right now by downloading the beta. Sam has also started some documentation on AtomPub in WordPress at http://codex.wordpress.org/AtomPub.

There is a lot of ground to cover in this post, so to start with I want to distinguish between two topics that are closely related but, for our purposes today, separate and distinct from each other. The first is authentication, specifically HTTP Basic Authentication. The second is security, which will focus on SSL/TLS (i.e. using https:// URLs).

To start with, the AtomPub spec has a section on Securing the Atom Publishing Protocol that deals with authentication. In general, you can use nothing or whatever you want, but HTTP Basic Authentication with TLS needs to be able to work. Think of it as HTTP Basic Authentication being the lowest common denominator that AtomPub clients and servers have to support, along with TLS if you’d like.

In WordPress there are actually two ways that a user could be authenticated when using AtomPub, HTTP basic and cookies. The cookie mechanism just looks to see if you sent along an authenticated WordPress cookie with your request. Since we’d been using Tim’s Atom Protocol Exerciser (APE) for testing, all authentication was being done via HTTP basic. Which worked fine, most of the time.

I started running APE against WordPress running under different situations and I ran into a problem with authentication when PHP was being run as a CGI under Apache. When running as a server module (mod_php), PHP takes care of decoding HTTP basic for you (see HTTP basic authentication in PHP). When using HTTP basic, PHP will automatically populate the $_SERVER['PHP_AUTH_USER'] and $_SERVER['PHP_AUTH_PW'] variables with the username and password that were provided, but if and only if PHP is being run as a server module (like mod_php). If you are running PHP as a CGI then those two variables won’t get created at all, ever, even when using HTTP basic authentication. And since you can’t do anything in WordPress via AtomPub without authenticating, you are dead in the water. Well, not exactly.

PHP not supporting HTTP basic auth when being run as a CGI is a known issue, so folks have come up with clever ways to work around it. One common workaround is to use mod_rewrite to add HTTP basic auth into $_SERVER['HTTP_AUTHORIZATION']:

RewriteEngine on
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

The idea here is that mod_rewrite watches for an HTTP basic auth attempt and then injects the HTTP header into the PHP environment as HTTP_AUTHORIZATION. From there it is an easy job to parse and decode the HTTP header and manually populate $_SERVER['PHP_AUTH_USER'] and $_SERVER['PHP_AUTH_PW'] yourself. This is currently being done in the WordPress AtomPub code, so if you are on a host that runs PHP as a CGI and you have access to .htaccess and mod_rewrite then you can try it out.
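To make the "parsing and decoding" step concrete, here is a minimal sketch in Python. It is not the WordPress AtomPub code (which is PHP); the function name `parse_basic_auth` is made up for the example, and it assumes the credentials are UTF-8. The header format itself is standard: the word Basic, a space, then base64 of username:password.

```python
import base64
import binascii


def parse_basic_auth(header_value):
    """Return (username, password) from a Basic Authorization header, or None."""
    if not header_value:
        return None
    scheme, _, encoded = header_value.strip().partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None   # not valid base64, or not UTF-8 (an assumption of this sketch)
    # Only the first colon separates the two parts; passwords may contain colons.
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password
```

The two values returned are what you would copy into PHP_AUTH_USER and PHP_AUTH_PW. Returning None for anything malformed keeps the caller's logic simple: no credentials means treat the request as unauthenticated.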

Unfortunately I’ve seen times where this doesn’t work either. A modified version of this that I’ve had better success with is to pass the authentication back in via GET. Here’s an example from a test WordPress blog that redirects AtomPub authentication:

RewriteEngine on
RewriteBase /test/atompub/
RewriteCond %{HTTP:Authorization}  !^$
RewriteRule wp-app.php wp-app.php?HTTP_AUTHORIZATION=%{HTTP:Authorization} [QSA,L]

Instead of parsing and decoding from $_SERVER['HTTP_AUTHORIZATION'] you would do it from $_GET['HTTP_AUTHORIZATION']. This isn’t exactly ideal either, but I’ve had better luck getting it to work in PHP-as-CGI environments. Code to support this isn’t in WordPress AtomPub yet, but we might add it.

The four of us went back and forth on this a bit, then Tim Bray asked the elephant in the room question: why doesn’t PHP support HTTP basic when running as a CGI? I didn’t have a good answer for him, so I went hunting on Google. It turns out that this has nothing to do with PHP; it is how Apache works. Apache does not pass the HTTP basic headers to CGI applications, so they never see them. This has been mentioned in several places; for brevity I’ll only quote one, from Jon Udell talking about CGI and mod_perl:

HTTP Authentication

“Note that such a module has complete access to the HTTP headers sent by the client. If you write a CGI script to enforce a security policy, à la the ByteCal example above, that script will normally see only the user’s name (HTTP_REMOTE_USER) and not the full credentials (HTTP_AUTHORIZATION).

That’s because Apache, as a security measure, withholds the Authorization header from CGI scripts. (If you really want to build a CGI-based access-control script, you can tweak Apache to make it send this header.) But an Apache/Perl authentication module, running inside the server, knows everything that Apache knows about a request.”

So far I’ve used WordPress and AtomPub as an example, but this problem is not specific to either. This is an issue with CGI applications being able to use HTTP basic authentication, and the ways people have worked around it. While there are ways to deal with this (like the two I mentioned above), they aren’t ideal and only work if you can use .htaccess and mod_rewrite.

There have been lots of alternatives to authentication that get around this issue. Lots of people have looked at this, so hopefully we’ll have a generalized way of dealing with it at some point. Until then it looks like we’ll see API specific variations of authentication.

OK, I also mentioned that we’d talk about security. This one is more to the point: if you aren’t using SSL/TLS then your communications aren’t secure. Although HTTP basic doesn’t send your username and password as plain text, it does the next closest thing: they are only base64 encoded, which anyone can reverse. So anyone with access to your traffic (wireless network sniffing anyone?) can easily grab your username and password. So how do you secure this authentication process? By doing it over SSL/TLS. If your web traffic isn’t using SSL/TLS it isn’t secure.

In the context of WordPress there is a trade-off here. We can’t guarantee that every WordPress install is going to support SSL/TLS, so we can’t make it a requirement. That said, there is nothing in WordPress (or the APIs: AtomPub and XML-RPC) that prevents you from using SSL/TLS. This leaves it up to the person running the WordPress blog to decide what level of security is needed.

On WordPress.com we support TLS/SSL. You can point your XML-RPC client at https://<your_blog_here>.wordpress.com/xmlrpc.php and it will encrypt the data back and forth between your computer and WordPress.com servers. Same for AtomPub, only the URL would look like https://<your_blog_here>.wordpress.com/wp-app.php.

Hopefully everyone takes away two things from this. One, you can’t depend on HTTP basic authentication working. Two, if you aren’t using SSL/TLS then your traffic isn’t secure.

Pleasant URLs in PHP

phpRiot.com posted a new article on creating search engine friendly URLs in PHP. I always like to see this topic get attention, because there are too many sites on the web that generate very long and difficult to read URLs. Making them friendlier usually involves transforming GET parameters into part of the URL. Here is an example of a URL using GET parameters:


Here is an example using a more pleasant URL:


The phpRiot article goes on to describe different methods for achieving this.

PHP, But Only When Needed

There are times when you only want to have your PHP scripts run when certain conditions have (or haven’t) been met. This technique is often used for caching. A real life example of this is discussed in Serving rendered images at the speed of light. The gist of the story is that a PHP script is used to generate thumbnails of images, but it only needs to be called when a thumbnail doesn’t already exist.

The described solution involves making use of Apache’s mod_rewrite, so this isn’t a purely PHP solution. If you aren’t already familiar with mod_rewrite that is okay (I’ve used it and it is still quite confusing at times); the author covers each step one at a time.
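The "only when needed" idea can be sketched in Python as a cache check followed by the expensive step. In the article the existence check is done by mod_rewrite before PHP is ever invoked; this sketch folds both halves into one function purely for illustration. The names `get_thumbnail` and `render` are made up for the example, and `name` is assumed to be a plain file name with no directory parts.

```python
import os


def get_thumbnail(cache_dir, name, render):
    """Return the cached thumbnail path, calling render(name) only on a miss."""
    path = os.path.join(cache_dir, name)
    if not os.path.isfile(path):
        data = render(name)       # the expensive step, e.g. resizing an image
        tmp = path + ".tmp"
        with open(tmp, "wb") as fh:
            fh.write(data)
        os.replace(tmp, path)     # rename into place so readers never see a partial file
    return path
```

The second request for the same thumbnail never reaches `render`, which is the whole point: the generator only runs when the file doesn’t already exist.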

The BSD Licensed Application STack (BLAST)

There are a lot of open source licenses out there. For me open source license usually implies GPL, LGPL and BSD/MIT, although there are over 50 on the OSI’s list. I tend to be partial towards the BSD/MIT licenses over (L)GPL, as I’ve mentioned before. For the purposes of this article, though, I’m going to focus on BSD licensed software. Don’t take this as a slam against the GPL; I just wanted to focus on the largest amount of flexibility and ease for developers.

It really is amazing that today you can develop an entire application or service using BSD licensed software up and down the stack. This particular license allows you to modify code and it is up to you if you want to share it. Although it is always recommended to participate in the community, this license doesn’t make it a requirement if you want to distribute modified code (this may be a good or a bad thing depending on your point of view). Thus all types of applications and services can be built upon BSD licensed code, from top to bottom. If you’ve ever tried to figure out which license you have to buy from a company that offers more types of licenses than cars on the road you know that not having to go through that maze can be a great time saver.

So here is what I’m thinking of when I talk about the BSD licensed application stack. At the lowest level we need an operating system. I prefer FreeBSD for this, but there is no reason why NetBSD, OpenBSD or DragonFly BSD couldn’t be used instead. Any of these will provide a complete operating system and depending on your wants or needs you may find one fits you better than the others.

Now that we have an OS (FreeBSD), we’ll need some place to store data that our application or service will be using. SQL databases have grown to fit this need quite well. Because we are focusing on BSD licensed open source software one database really stands above the rest: PostgreSQL. Not only is it a perfect fit for our criteria, it is a great piece of database software. PostgreSQL supports many features that users of commercial databases have come to expect (Views, Functions, Schemas, etc).

The way to deploy applications and/or services today is on the web. Here again we are fortunate because the most commonly used web server is open source and released under the BSD-style Apache License. The Apache web server is flexible (mod_rewrite anyone?) and powerful.

Finally we’ll need a programming language to get things done. This one piece of the stack is probably the most difficult to pin down. My pick though would have to be PHP, whose license is close to the BSD license. It is targeted at web apps, but I’ve used it for command line applications as well.

The BSD Licensed Application STack (BLAST) is about software that does its job well and has a license that is easy to understand and gives you the ability to keep distributed changes to yourself. Activity in the community is optional, but encouraged. For me this means FreeBSD, PostgreSQL, Apache and PHP.

Many of you reading this will be jumping up and down that this is just a rehash of L.A.M.P. On one level this is true: Linux, Apache, MySQL and PHP/Perl (L.A.M.P.) do satisfy one part of BLAST, open source software that gets the job done. Unfortunately the licensing for some of these products is difficult to understand and in some cases the same license is interpreted in different ways (yes MySQL I’m looking at you).

The components of BLAST may change over time (perhaps another language besides PHP?), but the intent and abilities will be the same: good open source code with the ability to do what you want with it.

Reverse Proxy With Apache

I’ve wanted to get all of our web servers at work under one umbrella in a reverse proxy setup for some time. I wanted this so that I could expose only one web server to the outside world (some of this has to do with network topology that is beyond my control), allowing me to make all of our web services available under one URL (nice for things like SSL and multiple URL rewriting) and making it possible to filter web requests at one place. Today I finally sat down with the intent to make it work and plan for the switch over.

In the past I’d looked at doing this with Pound, but it fell short in one key area, URL rewriting. Everything else I needed was already there, in one convenient spot. I really wanted this to work, but in the end I couldn’t give up URL rewriting; it’s a requirement for what I’m trying to accomplish. I also looked at Squid for a time. Honestly I didn’t complete my trial of Squid, so it is possible that it might meet all of my requirements, but I didn’t see anything that looked like mod_security for Squid. That was another needed feature: I’m trying to protect IIS servers so I wanted all the extra help I could get.

So I eventually ended up at Apache, with mod_proxy. With the help of this how to article things went pretty smoothly, until I tried to bring in the server hosting our Squirrelmail install. No matter what I tried I couldn’t get it to successfully log in. Suspecting that this had to do with cookies being passed between the client, proxy and server, I went hunting for help on the web. Turns out the how to article above mentions a couple of proxy directives for dealing with this, ProxyPassReverseCookieDomain and ProxyPassReverseCookiePath. Unfortunately these are only available in the development version of Apache, version 2.1. I’d spent the better part of the day tweaking my install of Apache 2.0.54 and then had to go build Apache 2.1.x to get the newer version of mod_proxy. Some of the module names have changed so I couldn’t just drop in my previous Apache config. I also had to rebuild the mod_proxy_html and mod_security modules for Apache 2.1.x.

The good news is that once I had that all sorted out the new proxy cookie directives did the trick. So here is my little public service announcement (PSA), if you are using mod_proxy as a reverse proxy for Squirrelmail, start with Apache 2.1.x and look into ProxyPassReverseCookieDomain and ProxyPassReverseCookiePath. I suspect this will be the case for any webapp that uses cookies.

I haven’t added mod_security and SSL to the mix yet, but I’m already familiar with those modules, so I don’t expect that to be too bad. I’d never really used mod_proxy as a reverse proxy before so this was somewhat new territory to me. Oh, at some point I’ll look at doing some caching in combination with the reverse proxy to minimize the load on the back end web servers.

UPDATE 10:45am 6 Jul 2005: I should have mentioned the trailing slash problem also shows up in the reverse proxy setup. So if you have a reverse mapping that looks something like:

ProxyPass /webmail/

then you can use mod_rewrite to send a redirect with the trailing slash:

RewriteRule ^/webmail$ webmail/ [R]