Joseph Scott

Archive for the ‘php’ tag

MakeItLink - Detecting URLs In Text And Making Them Links

In late October Jeff Atwood wrote The Problem With URLs, describing how hard it is to pick URLs out of text and turn them into links. Here’s a simple example:

My website is at http://josephscott.org/

Would be changed into:

My website is at <a href='http://josephscott.org/'>http://josephscott.org/</a>

Sounds simple, right? Once you start looking at what the valid character set is for URLs, things get tricky. I won’t rehash all of the items; go to The Problem With URLs post to see examples of some of the problems.
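To make the trouble concrete, here’s a minimal sketch of a deliberately naive linker. The naive_make_links function and the example.com URLs are my own illustration, not code from WordPress or from Jeff’s post. It treats every run of non-whitespace after the scheme as part of the URL, which is exactly how trailing punctuation ends up inside the link:

```php
<?php
// Hypothetical helper for illustration only: it is NOT make_clickable().
// Everything up to the next whitespace is treated as part of the URL.
function naive_make_links( $text ) {
    return preg_replace(
        '#(https?://\S+)#i',
        "<a href='$1'>$1</a>",
        $text
    );
}

// The sentence-ending period is swallowed into the href.
echo naive_make_links( "See http://example.com/page." ), "\n";

// So is the closing parenthesis of the surrounding text.
echo naive_make_links( "(more at http://example.com/page)" ), "\n";
```

Both results link to an address that doesn’t exist, because the period and the closing parenthesis are legal URL characters and a pattern this simple can’t tell they belong to the sentence instead.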

I knew that WordPress had a make_clickable function (in wp-includes/formatting.php) that did this exact thing. After testing this against some of the problems that Jeff points out it became clear that make_clickable() didn’t handle these edge cases. I made some rather crude tweaks to the WordPress code to fix some of these and opened ticket 8300 with my patches. Then filosofo came along and not only cleaned up my hacks, but reduced the amount of code needed in general. Major kudos to filosofo!

At this point it looks like we’ve got code to make make_clickable() work correctly with problem URLs. I’m going to wait until after WordPress 2.7 is released to push for getting this code committed since we’re trying to get 2.7 wrapped up.

I got to thinking that this bit of code would be really handy to have as a standalone library. So I pulled out the various pieces of code needed to make this work and put them together in a single PHP class: MakeItLink

class MakeItLink {
    protected static function _link_www( $matches ) {
        $url = $matches[2];
        $url = self::cleanURL( $url );
        if( empty( $url ) ) {
            return $matches[0];
        }

        return "{$matches[1]}<a href='{$url}'>{$url}</a>";
    }

    public static function cleanURL( $url ) {
        if( $url == '' ) {
            return $url;
        }

        // Strip any character that is not allowed in a URL
        $url = preg_replace( "|[^a-z0-9-~+_.?#=!&;,/:%@$*'()\\x80-\\xff]|i", '', $url );
        $url = str_replace( array( "%0d", "%0a" ), '', $url );
        $url = str_replace( ";//", "://", $url );

        /* If the URL doesn't appear to contain a scheme, we
         * presume it needs http:// prepended (unless a relative
         * link starting with / or a php file).
         */
        if(
            strpos( $url, ":" ) === false
            && substr( $url, 0, 1 ) != "/"
            && !preg_match( "|^[a-z0-9-]+?\\.php|i", $url )
        ) {
            $url = "http://{$url}";
        }

        // Replace ampersands and single quotes
        $url = preg_replace( "|&([^#])(?![a-z]{2,8};)|", "&#038;$1", $url );
        $url = str_replace( "'", "&#039;", $url );

        return $url;
    }

    public static function transform( $text ) {
        $text = " {$text}";

        $text = preg_replace_callback(
            '#(?<=[\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#$%&~/\-=?@\[\](+]|[.,;:](?![\s<])|(?(1)\)(?![\s<])|\)))*)#is',
            array( 'MakeItLink', '_link_www' ),
            $text
        );

        $text = preg_replace( '#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', "$1$3</a>", $text );
        $text = trim( $text );

        return $text;
    }
}

It’s very easy to use: load up the text you want to search for links and call the transform method:

$text = MakeItLink::transform( $text );

All of this code came out of WordPress, which is licensed under the GPL, so consider the MakeItLink code GPL as well. If you’ve got some improvements, let me know and make sure that they get back into the original WordPress functions as well.

Written by Joseph Scott

November 28th, 2008 at 9:00 am

PHP URL Routing (PUR)

I’ve been thinking about individual features of various code frameworks, starting with two features that are closely related: clean URLs and URL routing. To examine this idea further I started writing a basic implementation of these two features in PHP.

To start with we’ll redirect all requests to a single index.php file. Here’s the .htaccess file:

RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)\?*$ index.php?_route_=$1 [L,QSA]

The idea here is pretty basic: unless the requested file or directory actually exists, rewrite the request to index.php. When the rewrite happens, add a GET variable (_route_) that contains the path portion of the URL.

The index.php file itself is pretty simple:

require "./PUR.php";

$routes = array(
    "_not_found_"           => "demo_not_found",
    ""                      => array( "DEMO", "homePage" ),
    "color/black"           => array( "DEMO", "colorBlack" ),
    "color"                 => array( "DEMO", "color" )
);

$route = new PUR( );
$route->setRoutes( $routes );
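// Fall back to the home page route when _route_ is missing
$requested = isset( $_GET['_route_'] ) ? $_GET['_route_'] : "";
$route->routeURL( preg_replace( "|/$|", "", $requested ) );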

First we include the PUR class (PHP URL Routing), then give it an array that maps URLs to functions or class/method pairs, along with the URL that is currently being requested. A URL can be mapped to either a function or a method of a class. In the above example there’s a special route called _not_found_ that is used when there is no route defined for a URL; in this case the request will be passed to the demo_not_found function. Everything else goes through the DEMO class.

Another thing to note: because of the way the URL patterns are tested, the more specific URLs must appear higher up. That’s why color/black shows up before color. If there were a color/black/blue route then it would have to be listed above color/black. The home page is a bit of a special case: it’s the empty URL value.
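If you’d rather not maintain that ordering by hand, one option is to sort the routes before handing them to PUR. This is only a sketch under my own assumptions: sort_routes_by_specificity and the colorBlackBlue method name are hypothetical and not part of PUR. It uses path length as a rough stand-in for "more specific":

```php
<?php
// Hypothetical helper (not part of PUR): order routes so that longer,
// more specific paths are tried before their shorter prefixes.
function sort_routes_by_specificity( array $routes ) {
    uksort( $routes, function ( $a, $b ) {
        // Longer paths first; equal lengths fall back to alphabetical order.
        $diff = strlen( $b ) - strlen( $a );
        return $diff !== 0 ? $diff : strcmp( $a, $b );
    } );

    return $routes;
}

$routes = sort_routes_by_specificity( array(
    "color"             => array( "DEMO", "color" ),
    "color/black/blue"  => array( "DEMO", "colorBlackBlue" ), // hypothetical method
    "color/black"       => array( "DEMO", "colorBlack" ),
) );

// Keys now run from the longest path to the shortest.
print_r( array_keys( $routes ) );
```

Exact matches are checked first anyway, so the sort only affects which prefix wins when a URL carries extra path segments.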

It doesn’t matter where the code for the functions or classes lives; it’s up to you to make sure it is pulled in before the routing takes place. I could have used a directory layout pattern like Rails and others do, but I chose not to in this case. To keep things simple these can all be in the index.php file.

function demo_not_found( $args = false ) {
    print "Route not found.";
}

class DEMO {
    function homePage( $args = false ) {
        print "This is the home page.";
    }

    function colorBlack( $args = false ) {
        print "The color black and everything below.";
    }

    function color( $args = false ) {
        print "All the other colors.";
    }
}

Each function should accept a single optional argument. PUR will pass the additional URL path segments to the function as an array. Using our example, if you requested example.com/color/blue/and/green/ it would match the color URL and call the color method of the DEMO class, and $args would be an array:

Array
(
    [0] => blue
    [1] => and
    [2] => green
)
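Here’s a minimal sketch of how an array like that can be derived from a route and a requested URL. The route_args function is a hypothetical helper of my own for illustration, and it uses plain string functions instead of the regular expression that PUR itself relies on:

```php
<?php
// Hypothetical helper: return the path segments that follow $route in $url,
// or false when $url does not start with "$route/".
function route_args( $route, $url ) {
    $prefix = $route . "/";
    if ( strpos( $url, $prefix ) !== 0 ) {
        return false;
    }

    // Drop the matched prefix and any stray slashes, then split the rest.
    $rest = trim( substr( $url, strlen( $prefix ) ), "/" );

    return $rest === "" ? array( ) : explode( "/", $rest );
}

print_r( route_args( "color", "color/blue/and/green" ) );
```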

Let’s get into the interesting part, the PUR class:

class PUR {
    protected $route_match      = false;
    protected $route_call       = false;
    protected $route_call_args  = false;

    protected $routes           = array( );

    public function __construct( ) {

    } // function __construct( )

    public function setRoutes( $routes ) {
        $this->routes = $routes;
    } // function setRoutes

    public function routeURL( $url = false ) {
        // Look for exact matches
        if( isset( $this->routes[$url] ) ) {
            $this->route_match = $url;
            $this->route_call = $this->routes[$url];

            $this->callRoute( );
            return true;
        }

        // See if the first part of the route exists
        foreach( $this->routes as $path => $call ) {
            if( empty( $path ) ) {
                continue;
            }

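            // Anchor at the start of the URL and escape the path so it matches literally
            $pattern = "|^" . preg_quote( $path, "|" ) . "/(.+)$|i";
            if( preg_match( $pattern, $url, $match ) ) {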
                $this->route_match = $path;
                $this->route_call = $call;
                $this->route_call_args = explode( "/", $match[1] );

                $this->callRoute( );
                return true;
            } // if
        } // foreach

        // If no match was found, call the default route if there is one
        if( $this->route_call === false ) {
            if( !empty( $this->routes['_not_found_'] ) ) {
                $this->route_call = $this->routes['_not_found_'];
                $this->callRoute( );
                return true;
            }
        }

        return false;
    } // function routeURL( )

    private function callRoute( ) {
        $call = $this->route_call;

        if( is_array( $call ) ) {
            $call_obj = new $call[0]( );
            // Braces make PHP 7+ treat $call[1] as the method name
            $call_obj->{$call[1]}( $this->route_call_args );
        }
        else {
            $call( $this->route_call_args );
        }
    } // function callRoute

} // class PUR

There are a few protected properties that are used to track the routes, the matched URL and the function to call. The routeURL method does most of the work, so let’s walk through each section. First we look to see if there’s an exact match in the routes array. In our example this would be “”, “color/black” and “color”. An exact match is always preferred and is easy to check for. If that doesn’t find anything then we move on to regular expression checking to see if the beginning of the URL matches any of the routes. This is what allows color/blue/and/green to match the color route. Finally, if a match still can’t be found then we look for the special _not_found_ route and use it.

The callRoute method is only used internally to actually issue the routing call. If the defined route is an array then it’s assumed to be a class/method pair and will create an object of that class and then call the method with the array of variables (if there are any). If it’s not an array then it’s assumed to be a function.
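One thing callRoute doesn’t do is check that the target exists before calling it. Here’s a sketch of a more defensive dispatch step, written as a standalone function so it can be tried on its own. The dispatch_route function and the SKETCH_DEMO class are hypothetical names of mine, not part of PUR:

```php
<?php
// Hypothetical variant of the dispatch step: returns false instead of
// triggering a fatal error when the function or class/method is missing.
function dispatch_route( $call, $args = false ) {
    if ( is_array( $call ) ) {
        list( $class, $method ) = $call;
        if ( !class_exists( $class ) || !method_exists( $class, $method ) ) {
            return false;
        }
        // Same approach as callRoute: create an object, then call the method.
        $call = array( new $class( ), $method );
    } elseif ( !is_string( $call ) || !function_exists( $call ) ) {
        return false;
    }

    call_user_func( $call, $args );
    return true;
}

// Stand-in class so the sketch runs by itself.
class SKETCH_DEMO {
    function hello( $args = false ) {
        print "Hello from " . implode( "/", (array) $args );
    }
}

dispatch_route( array( "SKETCH_DEMO", "hello" ), array( "blue", "green" ) );
```

Returning false leaves it up to the caller to fall back to the _not_found_ route when a route points at something that was never loaded.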

Getting this code up and working wasn’t too bad, and it seems to cover the clean URL and URL routing needs pretty well without requiring a ton of extra work. It has no external dependencies, so it could be used as a drop-in feature for existing projects.

Any thoughts on improving this code while keeping things simple?

Written by Joseph Scott

November 18th, 2008 at 7:00 am
