I am Speaking at the OpenWest Conference

The OpenWest Conference (formerly the Utah Open Source Conference) is happening May 2-4, 2013 at Utah Valley University in Orem, Utah.

This year the keynote speakers are Rasmus Lerdorf, creator of PHP, and Mark Callaghan, lead of the MySQL engineering team at Facebook.

For my part I’ll be giving three different presentations this time around. First up is “Simple Filesystems with Python and FUSE”, where I’ll cover the basics of getting a simple filesystem up and running written in Python using the FUSE library. Next up is “Site Testing with CasperJS”, which is an intro to using CasperJS to run user tests against your site. Last, but not least, is “Scaling WordPress”, where I’ll talk about some of the methods that WordPress.com (the largest WordPress install in the world) uses to host tens of millions of sites that add up to billions of page views per month.

I tried to keep my session titles direct and to the point. At times there will be up to ten sessions running at once (see the OpenWest session schedule), so I wanted people to be able to tell at a glance what my sessions are about.

Tickets for OpenWest are available at $80. Every open source group in the area has been given a discount code though, so you can bring that down significantly.

If you’ll be at the OpenWest conference, be sure to say hi.

wpcomfs – A WordPress.com Filesystem

Back in April WordPress.com announced a new REST style API. That got me thinking about writing a filesystem layer to expose that data, along similar lines to the pressfs code I’d written last year.

It is still rough, and only supports read-only public data, but in the spirit of release early (and often) I’m sharing the code for wpcomfs at https://github.com/josephscott/wpcomfs.

Assuming you have FUSE with Python bindings already working on your system you can download this and start trying it out in three easy steps:

  1. mkdir /tmp/wpcomfs
  2. python wpcomfs.py /tmp/wpcomfs/
  3. mkdir /tmp/wpcomfs/sites/en.blog.wordpress.com

After those steps you’ll have read-only data for en.blog.wordpress.com available at /tmp/wpcomfs/sites/en.blog.wordpress.com.

The mount point /tmp/wpcomfs and the site en.blog.wordpress.com are just examples. You can mount wpcomfs wherever you’d like, and you can expose public data for any public site hosted on WordPress.com. Since there are millions of sites hosted at WordPress.com, wpcomfs will only load data for sites that you mkdir.

This also works for WordPress.com sites that are using mapped domain names. For instance, mkdir /tmp/wpcomfs/sites/gigaom.com will provide you with a filesystem layer for GigaOM.com.

Data

When you mkdir a site you get site data, recent posts, and recent comments. In order to keep interactions with the filesystem responsive this data is only loaded once, when mkdir is run. Running rmdir /tmp/wpcomfs/sites/en.blog.wordpress.com will remove the site data.
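To make the load-once behavior concrete, here is a rough sketch of the idea in Python 3. It is not the wpcomfs code: the SiteCache class, its method names, and the fetch callable are all made up for illustration, and the fetch step stands in for whatever request the real code makes to the WordPress.com API.

```python
import time


class SiteCache:
    """Hypothetical in-memory store with one entry per site that was mkdir'ed."""

    def __init__(self, fetch):
        # fetch is any callable that takes a site name and returns a dict.
        # In a real filesystem it would make the network request.
        self.fetch = fetch
        self.sites = {}

    def mkdir(self, site):
        # Load the data exactly once and remember when that happened.
        self.sites[site] = {"loaded_at": time.time(), "data": self.fetch(site)}

    def rmdir(self, site):
        # Dropping the entry discards the cached data for that site.
        del self.sites[site]

    def read(self, site, name):
        # Reads are served from memory, so no network call happens here.
        return str(self.sites[site]["data"][name]).encode("utf-8")
```

Because reads never touch the network, commands like ls and more stay fast. The trade-off is that the data goes stale until the site is removed with rmdir and loaded again with mkdir.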

The top level directory for a site looks like:

$ ls -la /tmp/wpcomfs/sites/en.blog.wordpress.com/
total 14
-r-------- 1 root root    7 Jun  6 14:09 ID.txt
-r-------- 1 root root   28 Jun  6 14:09 URL.txt
dr-------- 2 root root 4096 Jun  6 14:09 comments
-r-------- 1 root root   61 Jun  6 14:09 description.txt
dr-------- 2 root root 4096 Jun  6 14:09 meta
-r-------- 1 root root   18 Jun  6 14:09 name.txt
dr-------- 2 root root 4096 Jun  6 14:09 posts

The contents of each file are available in read-only mode:

$ more /tmp/wpcomfs/sites/en.blog.wordpress.com/URL.txt 
http://en.blog.wordpress.com
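Since these are ordinary files as far as the operating system is concerned, any program can read them with the usual file APIs. A small Python 3 sketch, where read_site_files is a hypothetical helper name and the path in the usage note assumes the example mount point from above:

```python
import os


def read_site_files(site_dir):
    """Return {filename: text} for the regular files directly under site_dir."""
    result = {}
    for name in sorted(os.listdir(site_dir)):
        path = os.path.join(site_dir, name)
        # Skip directories such as comments, meta, and posts.
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as handle:
                result[name] = handle.read()
    return result
```

With the filesystem mounted, calling read_site_files("/tmp/wpcomfs/sites/en.blog.wordpress.com") would return a dict with keys such as URL.txt and name.txt.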

All of the dates exposed for files and directories in wpcomfs are based on when the site data was loaded, which is when mkdir was run for the site.
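One way to picture this is a helper that builds the stat information for each entry from a single load timestamp. This is a sketch under assumptions, not the wpcomfs implementation: make_attrs is a hypothetical name, and the exact structure a FUSE binding expects from its getattr handler depends on the binding. The modes and link counts mirror the ls output shown earlier.

```python
import stat


def make_attrs(loaded_at, content=None):
    """Build stat-like fields for a file (content given) or a directory."""
    if content is None:
        # Directory: read-only for the owner, nominal 4096 byte size.
        mode, size, nlink = stat.S_IFDIR | 0o400, 4096, 2
    else:
        # Regular file: read-only for the owner, size is the byte length.
        mode, size, nlink = stat.S_IFREG | 0o400, len(content), 1
    return {
        "st_mode": mode,
        "st_nlink": nlink,
        "st_size": size,
        # Every timestamp is the moment the site data was loaded.
        "st_atime": loaded_at,
        "st_mtime": loaded_at,
        "st_ctime": loaded_at,
    }
```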

You can mkdir as many sites as you want. I haven’t tested an upper limit, but I imagine if you load enough of them your system will run out of memory and die.

Bugs

I have noticed a few bugs. Specifically, in some cases post content runs into some encoding issues and doesn’t get exposed properly at the filesystem level. Another one that I’ve seen is that the comment count number doesn’t show up correctly. Both of these are issues in the Python code that I need to take the time to work out.
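I have not tracked down the cause of the encoding problem yet, so the following is only a guess at one common source of this kind of bug: mixing character counts with byte counts. A filesystem reports sizes and serves reads in bytes, so text with non-ASCII characters has to be encoded once and then sliced as bytes. A Python 3 sketch, with hypothetical helper names:

```python
def to_file_bytes(text):
    """Encode text once so the reported size and the reads agree."""
    return text.encode("utf-8")


def read_slice(data, size, offset):
    """Return the bytes a read(size, offset) call should hand back."""
    return data[offset:offset + size]


content = "Caf\u00e9 news"
data = to_file_bytes(content)
# The byte length is larger than the character count for non-ASCII text,
# so the file size must come from len(data), not len(content).
print(len(content), len(data))
```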

While not really a bug, there are also some patterns in this code that I’m not entirely happy with. I’m hoping with a bit more Python experience I’ll be able to simplify those.

Kick The Tires

The code is available at https://github.com/josephscott/wpcomfs. Please give it a try and let me know what you think.