A New, New Focus: VaultPress

Two years ago I posted about “A New Focus”, where my time at Automattic shifted to center on Akismet. Back then Akismet was catching 500,000,000 spam comments per month (see the sidebar chart at http://akismet.com/about/). Today it is catching just over 2,000,000,000 spam comments per month (with a peak of nearly 2,500,000,000 at the end of 2011). Even with all that growth, Akismet has maintained a high level of accuracy and performance, something that has been great to be a small part of.

This summer I’ve been asked to shift my focus again, by joining the VaultPress team.

If you aren’t familiar with VaultPress, here is the elevator pitch: “VaultPress syncs the data from your WordPress site (posts, pages, comments, plugin & theme files, and media uploads) as they are added. On top of that VaultPress will scan your files for code vulnerabilities and changes to core WordPress files.” (More details are on the Get to know VaultPress page.)

VaultPress also provides a restore process. If your WordPress site gets vaporized for some reason, doing a fresh install and activating the VaultPress plugin will allow VaultPress.com to push a backup snapshot back to your server. There is also an option to manually download a backup snapshot, if you just want to pull out something specific.

The last two years focused on Akismet have been great, and now it is exciting to be taking on the new challenge of helping VaultPress improve and grow.

Scott Berkun’s Next Book Subject: Automattic & WordPress.com

Scott Berkun has announced the topic for his next book:

The next book is based on the journal I kept while working at WordPress.com. It tells the story of what I learned working for one of the most amazing companies in the world.

Scott worked at Automattic for 2 years as one of the team leads working on WordPress.com.

This is the first time I’ve seen someone write a book about where I work. Automattic really is a special place and I look forward to reading Scott’s treatment of it.

WordPress Core Contributor Handbook

If you’ve ever asked “How can I contribute to WordPress?” then go check out the WordPress Core Contributor Handbook. While still early in development (you’ll see that some sections are just a stub list of topics that need to be covered) it is already worth reading if you are interested in how WordPress development works.

A call has been put out for someone to lead the work on the handbook.

WordCamp SLC 2012 Coming September 22nd

WordCamp SLC is back for 2012! The date is September 22nd, and we’ll be back at the University of Utah campus (their wifi has worked every year!).

Now is also the time to put together your speaker submissions. We’ve got several people already lined up, but still have room for more.

Tickets will go on sale this week. There will be a discount for purchasing your ticket early, as that makes it easier to plan for things like lunch and t-shirts.

For all the details subscribe to http://2012.slc.wordcamp.org/ and follow @wcslc on Twitter.

wpcomfs – A WordPress.com Filesystem

Back in April WordPress.com announced a new REST style API. That got me thinking about writing a filesystem layer to expose that data, along similar lines to the pressfs code I’d written last year.

It is still rough, and only supports read-only public data, but in the spirit of release early (and often) I’m sharing the code for wpcomfs at https://github.com/josephscott/wpcomfs.

Assuming you have FUSE with Python bindings already working on your system you can download this and start trying it out in three easy steps:

  1. mkdir /tmp/wpcomfs
  2. python wpcomfs.py /tmp/wpcomfs/
  3. mkdir /tmp/wpcomfs/sites/en.blog.wordpress.com

After those steps you’ll have read-only data for en.blog.wordpress.com available at /tmp/wpcomfs/sites/en.blog.wordpress.com.

The mount point /tmp/wpcomfs and the site en.blog.wordpress.com are just examples. You can mount wpcomfs wherever you’d like, and you can expose public data for any public site hosted on WordPress.com. Since there are millions of sites hosted at WordPress.com, wpcomfs will only load data for sites that you mkdir.

This also works for WordPress.com sites that are using mapped domain names. For instance mkdir /tmp/wpcomfs/sites/gigaom.com will provide you with a filesystem layer for GigaOM.com.
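
For anyone curious where the data comes from, here is a small Python sketch of the public WordPress.com REST API URLs that a site name maps to. This is not the wpcomfs source: the helper names site_urls and fetch_json are made up for illustration, and the assumption is that the v1 public endpoints for a site, its posts, and its comments are the ones being read.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base of the public WordPress.com REST API (v1).
API_BASE = "https://public-api.wordpress.com/rest/v1"


def site_urls(domain):
    """Return the API URLs for a site, its posts, and its comments.

    `domain` can be a WordPress.com subdomain or a mapped domain.
    (Hypothetical helper, not part of wpcomfs.)
    """
    site = quote(domain, safe="")
    return {
        "site": f"{API_BASE}/sites/{site}",
        "posts": f"{API_BASE}/sites/{site}/posts/",
        "comments": f"{API_BASE}/sites/{site}/comments/",
    }


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Mapped domains are addressed the same way as *.wordpress.com names.
    for name, url in site_urls("gigaom.com").items():
        print(name, url)
```

The same function handles en.blog.wordpress.com and gigaom.com alike, since the API accepts either form as the site identifier.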

Data

When you mkdir a site you get site data, recent posts, and recent comments. In order to keep interactions with the filesystem responsive this data is only loaded once, when mkdir is run. Running rmdir /tmp/wpcomfs/sites/en.blog.wordpress.com will remove the site data.
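
The load-once behaviour can be sketched in a few lines of Python. This is only an illustration, not the actual wpcomfs code: the SiteCache class, its method names, and the fetch callback are all hypothetical, and a real FUSE layer would report errors through errno values rather than Python exceptions.

```python
import time


class SiteCache:
    """Keep site data in memory: loaded on mkdir, dropped on rmdir."""

    def __init__(self, fetch):
        # `fetch` is any callable that takes a domain and returns its data.
        self._fetch = fetch
        self._sites = {}

    def mkdir(self, domain):
        if domain in self._sites:
            raise FileExistsError(domain)
        # The only place the (slow) fetch happens.
        self._sites[domain] = {
            "data": self._fetch(domain),
            "loaded_at": time.time(),  # reused as the timestamp for every entry
        }

    def rmdir(self, domain):
        if domain not in self._sites:
            raise FileNotFoundError(domain)
        del self._sites[domain]

    def read(self, domain):
        # Reads never touch the network, which keeps them responsive.
        if domain not in self._sites:
            raise FileNotFoundError(domain)
        return self._sites[domain]["data"]

    def loaded_at(self, domain):
        return self._sites[domain]["loaded_at"]
```

The trade-off is that the data goes stale: to refresh a site you rmdir it and mkdir it again.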

The top level directory for a site looks like:

$ ls -la /tmp/wpcomfs/sites/en.blog.wordpress.com/
total 14
-r-------- 1 root root    7 Jun  6 14:09 ID.txt
-r-------- 1 root root   28 Jun  6 14:09 URL.txt
dr-------- 2 root root 4096 Jun  6 14:09 comments
-r-------- 1 root root   61 Jun  6 14:09 description.txt
dr-------- 2 root root 4096 Jun  6 14:09 meta
-r-------- 1 root root   18 Jun  6 14:09 name.txt
dr-------- 2 root root 4096 Jun  6 14:09 posts

The contents of each file are available in read-only mode:

$ more /tmp/wpcomfs/sites/en.blog.wordpress.com/URL.txt 

http://en.blog.wordpress.com
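
One way to picture the mapping from site data to files is the Python sketch below. It is an illustration under assumptions rather than the wpcomfs implementation: the function name site_to_files is made up, and apart from the URL the sample values are placeholders, not real data for en.blog.wordpress.com.

```python
def site_to_files(site):
    """Turn the scalar fields of a site record into '<field>.txt' entries.

    Nested values (dicts and lists) are skipped here; in a filesystem
    layer they would more naturally become directories.
    """
    files = {}
    for key, value in site.items():
        if isinstance(value, (dict, list)):
            continue
        # Store bytes so the file size is simply len(contents).
        files[key + ".txt"] = str(value).encode("utf-8")
    return files


# Placeholder record; only the URL is taken from the example above.
sample_site = {
    "ID": 1234567,
    "name": "Example Site Name",
    "description": "An example description.",
    "URL": "http://en.blog.wordpress.com",
    "meta": {"links": {}},
}

if __name__ == "__main__":
    for filename, contents in sorted(site_to_files(sample_site).items()):
        print(filename, len(contents))
```

With no trailing newline added, URL.txt comes out at 28 bytes, which matches the size shown in the listing above.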

All of the dates exposed for files and directories in wpcomfs are based on when the site data was loaded, which is basically when mkdir was run for the site.

You can mkdir as many sites as you want. I haven’t tested an upper limit, but I imagine if you load enough of them your system will run out of memory and die.

Bugs

I have noticed a few bugs. Specifically, in some cases post content runs into some encoding issues and doesn’t get exposed properly at the filesystem level. Another one that I’ve seen is that the comment count number doesn’t show up correctly. Both of these are issues in the Python code that I need to take the time to work out.

While not really a bug, there are also some patterns in this code that I’m not entirely happy with. I’m hoping with a bit more Python experience I’ll be able to simplify those.

Kick The Tires

The code is available at https://github.com/josephscott/wpcomfs. Please give it a try and let me know what you think.

Updated libxml2-fix Plugin for WordPress

I made a small update to the libxml2-fix WordPress plugin this morning. There are a few more odd combinations of PHP + libxml2 that have problems, so this change applies the workaround as long as the libxml2 version is below 2.7.3. That saves having to list all of the individual versions that may have this issue. This improvement was submitted by Danilo Ercoli.
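
The idea of a single "below 2.7.3" check can be shown with a short Python sketch. The plugin itself is PHP and this is not its code: the function names are hypothetical, and the sketch assumes plain dotted numeric version strings.

```python
def parse_version(text):
    """Split a dotted version string like '2.7.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_workaround(libxml2_version, fixed_in="2.7.3"):
    """True when the given libxml2 version is older than `fixed_in`."""
    return parse_version(libxml2_version) < parse_version(fixed_in)


if __name__ == "__main__":
    for version in ("2.6.32", "2.7.2", "2.7.3", "2.10.0"):
        print(version, needs_workaround(version))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "2.10.0" sorts before "2.7.3", which would wrongly flag a newer release.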

I’m bummed that there are still so many hosts running known-to-be-broken combinations of PHP + libxml2. In the meantime we may as well make it easier for people to work around the issue.