
Simple Filesystems with Python and FUSE

I’ve posted the slides and example code from my ‘Simple Filesystems with Python and FUSE’ presentation at the OpenWest conference.

A PDF of the slides and all of the example code are available at https://github.com/josephscott/python-fuse-2013-05.

Brython

Brython, Python in your browser:

Brython is designed to replace Javascript as the scripting language for the Web. As such, it is a Python 3 implementation, adapted to the HTML5 environment, that is to say with an interface to the DOM objects and events.

That is a trip. Practical? Probably not. Definitely neat though.

The brython.js code that does all the heavy lifting is only 175,936 bytes.

I am Speaking at the OpenWest Conference

The OpenWest Conference (formerly the Utah Open Source Conference) is happening May 2-4, 2013 at Utah Valley University in Orem, Utah.

This year the keynote speakers are Rasmus Lerdorf, creator of PHP, and Mark Callaghan, lead of the MySQL engineering team at Facebook.

For my part, I’ll be giving three different presentations this time around. First up is “Simple Filesystems with Python and FUSE”, where I’ll cover the basics of getting a simple filesystem written in Python up and running with the FUSE library. Next is “Site Testing with CasperJS”, an intro to using CasperJS to run user tests against your site. Last, but not least, is “Scaling WordPress”, where I’ll talk about some of the methods that WordPress.com (the largest WordPress install in the world) uses to host tens of millions of sites that add up to billions of page views per month.

I tried to keep my session titles direct and to the point. At times there will be up to ten sessions running at once (OpenWest session schedule), so I wanted people to be able to tell at a glance what my sessions are about.

Tickets for OpenWest are available at $80. Every open source group in the area has been given a discount code though, so you can bring that down significantly.

If you’ll be at the OpenWest conference be sure to say hi.

wpcomfs – A WordPress.com Filesystem

Back in April WordPress.com announced a new REST style API. That got me thinking about writing a filesystem layer to expose that data, along similar lines to the pressfs code I’d written last year.

It is still rough, and only supports read-only public data, but in the spirit of release early (and often) I’m sharing the code for wpcomfs at https://github.com/josephscott/wpcomfs.

Assuming you have FUSE with Python bindings already working on your system, you can download this and start trying it out in three easy steps:

  1. mkdir /tmp/wpcomfs
  2. python wpcomfs.py /tmp/wpcomfs/
  3. mkdir /tmp/wpcomfs/sites/en.blog.wordpress.com

After those steps you’ll have read-only data for en.blog.wordpress.com available at /tmp/wpcomfs/sites/en.blog.wordpress.com.

The mount point /tmp/wpcomfs and the site en.blog.wordpress.com are just examples. You can mount wpcomfs wherever you’d like, and you can expose public data for any public site hosted on WordPress.com. Since there are millions of sites hosted at WordPress.com, wpcomfs will only load data for sites that you mkdir.

This also works for WordPress.com sites that use mapped domain names. For instance, mkdir /tmp/wpcomfs/sites/gigaom.com will provide you with a filesystem layer for GigaOM.com.

Data

When you mkdir a site you get site data, recent posts, and recent comments. To keep interactions with the filesystem responsive, this data is loaded only once, when mkdir is run. Running rmdir /tmp/wpcomfs/sites/en.blog.wordpress.com will remove the site data.
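A minimal sketch of that load-once behaviour, assuming a hypothetical `SiteCache` class and a `fetch` callable that stands in for the calls to the WordPress.com API (neither is taken from the wpcomfs source). A real FUSE binding would call these from its own `mkdir`/`rmdir` handlers and return an errno instead of raising.

```python
import time


class SiteCache:
    """Hypothetical load-once cache keyed by site name (illustrative sketch).

    Data for a site is fetched exactly once, when its directory is created
    with mkdir, and discarded when the directory is removed with rmdir.
    """

    def __init__(self, fetch):
        self.fetch = fetch   # assumed callable: site name -> dict of site data
        self.sites = {}      # site name -> (load timestamp, data)

    def mkdir(self, path):
        # Only directories directly under /sites/ are meaningful.
        parts = path.strip('/').split('/')
        if len(parts) != 2 or parts[0] != 'sites':
            raise PermissionError(path)   # a FUSE handler would return -EACCES
        site = parts[1]
        if site not in self.sites:
            # One fetch per site; every later read is served from memory.
            self.sites[site] = (time.time(), self.fetch(site))

    def rmdir(self, path):
        site = path.strip('/').split('/')[-1]
        self.sites.pop(site, None)

    def read(self, site, key):
        _loaded_at, data = self.sites[site]
        return data[key]


def fake_fetch(site):
    # Stand-in for the network call, for demonstration only.
    return {'URL': 'http://' + site}


cache = SiteCache(fake_fetch)
cache.mkdir('/sites/en.blog.wordpress.com')
print(cache.read('en.blog.wordpress.com', 'URL'))
```

The trade-off is the one described above: reads never block on the network, but the data is a snapshot that only refreshes when you rmdir and mkdir the site again.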

The top level directory for a site looks like:

$ ls -la /tmp/wpcomfs/sites/en.blog.wordpress.com/
total 14
-r-------- 1 root root    7 Jun  6 14:09 ID.txt
-r-------- 1 root root   28 Jun  6 14:09 URL.txt
dr-------- 2 root root 4096 Jun  6 14:09 comments
-r-------- 1 root root   61 Jun  6 14:09 description.txt
dr-------- 2 root root 4096 Jun  6 14:09 meta
-r-------- 1 root root   18 Jun  6 14:09 name.txt
dr-------- 2 root root 4096 Jun  6 14:09 posts

The contents of each file are available in read-only mode:

$ more /tmp/wpcomfs/sites/en.blog.wordpress.com/URL.txt 

http://en.blog.wordpress.com

All of the dates exposed for files and directories in wpcomfs are based on when the site data was loaded, which is basically when mkdir was run for the site.
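To make the listing above concrete, here is a sketch of how stat-style attributes for such entries could be built, assuming a hypothetical `site_attrs` helper: every timestamp is the single load time captured at mkdir, and permissions mirror the `-r--------` / `dr--------` modes shown. The field names follow `os.stat_result`; how they are handed to FUSE depends on the Python binding in use.

```python
import stat
import time


def site_attrs(content, is_dir, loaded_at, uid=0, gid=0):
    """Hypothetical stat attributes for one wpcomfs entry (sketch)."""
    if is_dir:
        mode = stat.S_IFDIR | 0o400    # dr--------, as in the listing
        size = 4096                    # conventional directory size
        nlink = 2
    else:
        mode = stat.S_IFREG | 0o400    # -r--------
        size = len(content.encode('utf-8'))   # bytes, not characters
        nlink = 1
    return {
        'st_mode': mode,
        'st_nlink': nlink,
        'st_size': size,
        'st_uid': uid,
        'st_gid': gid,
        # All three timestamps are the moment the site data was loaded.
        'st_atime': loaded_at,
        'st_mtime': loaded_at,
        'st_ctime': loaded_at,
    }


loaded_at = time.time()   # captured once, when mkdir runs for the site
attrs = site_attrs('http://en.blog.wordpress.com', False, loaded_at)
print(stat.filemode(attrs['st_mode']), attrs['st_size'])   # prints: -r-------- 28
```

Note the 28-byte size matches URL.txt in the listing, which is consistent with the file holding the URL with no trailing newline.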

You can mkdir as many sites as you want. I haven’t tested an upper limit, but I imagine if you load enough of them your system will run out of memory and die.

Bugs

I have noticed a few bugs. Specifically, in some cases post content runs into some encoding issues and doesn’t get exposed properly at the filesystem level. Another one that I’ve seen is that the comment count number doesn’t show up correctly. Both of these are issues in the Python code that I need to take the time to work out.

While not really a bug, there are also some patterns in this code that I’m not entirely happy with. I’m hoping with a bit more Python experience I’ll be able to simplify those.

Kick The Tires

The code is available at https://github.com/josephscott/wpcomfs; please give it a try and let me know what you think.

GPlusFS: Google+ Data as a Filesystem

With the basics of a Google+ API announced last week I started poking around. I liked that there was an option to simply sign up for an API key, allowing me to quickly try it out from the command line using cURL. The JSON formatted results were easy enough to understand.

That led to hooking up FUSE and Python to provide the ability to mount the people.get data as a filesystem. A bit more hacking and gplusfs was born. It is still basic, but has been working fine so far with my profile. This is all still read-only, since the Google+ API is read-only.

Here is what a directory listing of my profile looks like:

> ls -la *
-r-------- 1 root root   57 1970-01-01 00:00 aboutMe.txt
-r-------- 1 root root   12 1970-01-01 00:00 displayName.txt
-r-------- 1 root root    4 1970-01-01 00:00 gender.txt
-r-------- 1 root root   21 1970-01-01 00:00 id.txt
-r-------- 1 root root 5117 1970-01-01 00:00 image.jpg
-r-------- 1 root root   11 1970-01-01 00:00 kind.txt
-r-------- 1 root root   45 1970-01-01 00:00 url.txt

organizations:
total 1
dr-------- 2 root root  0 1970-01-01 00:00 ./
dr-------- 2 root root  0 1970-01-01 00:00 ../
-r-------- 1 root root 10 1970-01-01 00:00 work.txt

urls:
total 1
dr-------- 2 root root  0 1970-01-01 00:00 ./
dr-------- 2 root root  0 1970-01-01 00:00 ../
-r-------- 1 root root 63 1970-01-01 00:00 json.txt
-r-------- 1 root root 45 1970-01-01 00:00 profile.txt

For JSON values that are strings I used the key name for the file, with an added .txt extension. The size of the file is determined by the length of the string. It works the way you’d expect:

> more displayName.txt 
Joseph Scott

For the profile image, the API provides the URL of the image; gplusfs grabs a copy of the image and exposes that to the filesystem instead of the URL string.

The other special case is list values, which are exposed as directories. The JSON data provides multiple values for these, so breaking these out as separate directories with their own files seemed like the path of least surprise.
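The two mapping rules (strings become `<key>.txt` files, lists become directories) can be sketched as a small pure function. This is an illustration, not gplusfs's code: the `json_to_tree` name is hypothetical, and naming list entries after each item's `type` field is an assumption inferred from the `urls/profile.txt` entry in the listing above. The image special case is left out.

```python
def json_to_tree(data):
    """Map a JSON object onto a nested dict of file name -> contents (sketch)."""
    tree = {}
    for key, value in data.items():
        if isinstance(value, str):
            # Strings become <key>.txt; file size is the string length
            # (in bytes once encoded, which differs for non-ASCII text).
            tree[key + '.txt'] = value
        elif isinstance(value, list):
            # Lists become directories with one file per element.
            # Assumed naming: the element's "type" field, else its index.
            entries = {}
            for index, item in enumerate(value):
                if isinstance(item, dict):
                    name = str(item.get('type', index))
                    text = str(item.get('value', item.get('name', '')))
                else:
                    name, text = str(index), str(item)
                entries[name + '.txt'] = text
            tree[key] = entries
        # Other value types (numbers, nested objects) are skipped here.
    return tree


profile = {   # made-up sample data
    'displayName': 'Joseph Scott',
    'kind': 'plus#person',
    'urls': [{'type': 'profile', 'value': 'https://example.com/me'}],
}
print(json_to_tree(profile))
```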

The source code is available at https://github.com/josephscott/gplusfs, and configuration-wise it just needs your Google+ user id and an API key. Give it a spin and let me know if you run into any issues with your profile data.

Under Appreciated Code: strtotime

There are some functions that are so useful, you sometimes wonder how anyone gets by without them. I give you the underappreciated code of the day: strtotime.

In the PHP world strtotime is a given. Here are a few examples:

$when = 'yesterday';
echo "{$when} : " . date( 'Y-m-d H:i:s', strtotime( $when ) ) . "\n";

$when = 'first day of last month';
echo "{$when} : " . date( 'Y-m-d H:i:s', strtotime( $when ) ) . "\n";

$when = '+45 days';
echo "{$when} : " . date( 'Y-m-d H:i:s', strtotime( $when ) ) . "\n";

$when = '+1 year 3 months 4 days 6 hours 14 seconds';
echo "{$when} : " . date( 'Y-m-d H:i:s', strtotime( $when ) ) . "\n";

$when = 'last monday of next month';
echo "{$when} : " . date( 'Y-m-d H:i:s', strtotime( $when ) ) . "\n";

The ability of strtotime to slice and dice dates and intervals can be a lifesaver. I highly recommend taking 10 minutes to try it out.

I was really disappointed to see that strtotime was not part of the Python “batteries included” approach. Fortunately there are Python implementations of strtotime, but really it is so amazingly handy that it should be up for strong consideration as a core feature.
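For comparison, here is roughly what a few of those PHP expressions take by hand with only the standard library (`datetime` and `calendar`). This is a date-only sketch: the helper names are hypothetical, PHP's time-of-day rules are not replicated, and `add_months` clamps to the end of the month whereas strtotime rolls over (Jan 31 + 1 month lands in early March). The third-party `python-dateutil` package (`relativedelta`, `dateutil.parser`) covers much more of this ground.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    last = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last))


def last_weekday_of_month(year, month, weekday):
    """Last occurrence of `weekday` (0 = Monday ... 6 = Sunday) in a month."""
    last = date(year, month, calendar.monthrange(year, month)[1])
    return last - timedelta(days=(last.weekday() - weekday) % 7)


today = date(2013, 5, 2)   # fixed date so the results are reproducible

yesterday = today - timedelta(days=1)                        # 'yesterday'
first_of_last_month = add_months(today.replace(day=1), -1)   # 'first day of last month'
in_45_days = today + timedelta(days=45)                      # '+45 days'
next_month = add_months(today.replace(day=1), 1)
last_monday_next = last_weekday_of_month(next_month.year, next_month.month, 0)

print('yesterday :', yesterday)                                 # 2013-05-01
print('first day of last month :', first_of_last_month)         # 2013-04-01
print('+45 days :', in_45_days)                                 # 2013-06-16
print('last monday of next month :', last_monday_next)          # 2013-06-24
```

Each expression needs its own bit of calendar arithmetic, which is exactly the work a single strtotime call hides.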

pressfs – Read Only Media Support

Last night I tagged version 0.3.0 of pressfs, which includes read only support for media files managed by WordPress. You access these files via the new top level directory: /media. Entries in /media look like:

-r-------- 1 joseph root   47487 2010-08-03 20:52 boat.jpg

Copying media files is as easy as cp /var/wp/media/boat.jpg /tmp/boat.jpg

To support read only media files I added two new methods to the pressfs WordPress plugin: get_media_list and get_media_file. The new get_media_file method is a bit different in that it does not return JSON, instead it returns the raw data of the file.
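Because get_media_file returns raw bytes rather than JSON, serving a media file through FUSE mostly comes down to answering `read(path, size, offset)` requests by slicing. The sketch below assumes a hypothetical `MediaFiles` class with two stand-in callables for the plugin methods; it caches each file once so that a `cp`, which reads in chunks, does not re-download the whole file for every chunk. It is not the pressfs implementation.

```python
class MediaFiles:
    """Hypothetical read-only /media backend (sketch).

    `media_list` maps file names to sizes (standing in for get_media_list);
    `fetch_file` returns a file's raw bytes (standing in for get_media_file).
    """

    def __init__(self, media_list, fetch_file):
        self.media_list = media_list
        self.fetch_file = fetch_file
        self.cache = {}   # name -> raw bytes, filled on first read

    def readdir(self):
        return sorted(self.media_list)

    def size(self, name):
        # getattr can answer from the list without downloading the file.
        return self.media_list[name]

    def read(self, name, size, offset):
        if name not in self.cache:
            self.cache[name] = self.fetch_file(name)
        data = self.cache[name]
        # Reading past the end yields b'', which the kernel treats as EOF.
        return data[offset:offset + size]


media = MediaFiles({'boat.jpg': 10}, lambda name: bytes(range(10)))
print(media.readdir(), media.read('boat.jpg', 4, 8))
```

Caching whole files in memory is the simple option; for installs with many large media files it would need a size limit or eviction policy.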

There are still some questions about the best way to structure this data for WordPress installs that have a large number of media files. For now I just wanted something functional that people could use.

The source is at https://github.com/josephscott/pressfs or you can download pressfs 0.3.0 as a zip file. A tar.gz of pressfs 0.3.0 is also available.

pressfs – Dipping a Toe into Write Support

When I first announced pressfs I knew that write support was going to come up as a requested feature. As I mentioned in that post, before I made the initial release I’d already been working with write code. After more testing and code clean up I’ve updated pressfs to version 0.2.0, which has (very) limited write support.

And by limited, I mean really, really, really limited.

There are exactly two things that you can edit using pressfs in version 0.2.0: post content and the url value for a user account (under contact info). Example paths for these look like:

/var/wp/users/LOGIN/url
/var/wp/posts/POSTID-POSTNAME/content

I knew that post content was something people wanted to be able to edit, and adding another field that wasn’t related to posts made me think about how to properly abstract the code that determines which files are writable.
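One way to abstract "which files are writable" is a whitelist of path patterns, so adding a new editable field means adding one entry rather than another special case. This is a sketch under assumptions: the `WRITABLE` table and `writable_field` helper are hypothetical, and the patterns are written relative to the mount point based on the two example paths above.

```python
import re

# Hypothetical whitelist of (pattern, field) pairs; every other path is read-only.
WRITABLE = [
    (re.compile(r'^/users/(?P<login>[^/]+)/url$'), 'user_url'),
    (re.compile(r'^/posts/(?P<post_id>\d+)-[^/]+/content$'), 'post_content'),
]


def writable_field(path):
    """Return (field, captured ids) if `path` may be written, else None."""
    for pattern, field in WRITABLE:
        match = pattern.match(path)
        if match:
            return field, match.groupdict()
    return None


def file_mode(path):
    # Writable files get owner read/write; everything else stays read-only.
    return 0o600 if writable_field(path) else 0o400


print(writable_field('/posts/123-summer-vacation/content'))
```

The captured ids (`login`, `post_id`) are what a write handler would need to pass back to the WordPress plugin.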

While I’ve tested this repeatedly against my dev install of WordPress, I can’t stress enough that you need to be careful. Read only is pretty safe, with no real way to mess up your WordPress install. Now that we are venturing into the write waters the code needs more people to test it before I’d consider it safe.

Now, with that out of the way, go give this a try – the pressfs code is available on GitHub. Creating a new mount point is easy enough, and I recommend using a non-root account to do it. If your uid is 3000 it is as simple as:

python pressfs.py /var/wp/ -o uid=3000

And if you do find a problem, use the -d option to have pressfs run in the foreground; it will display filesystem activity and Python errors.

HTTP Basic Auth with httplib2

While working on pressfs I ran into an issue with httplib2 using HTTP Basic Authentication.

Here is some example code:

import httplib2

if __name__ == '__main__' :
    httplib2.debuglevel = 1

    h = httplib2.Http()
    h.add_credentials( 'username', 'password' )

    resp, content = h.request( 'http://www.google.com/', 'GET' )

If you run this you’ll notice that httplib2 doesn’t actually include the HTTP Basic Auth details in the request, even though the code specifically asks it to do so. By design it will always make one request with no authentication details and then check to see if it gets an HTTP 401 Unauthorized response back. If and only if it gets a 401 response back will it then make a second request that includes the authentication data.

Bottom line: I didn’t want to make two HTTP requests when only one was needed, since the extra round trip is a real performance hit. There is no option to force the authentication header to be sent on the first request, so you have to do it manually:

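import base64
import httplib2

if __name__ == '__main__' :
    httplib2.debuglevel = 1

    h = httplib2.Http()
    # b64encode needs bytes and, unlike the old base64.encodestring
    # (removed in Python 3.9), does not append a trailing newline that
    # would corrupt the header value.
    auth = base64.b64encode( b'username:password' ).decode( 'ascii' )

    resp, content = h.request(
        'http://www.google.com/',
        'GET',
        headers = { 'Authorization' : 'Basic ' + auth }
    )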

Watching the output of this you’ll see the authentication header in the request.

Someone else already opened an issue about this (Issue 130); unfortunately, Joe Gregorio has indicated that he has no intention of ever fixing this :-(

On the up side, working around this deficiency only takes a little bit of extra code.
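That extra code can be wrapped in a small helper so every call site builds the header the same way. The `basic_auth_header` name is hypothetical; the header format itself is standard HTTP Basic authentication (RFC 7617): `Basic` followed by base64 of `user:password`.

```python
import base64


def basic_auth_header(username, password):
    """Build a preemptive HTTP Basic Authorization header (sketch)."""
    credentials = f'{username}:{password}'.encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    return {'Authorization': 'Basic ' + token}


print(basic_auth_header('username', 'password'))
# {'Authorization': 'Basic dXNlcm5hbWU6cGFzc3dvcmQ='}
```

With httplib2 this would be passed as `h.request(url, 'GET', headers=basic_auth_header('username', 'password'))`, so the credentials go out on the first request instead of waiting for a 401.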

pressfs – A WordPress Filesystem

Here is something else I’ve been toying with: pressfs, a WordPress filesystem. Currently it exposes user, post, tag, and category data in a read-only filesystem.

The Short Version

This is a Python script that uses FUSE to expose data from a WordPress site as a filesystem.

For the impatient you can give this a try in just a few steps:

  • get the pressfs source code from GitHub
  • install and activate the pressfs WordPress plugin
  • copy example-config.ini to config.ini
  • edit config.ini and set the values in the WordPress section
  • python pressfs.py /your/mount/point/
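For the config.ini step, reading the WordPress section with the standard library's `configparser` could look like the sketch below. Only the section name comes from the steps above; the key names (`url`, `username`, `password`) and the `load_wordpress_settings` helper are illustrative guesses, so check example-config.ini in the pressfs repository for the real ones. A real script would call `parser.read('config.ini')` instead of parsing a string.

```python
import configparser
import warnings

# Hypothetical config contents; key names are assumptions.
EXAMPLE_CONFIG = """
[WordPress]
url = https://example.com/
username = admin
password = secret
"""


def load_wordpress_settings(text):
    """Read the [WordPress] section of a pressfs-style config (sketch)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser['WordPress']
    settings = {key: section[key] for key in ('url', 'username', 'password')}
    if not settings['url'].startswith('https://'):
        # Credentials travel as HTTP Basic auth, so plain http exposes them.
        warnings.warn('WordPress url is not https; credentials are sent in the clear')
    return settings


settings = load_wordpress_settings(EXAMPLE_CONFIG)
print(settings['url'])
```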

This code is still roughly beta quality; try it out on a test/dev WordPress install first. Authentication is just HTTP Basic, so please use it over SSL. It also needs to use an administrator-level WordPress account (something that might change in the future).

History

A little over two years ago I started thinking about writing a WordPress plugin to act as a WebDAV server so that we could expose media uploads as an easily mounted filesystem. I poked around a bit, but quickly put the idea on the shelf and worked on other things. Fast forward to early 2011: after I got raifbot into a functional state, I started taking a more serious look at WordPress + WebDAV. As a concept this is definitely plausible, but after trying several different approaches I decided I wasn’t interested in dealing with the edge cases I ran into (nginx not having built-in support for chunked encoding was particularly irksome).

Although I wasn’t happy with the WebDAV issues, I still kept thinking about ways we could expose WordPress resources as a filesystem. A few weeks later I started experimenting with the idea of using FUSE to expose WordPress data as a filesystem. It was a rough start, but I finally got it to a point where it was time to share with others and get more feedback.

Python

I don’t have much experience writing Python code, which isn’t surprising since my full-time job has involved WordPress and WordPress-related development (lots of PHP code). Sure, I’ve tweaked a few Python scripts from time to time, read Dive Into Python and toyed with some of the examples, but I’ve never sat down to write new code for a new project in Python until now. I welcome feedback on improving the code; just remember this is my first time sailing this ship, so be gentle :-)

The requirements for running pressfs aren’t terribly exotic. Obviously you need to have the FUSE library installed, along with the fuse-python bindings. Other external libraries it uses include httplib2 and simplejson. I think all of the other imported modules are standard in most Python installs.
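Since `json` has been part of the standard library since Python 2.6, a common idiom (shown here as a general pattern, not necessarily what pressfs does) is to prefer simplejson when it is installed and fall back to the built-in module otherwise. Both expose `loads()` and `dumps()`, so the rest of the code does not change.

```python
try:
    import simplejson as json   # optional third-party package
except ImportError:
    import json                 # standard library fallback

# Sample payload shaped like a post record; the values are made up.
post = json.loads('{"id": 123, "name": "summer-vacation"}')
print(post['id'], post['name'])
```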

There are only a handful of examples and tutorials on getting started with fuse-python, which I had to experiment with repeatedly (along with tons of debugging) to figure out how the pieces fit together. I’ll likely be posting tutorials and how-to’s about python-fuse, both to help others and to make sure I have a clear idea of how the various features work.

The WordPress Plugin

I chose to write a new WordPress plugin to expose the data I needed because I wanted to be able to tweak it on a whim. Using JSON to exchange data was a nice bonus as well. So, for now at least, you’ll need to have this active on any site that you want to use pressfs with.

Directory Layout

If you have pressfs mounted on /var/wordpress you’ll see something like this:

  • /var/wordpress/categories/
  • /var/wordpress/posts/
  • /var/wordpress/tags/
  • /var/wordpress/users/

The categories directory will list all of the categories, using the category slug. Each individual category is a directory with the following files in it: count, description, id, name, parent, and slug. These all contain what you would expect.

The posts directory lists the posts on your site, with the format of <POSTID>-<NAME> (or if NAME isn’t available <POSTID>-<TITLE>). Each post directory has the following files in it: content, date-gmt, id, name, password, status, title, type, and url. Shouldn’t be any surprises about the contents of each of those files.
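The `<POSTID>-<NAME>` naming rule, plus the reverse lookup a filesystem needs when it is handed a path, might be sketched as follows. The helper names are hypothetical, `post` is assumed to be a dict with `id`, `name` and `title` keys (mirroring the files listed above), and replacing `/` in titles is our own assumption, since a slash cannot appear in a directory name.

```python
def post_dirname(post):
    """<POSTID>-<NAME>, or <POSTID>-<TITLE> when NAME is empty (sketch)."""
    label = post.get('name') or post.get('title', '')
    label = label.replace('/', '-')   # '/' is not allowed in a path component
    return '%s-%s' % (post['id'], label)


def post_id_from_dirname(dirname):
    """Recover the numeric post id from a name like '123-summer-vacation'."""
    return int(dirname.split('-', 1)[0])


name = post_dirname({'id': 123, 'name': 'summer-vacation', 'title': 'Summer Vacation'})
print(name, post_id_from_dirname(name))   # 123-summer-vacation 123
```

Keeping the id as the prefix means lookups only have to parse the leading number, even when the name or title portion is ambiguous or changes.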

The tags directory lists the tags on your site, using the slug. Each tag directory contains: count, description, id, name, and slug.

The users directory lists all the users on your site, with directory entries using the username field. Each user directory contains these files: display-name, email, id, login, nice-name, registered, and url.

Want to see the description field for category photos:
more /var/wordpress/categories/photos/description

Want to see the post body for the post id 123, which has the name summer-vacation:
more /var/wordpress/posts/123-summer-vacation/content

Want the email field from user josephscott:
more /var/wordpress/users/josephscott/email

The Future

The largest limitation that I expect people to ask about is moving from read-only to allowing write operations. The good news is that I already have experimental code that allows for writes of specific fields. I have more testing and cleaning up of the code to do, but I intend to add write support in a future version.

Next up is exposing more WordPress data. I already have test code for exposing media upload files, so that will get added eventually. Beyond that I’d like to expose other less obvious items as well, like settings.

While I’ve revised this a few times before getting to this point, I’m not sure I’ve got everything just right. For instance, is the current directory layout really the best way to go? Don’t be surprised if there are significant changes as development goes along.

Conclusion

Now that I’ve gotten my feet wet with WordPress + python-fuse I’m excited about doing more with this. This was also a good way to start learning more Python, for something that resembles a real application, which I’ve wanted to do for some time now.

In the short term I’d like to hear how this works (or doesn’t) for others. If you run into a problem, re-mount pressfs with python pressfs.py /your/mount/point/ -d, which will keep the process running in the foreground and show details on what each operation does (including errors). That should provide enough detail to figure out where the problem is.


© 2014 Joseph Scott
