Why Hasn’t Anyone Figured Out How To Do Feed Searches?

For searching the web in general most people go to Google or perhaps Yahoo. For the last couple of weeks I’ve been interested in just searching through feeds (RSS and Atom) for information. Say I wanted to track what people are saying about PostgreSQL. This can’t really be done with the traditional search engines (Google, Yahoo, etc.) because they base their results on popularity (in one form or another). This doesn’t help me because I’m interested in what people are saying right now, not who has said the most popular thing. So I started using the feed search sites to see how they stacked up. The results were extremely disappointing.

Technorati
These guys have probably been around as long as anybody in the feed search area. They not only allow simple searches, but if you put in a URL they’ll give you a list of all the recent links to that URL. This feature is handy, but is extremely limited because once a link to a URL is no longer “current” then it drops off the list. There doesn’t appear to be any way to get a list of all the pages that have ever linked to a given URL.

The regular term search provides a similar results page from feeds that are considered “current”. Once again there doesn’t appear to be a way to get results further back than “current”. After using this for a while to find out what people are saying about PostgreSQL I found that their database appeared to be updated in a rather jittery way. There were some occasions where I was able to find entries about PostgreSQL via other sources that were more recent than all of the “current” results from Technorati. I’m not talking about a couple of hours, but between one and two days. Technorati’s search also has a problem that is common among feed search sites: lots of duplicate entries. When almost 50% of the search results are duplicates the search becomes almost useless.

Technorati supports being pinged for feed updates, and is one of the services that Ping-o-Matic pings. There seem to be problems here also: in some cases it has taken days for some of my entries to show up in search results. On rare occasions some of my entries never made it into search results. There are several points where this failure could have happened, but the result was the same: feed entries that should have been in search results but weren’t.
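For anyone who hasn't looked at what a ping actually is, here is a minimal sketch in Python. It assumes the usual `weblogUpdates.ping` XML-RPC convention (blog name, then blog URL) and assumes that Ping-o-Matic answers at `http://rpc.pingomatic.com/`; the blog name and URL are made up, and I haven't checked how Technorati itself handles pings behind the scenes.

```python
import xmlrpc.client

# Assumed endpoint for Ping-o-Matic; check their site before relying on it.
PING_ENDPOINT = "http://rpc.pingomatic.com/"


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Return the XML body of a weblogUpdates.ping call without sending it."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


def send_ping(blog_name: str, blog_url: str, endpoint: str = PING_ENDPOINT):
    """Send the ping and return whatever struct the server answers with."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


if __name__ == "__main__":
    # Hypothetical blog name and URL, printed rather than sent.
    print(build_ping_request("Example Blog", "http://example.com/"))
```

The ping only says "this blog changed". The search site still has to fetch and index the feed afterwards, which is one of the places where a delay could creep in.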

They have some additional for-pay features and also show advertising on their search results page. These look like their only two forms of revenue; hopefully they are enough.

Bloglines
Although Bloglines’ primary service is as a feed aggregator they also have the ability to “Search All Blogs”. The Bloglines search has more options than Technorati, allowing options like: all of these words, exact phrase, at least one of these words, without these words, sort by popularity or date. The search can be limited to all blogs, only your subscriptions, or everything except your subscriptions. The search looks up not just entries that have search terms in them but also blog titles and descriptions that contain those terms.
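To make those four options concrete, here is a rough Python sketch of how they could be combined against a single entry. The `Query` class and `matches` function are names I made up for illustration; this is not how Bloglines implements its search, just the logic the form implies.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical container mirroring the four search form options."""
    all_words: list[str] = field(default_factory=list)      # all of these words
    exact_phrase: str = ""                                   # exact phrase
    any_words: list[str] = field(default_factory=list)      # at least one of these words
    without_words: list[str] = field(default_factory=list)  # without these words


def matches(query: Query, text: str) -> bool:
    """Return True when the entry text satisfies every option that was filled in."""
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))
    if any(w.lower() not in words for w in query.all_words):
        return False
    if query.exact_phrase and query.exact_phrase.lower() not in lowered:
        return False
    if query.any_words and not any(w.lower() in words for w in query.any_words):
        return False
    if any(w.lower() in words for w in query.without_words):
        return False
    return True
```

Empty options are simply skipped, so a query with only `all_words` set behaves like a plain keyword search.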

Because Bloglines is already keeping all of the feed entries around for their aggregator accounts, their search isn’t limited to Technorati’s idea of “current” pages. Although they don’t have Technorati’s ability to look up entries that link to a given URL in their search, they do keep track of references per entry in their aggregator. This is a nice trade-off for the aggregator, but it makes their search a little lacking, especially if you use this feature in Technorati a lot.

Perhaps because their primary focus is feed aggregating, they seem to be more up to date than Technorati. Unfortunately their search results are chock full of duplicates. I suspect this also stems from their aggregator focus. Bloglines doesn’t appear to support being pinged when a feed is updated. At the very least they aren’t listed on Ping-o-Matic.

Bloglines doesn’t display any advertisements, but they do have some for-pay services. Just as with Technorati, I have to wonder if this will be enough to keep them going.

Feedster
Superficially, Feedster’s pages have a style similar to Google’s. The search feature looks to be pretty basic, although it does support some additional filtering: limit to an RSS URL, limit to an OPML URL and exclude an RSS URL. Sorting of the search results can be done by relevance (popularity) or by date. There is another feature that is supposed to take a URL and find all of the feeds that it provides. This search works, but the links it provides to “All Posts” and “All Links” don’t appear to work.

Their search results page also has a similar style to Google. Unfortunately their usability is pretty poor. No matter how often I set the option to search by date, the results page indicates that it is still searching by relevance. Another strange thing is that as you click on Next to go through the search results, the number of results on each page seems to vary. Sometimes there will be 10 results listed, and other times only 4 results will be shown on a page. Not a huge problem, but it makes the site feel a bit funky.

The search results seem to be about as fresh as Bloglines’, but the number of duplicates doesn’t appear to be as high. This makes their results probably the best out of the three, but without the flexibility of Bloglines and the link search of Technorati. Add in the odd usability feel and you end up with something that is probably the best out of the three for results, moderate for power and poor for the feel of the site.

I don’t know if they offer any for-pay services, but they do show advertising in a similar style to Google. Hopefully following a model that has already been successful will work well for them to generate revenue.

Update 10:40 pm 24 Aug 2004: Scott Johnson of Feedster left a comment pointing to the Feedster Help Section. It looks like there are a lot more powerful search term features in there that didn’t jump out at me. I’d still like to see the duplicates reduced. I’ve tried to stick to talking about features, but I still think Feedster just feels funny. Considering that my aesthetic design skills are pretty poor you may want to take that with a grain of salt and try it out for yourself.

Waypath (Added: 12:15 pm 24 Aug 2004)
This one was pointed out to me by Mark. I’d come across this just briefly in the past but didn’t play with it much. Now that I’ve started writing up some of my thoughts I think I can look at Waypath as it compares to the other three. The superficial look makes me think that if Feedster is using the Google “style” then Waypath is trying to go the Yahoo “style” with their new Topic Streams feature. This reminds me a lot of Yahoo’s origins as a categorized set of links. This feature is still in beta so I’d expect it to change with time.

Waypath looks like the only feed search site to understand the basic set of search term possibilities via their advanced search features (things like AND, OR, wildcards, single terms and phrases). Bloglines has this to some extent, but Waypath looks more complete. They also support finding entries that relate and link to a specific entry. This is kind of a combination of the Bloglines reference system and the Technorati URL link search. You can also filter out or limit searches at the weblog level. I’d like to see them have a search syntax for this, not just icons once you get a set of search results. Those icons also need to be more distinct; it would be easy to mistake one for the other.

One thing that I should have included in my other reviews was the use of other “interesting” features, one specifically: bookmarklets. Suffice it to say that if your feed search site isn’t making use of this then it should be. Waypath has a couple of nice bookmarklets and I believe Bloglines also has some. Another feature that is probably more gee-whiz than anything else is their Buzz Maker. Give it a couple of terms and it graphs them using entries for the last 45 days. They were even smart enough to provide HTML you can cut and paste to use these graphs on your site. Waypath also makes some plugins for different blogging systems. If I get the time I’d also like to try out their XML-RPC services.

After playing with all of these little toys I get the sense that these guys might “get it” more than the other three, at least in terms of searching feeds. Unfortunately all is not perfect. Their search results are severely lacking; there just aren’t enough of them. This is probably because they aren’t indexing as many feeds as the other sites. I also didn’t see a way to ping them for updates (and even if they do support pings, they aren’t listed at Ping-o-Matic).

I didn’t see anything that indicated there were for-pay services, but there are some ads along the side of the search results. Who knows if this is enough to bring in enough revenue though. Overall these guys have some cool features, but if they don’t start indexing a lot more feeds all of those features won’t be very useful.

PubSub (Added: 10:00 pm 24 Aug 2004)
Another comment pointed out PubSub as another possibility. Their approach is different from others on this list. Instead of searching through existing feed entries you create a watch list that is used to scan feed entries as they come in. For certain applications this is great, like my example of keeping up with what people are saying about PostgreSQL. This narrow focus gives them certain advantages, but heavily limits their audience. I suspect that everyone else on this list should look at what PubSub does and integrate that feature as one component of what they provide.
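The watch list idea is simple enough to sketch in a few lines of Python. This is only my guess at the general shape, with made-up names (`Entry`, `WatchList`, `offer`), and not a description of how PubSub's matching engine actually works: the query is stored first, and each new entry is tested against it as it arrives.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical minimal feed entry."""
    title: str
    summary: str
    link: str


@dataclass
class WatchList:
    """A stored query that incoming entries are checked against."""
    terms: list[str]
    matches: list[Entry] = field(default_factory=list)

    def offer(self, entry: Entry) -> bool:
        """Check one newly arrived entry; keep it if any watched term appears."""
        text = f"{entry.title} {entry.summary}".lower()
        if any(term.lower() in text for term in self.terms):
            self.matches.append(entry)
            return True
        return False


if __name__ == "__main__":
    watch = WatchList(terms=["PostgreSQL"])
    watch.offer(Entry("PostgreSQL 8.0 beta", "Native Windows port", "http://example.com/1"))
    watch.offer(Entry("Lunch", "Had a sandwich", "http://example.com/2"))
    print([e.link for e in watch.matches])
```

Nothing here ever looks backwards through old entries, which is exactly the trade-off: matches only start arriving after the watch list has been created.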

They support pings and are listed on Ping-o-Matic. I couldn’t find any information on for-pay services. I haven’t seen advertising yet, although I just signed up for a watch list feed for “PostgreSQL” so perhaps they advertise in the watch list RSS feed? I’m assuming they have some sort of revenue model, otherwise they may not be very long for this network. Hmmm, combine their narrow focus with possibly minimal revenue and they may be the most likely to be acquired.

Blogdigger (Added: 10:25 pm 24 Aug 2004)
One more feed search site suggested in the comments. I’d never heard of these guys so I’m only just getting a feel for their site. Their search appears to be ok compared to the others with one big difference: the number of duplicates seems to be greatly reduced. They probably need to be indexing more sites to fill out their search results a little better, but what they do have seems to be well taken care of. They use meta feeds in several places like feeds for a search and link search (which didn’t seem to do much when I tried it). There are two beta features that look promising, Blogdigger Groups and Blogdigger Media.

The groups feature allows you to create a blog made up of entries from other blogs. This is great for subject matter blogs. The media feature provides feeds to track the latest .torrent, .wav, .mp3, .mov and .avi links that are in feed entries. I like the idea of these very dynamic meta feeds; they have the potential to make tracking interesting tidbits that much easier.

Blogdigger is using Google’s AdSense to advertise on their site. I didn’t see any for-pay services listed.

Conclusion
All of the big three feed search sites fall short of their potential. One problem that they all have in common is duplicate search results. As the number of entries increases, being able to deal with duplicates is going to become a bigger and bigger problem. Someone needs to solve this, preferably sooner rather than later. When the number of duplicates becomes too high it makes the search results almost useless. Most of the search features are pretty simple on these sites. While having more advanced features will undoubtedly require more intelligent and powerful systems, I think the site that can integrate these the best will get a huge leap over the others as long as the other problems (like duplicates) don’t overpower them. Look at Google’s Advanced Search page and think about what sort of features feeds make possible.
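I don't know why the duplicates show up on these sites, but one plausible cause is the same entry being indexed under slightly different URLs or from more than one feed. As a rough sketch of one way to collapse those, assuming entries are plain dicts with hypothetical `link`, `title` and `summary` keys, something like this Python would do:

```python
import hashlib
from urllib.parse import urlsplit


def entry_key(entry: dict) -> str:
    """Build a comparison key: normalized link, or a content hash if no link."""
    link = entry.get("link", "").strip()
    if link:
        parts = urlsplit(link)
        host = parts.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        key = host + parts.path.rstrip("/")
        # Keep the query string (many blogs use ?p=123 permalinks); drop the fragment.
        if parts.query:
            key += "?" + parts.query
        return key
    text = " ".join((entry.get("title", "") + " " + entry.get("summary", "")).lower().split())
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def dedupe(entries: list[dict]) -> list[dict]:
    """Keep the first entry seen for each key, preserving the original order."""
    seen = set()
    unique = []
    for entry in entries:
        key = entry_key(entry)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```

This only catches the easy cases. Entries that are quoted or republished with small changes would need some kind of fuzzy comparison, which is where the real work is.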

For me personally the biggest feature disappointment was the lack of cool advanced features involving dates. Virtually every feed entry has a date associated with it, which makes searching by date a possibility. No one seems to be doing this yet though, probably because of the additional power that would be required to do this. Maybe no one has ever looked at the Google Groups Advanced Search page, where you can limit your search to a certain time frame. We should be able to do this with feed entries.
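To show how little is needed at the small scale, here is a minimal Python sketch of a date-limited search. It assumes entries are dicts with a `title` and an RSS-style `pubDate` string (both key names are my own choice), and it ignores the hard part, which is doing this over millions of entries.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def published_at(entry: dict) -> datetime:
    """Parse an RFC 822 style date such as 'Tue, 24 Aug 2004 12:15:00 +0000'."""
    stamp = parsedate_to_datetime(entry["pubDate"])
    if stamp.tzinfo is None:
        # Assumption: treat dates without a time zone as UTC.
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


def search_by_date(entries: list[dict], term: str, start: datetime, end: datetime) -> list[dict]:
    """Return entries whose title contains term and whose date is in [start, end), newest first."""
    term = term.lower()
    hits = [
        entry for entry in entries
        if term in entry["title"].lower() and start <= published_at(entry) < end
    ]
    return sorted(hits, key=published_at, reverse=True)
```

Calling it with a start of 1 Aug 2004 and an end of 1 Sep 2004 (both as timezone-aware datetimes) would give "everything about PostgreSQL from August", which is the kind of query none of the sites above let me run.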

Another looming question is why hasn’t Google or Yahoo come up with a way to integrate feed searches into their web searches in a meaningful way? Maybe they are just waiting to let smaller companies research this and then buy one of them. I guess I’m just disappointed that people who already know so much about searching on the web haven’t applied that knowledge to this current problem.

Update 3:03 pm 24 Aug 2004: Fixed the spelling of Technorati (see Dave’s comment).

19 Comments

  1. Waypath (http://www.waypath.com) takes care of some of the things you are talking about.

  2. Great post, and a wonderful analysis of the various sites, including Technorati (btw, it is currently misspelled in the title). We’re working really hard on enhancing our core offerings including adding enhanced keyword search capabilities. In fact, if you’re interested, perhaps you could help us evaluate some of the new features before they go prime-time. Send me an email at dsifry at technorati dot com if you’re interested…

    Dave

  3. Thanks for pointing out the spelling mistake, I’ve fixed it now.

  4. PubSub? You can set up searches by URL, by keywords specifically in the title of the post, etc. etc.

    Of course, the one downside is that they “search the future” — matches get delivered as they occur, not for any past matches.

  5. Hi,

    I’m one of the authors of Feedster. I haven’t had time to fully digest your comments but our search features are more robust than you give us credit for:

    http://feedster.com/help/

    Goes over all our search features.

    I’m going to run your comments by some of our other engineers here and see what we can do to address them. Thank you.

    Scott

  6. Based on suggestions from Boris Mann and John Gray I added a little bit of information about PubSub and Blogdigger. Because I only tried these out today my comments on them are limited. They tackle different possibilities for feeds so I may have more to say about them as I get to use them more.

  7. Thanks for the link Scott, I’ve added it to the section about Feedster. There certainly is a lot of information there, I’ll have to read over it a few times to digest it all.

  8. Sorry for coming late to the game here (I was out yesterday). Thanks for your comments about Bloglines. One correction: we do allow you to search on links, although it’s a separate section from our keyword search. Go to http://www.bloglines.com/citations and type in a URL (or part of a URL). The search is very comprehensive, spanning the entire Bloglines archive, going back well over a year. Thanks!

  9. Perhaps Google and Yahoo aren’t researching blog searches because they know Blogging is a fad? Just a thought.

  10. Thanks for including Blogdigger in your review. Your assessment seems correct: we do need to increase our coverage (it’s on my list), and we do a number of things to reduce duplicate entries. Link Search is in the process of being overhauled. I like the media feeds as well, lots of good stuff comes through there!

    As for revenue, we have a partnership with Kanoodle for sponsored links on our search results page.

    Thanks again for this great review!

  11. I’m a developer working in the “feed search” space, and would like to mention Jyte. Jyte combines news aggregation, feed search, and state synchronization into a simple desktop application that works seamlessly across platforms and locations. Jyte removes the need for “middle-man” feed search sites like feedster and pubsub, by connecting directly to the Jyte feed search engine.

  12. Comment #20 from Brian is incorrect in its statements about PubSub.com. PubSub clients connect directly to the PubSub matching engine via either REST or the XMPP Instant Messaging protocol. For examples of this, download and try our IE add-in at http://pubsub.com/sidebar or download Gush, with PubSub support built-in from: http://2entwine.com/features/pubsub.html .

    Also, Joseph, PubSub currently gains revenues from specialized matching and subscription services that we provide to a small number of major clients. In the future, once the base technology issues are all worked out, we will be using advertising on the public service as well as offering for-pay subscriptions to “high-value” content.

    You asked: “why hasn’t Google or Yahoo come up with a way to integrate feed searches into their web searches in a meaningful way?” Well, it turns out that the technology needed to build a search engine for web searches is completely different from the technology needed to do feed searches. See my blog entry about “Retrospective vs. Prospective Search” at: http://bobwyman.pubsub.com/main/2004/04/retrospective_v.html

    bob wyman

  13. There’s also Daypop which has been around for a while, but I’m not sure how updated the feed list is. It only searches frequently updated feeds, but has other features like word bursts and the top 40 links (which is what I use it for.)

  14. Feed search is a relatively new phenomenon and is still yet to find an established powerful model. Your article is noteworthy for beginning a good discussion on this subject and for the fact that most of the feedsearch principals have tried to respond to this piece.

  15. I’m one of the creators of BlogPulse. The advanced search allows you to sort results by relevance first or by date first, and the index provides access to the past two months worth of posts.

  16. Just a quick post to introduce http://www.feedsfarm.com feeds search engine.

    Comments please :)

  17. Great post!

    Seems i found my destination here! I’m currently working on the feed search engine plazoo.com, which was not out when you wrote this article.
    Still, we miss some functionality you mentioned but we work hard to catch up and we got some nice stuff in work that you really will like i think.

    Best regards,

    Thorvald Kik


© 2014 Joseph Scott
