Fixing Search Engines Won’t Stop Comment Spam

Spam is an unpleasant problem. It has managed to sink its claws into Usenet and email to the point where I’ve more than once wondered why I still bother. In the last couple of years, spam has entered the world of blogs. One of the nice things about blogs is their increased interconnectivity (trackback, pingback, comments, feeds, etc.), but it is these very same features that spammers are using to “advertise” their wares. Now that we’ve been through this fight a few times (and pretty much lost every time), there has been a lot of discussion about how to solve this problem before our blogs suffer the same fate as Usenet.

Jeremy feels that comment spam could be fixed by search engines, the idea being that spammers hit blog comments in an effort to make themselves more visible to search engines and rank higher in the results. I suppose on one level this is true; I’m sure spammers would be thrilled by that result, because it puts more eyeballs on their “advertisements.” But just as filtering spam won’t stop it from being sent to you, fixing PageRank and other search engine calculations won’t stop spammers from hitting blogs, and hitting them hard.

Email spam has already proven this to be true. It wouldn’t matter if 75% of all email accounts filtered spam with 100% accuracy (which they don’t, by the way); spam would still be sent to everyone, including those who filter it. All of this brings us back to asking why. I suppose there are many answers to this question, but in the end I believe the simplest one is: because they can. As long as the ability to spam email accounts exists, there will be those willing to do so. I believe the same will hold true for comment spam: as long as it can be done, it will be, even if it doesn’t help spammers’ standings in search engines.

In the world of blogs, though, comment spam is only one part of the problem. I subscribe to a PubSub search for PostgreSQL. For the most part this is nice: when PostgreSQL gets mentioned in a feed that PubSub tracks, the entry shows up in my subscription feed. But this service suffers from feed spam, because PubSub can’t tell the difference between a feed entry written by a person who is really writing about PostgreSQL (or at least uses that word in the entry) and a bot (or person) who writes a spam entry and just happens to throw the term PostgreSQL in there so that it will show up in places like my PubSub search. It’s hard for me to really blame PubSub, since they are doing exactly what I asked of them, but it is annoying nonetheless.
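To make the problem concrete, a keyword subscription like this is essentially a lexical match, and a lexical match by itself can’t tell a genuine post from a spam entry stuffed with the same term. A minimal sketch (the function name and sample entries are my own invention, not PubSub’s actual implementation):

```python
def matches_subscription(entry_text: str, keyword: str) -> bool:
    # Simplified PubSub-style match: purely lexical, no judgment of intent.
    return keyword.lower() in entry_text.lower()

genuine = "Tuning shared_buffers in PostgreSQL for better performance."
spam = "Cheap pills! Buy now! PostgreSQL linux apple free money keywords."

print(matches_subscription(genuine, "PostgreSQL"))  # True
print(matches_subscription(spam, "PostgreSQL"))     # True -- the match alone can't tell them apart
```

Both entries match, which is exactly why a spammer who sprinkles the right keywords gets delivered alongside the real posts.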

Following this path, if everything above is true, then how do we stop blog spam (in comments, feeds, trackbacks, etc.)? For now I believe there are ways we can try to maneuver around it, but as long as it is still possible it will continue. So if you are looking for techniques to fight blog spam, go for methods that prevent the spam from ever successfully entering your blog; otherwise you will still have to deal with the stuff.
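“Prevent it from entering” in practice means rejecting a submission before it is ever stored. As a rough illustration only (the thresholds and blocklist here are hypothetical, and real tools use far more sophisticated checks), a gatekeeper might look like:

```python
import re

# Hypothetical heuristics for illustration; not any particular plugin's rules.
MAX_LINKS = 2
BLOCKED_TERMS = {"casino", "viagra"}  # illustrative blocklist

def looks_like_spam(comment: str) -> bool:
    """Decide whether to reject a comment before it enters the blog."""
    # Heuristic 1: spam comments tend to carry many links.
    links = re.findall(r"https?://", comment, flags=re.IGNORECASE)
    if len(links) > MAX_LINKS:
        return True
    # Heuristic 2: reject comments containing blocklisted terms.
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return bool(words & BLOCKED_TERMS)

print(looks_like_spam("Nice! http://a.example http://b.example http://c.example"))  # True
print(looks_like_spam("I agree about PostgreSQL."))                                # False
```

The point of the post stands either way: heuristics like these only raise the cost of getting in; they don’t remove the incentive to try.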

4 thoughts on “Fixing Search Engines Won’t Stop Comment Spam”

  1. Note: The “PubSub LinkRank” in your Reference Search area should not include item-specific data. LinkRanks only apply to sites. Thus, the correct link to your LinkRank would be:

    On the Spam problem… We feel your pain! In fact, we’re constantly trying to figure out ways to reduce the amount of spam we pass through to the users. The problem, of course, is that it is very difficult for us to determine what is and is not spam. One thing you can do, however, to reduce the amount of spam you get is to use PubSub LinkRanks to filter the results you get. Typically, the spam comes from sites that nobody links to. Thus, if you say you only want to get data from sites that are in the “Top 50%” according to LinkRanks, you won’t see as much spam. Of course, you’ll be missing other “good” data as well. But, at least it is a start. Check out our weblogs subscription page at: . It should be fairly obvious how to use the drop-down list to filter your results. I’m sorry we’re not doing better — but we’re trying.

    bob wyman

  2. Bob –
    I didn’t mean to include linkrank for each post, just for the domain. I’ve removed that link for each post and left the correct one for the domain on the side. Thanks for catching this.

    As far as limiting my PubSub search goes, I’d opted for everything because I didn’t want to miss out on anything. I still feel that in the case of my PostgreSQL PubSub search I’d rather deal with the spam than miss out on a good post just because it came from an unpopular site. I doubt that I’m in the top 50% (my LinkRank has only been going red for the past several weeks), so people would miss all of my posts. Maybe that’s a feature? :-)

    I don’t know that there is much PubSub can do about this sort of problem with reasonable accuracy (above 99.99%). If you were to simply stop following feeds that were known to be spam, the spammers would just keep starting new ones.

  3. I think the problem with your PubSub search is that people put comments directly into their feeds, without moderating. You moderate because you want to protect your site from spam and trolls, but if you syndicate without moderating, this has no effect (as once an item is in BlogLines, or PubSub, it’ll be there forever)… perhaps you could blame the weblog owners for that one ;-)

  4. The hell of keywords must be controlled. Search engines, more and more intelligent, are detoured by ever more sophisticated false meta tags. The ones I hate most are the meta tags generated to match a specific keyword search: as soon as you’ve reached that site, you and the search engine are surprised that the content is far from your query, such as when you type: Tunisian blog directory >>. The trick is ingenious, but devil inside.
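The LinkRank filtering that Bob describes in comment 1, along with the trade-off raised in comment 2, can be sketched as a percentile cutoff over feed entries. The function name, entry format, and rank data below are hypothetical, not PubSub’s actual API:

```python
def filter_by_rank(entries, site_percentile, cutoff=0.5):
    """Keep only entries from sites at or above the rank-percentile cutoff.

    Sites absent from the rank table are treated as unranked (0.0),
    so they are dropped -- which is exactly the trade-off in comment 2:
    spam farms go away, but so do good posts from unpopular sites.
    """
    return [e for e in entries if site_percentile.get(e["site"], 0.0) >= cutoff]

entries = [
    {"site": "popular-blog.example", "title": "Real PostgreSQL tips"},
    {"site": "spam-farm.example", "title": "PostgreSQL pills casino"},
]
site_percentile = {"popular-blog.example": 0.9, "spam-farm.example": 0.1}

print(filter_by_rank(entries, site_percentile))
# keeps only the popular-blog.example entry
```

With a "Top 50%" cutoff the spam-farm entry is dropped, but any new or unpopular blog falls below the line too.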
