Sep 03 2008
Spam, it gets me down
I use an excellent collaborative spam filter on this blog. It harnesses the power of naive Bayesian classification to identify spam, and aggregates everyone’s spam to get lots of useful data. In short, it’s really good at identifying spam for the same reason Gmail is really good: both have a large data set to learn from.
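For anyone wondering what “naive Bayesian classification” means in practice, here is a toy sketch of the general technique. It is not the code of the filter used on this blog. The tokenizer, the smoothing constant `alpha` and the class name `NaiveBayesSpamFilter` are illustrative assumptions. Each word is treated as independent evidence (the “naive” part), and the per-word likelihoods are combined with the prior odds of spam versus genuine comments (“ham”).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Toy naive Bayes classifier; an illustrative sketch, not a real filter."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: Counter = Counter()

    def train(self, text: str, label: str) -> None:
        """Record one labelled example; label is 'spam' or 'ham'."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text: str) -> float:
        """Return P(spam | text), treating words as independent given the label.

        Assumes at least one 'spam' and one 'ham' example have been trained.
        """
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: the fraction of training comments carrying this label.
            log_p = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # Smoothed likelihood, so an unseen word doesn't zero everything out.
                log_p += math.log(
                    (count + self.alpha) / (total_words + self.alpha * len(vocab))
                )
            log_scores[label] = log_p
        # Convert the two log scores into a normalised probability of spam.
        return 1.0 / (1.0 + math.exp(log_scores["ham"] - log_scores["spam"]))
```

Trained on a handful of made-up comments, “buy viagra now” scores about 0.95 while “thanks for the article” scores about 0.06. A collaborative filter gets its accuracy by pooling far more training data than this.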
But once or twice a comment by a genuine person (as identified by my patented Genuine Person Detector™) has been dropped straight into the spam bin. So I check through the spam fairly regularly, unspamming the occasional good comment and deleting everything else. It takes just a few minutes, but it’s really beginning to get to me.
It’s not difficult to spot the spam. And I’m really glad the spam-blocker does such a good job. But it just depresses me terribly that I get all this stuff in the first place. Since starting this blog I have had 310 genuine comments and 4627 spam. The vast majority of that spam has been tediously obvious “hot girls! viagra! stock tips! buy buy buy!” links, hundreds of them in a single comment and no content whatsoever. Just ugly link dumps. I just get so fed up scrolling through these things, even though I know they’re already binned by the time I see them.
I wish there were a Neighbourhood Watch badge I could stick on my blog, to say “don’t spam here, we’ll catch you”. It’s selfish and silly, but I just get so beaten down by the endless stream of automated, transparent, stupid crap that flows through this blog.
I guess there is an interface issue as well. It would be easier to sort through the crap if the spam folder were augmented a bit:
Showing the username, email address and website is enough to identify most spam. If there’s nothing obviously spammy in the comment itself, then the submitted website will be a link to some dubious-sounding pharmaceuticals supplier.
So an abbreviated spam page that showed just the metadata (and maybe the first line of the comment, as Gmail does) would be quicker to skim through. It would also fit more spam on a single page, so I wouldn’t have to click “Next page” when there’s a lot there.
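The abbreviated view is simple to sketch: one line per comment, metadata first, then the start of the body. This is a hypothetical illustration, not WordPress code; the `Comment` fields, the 60-character width and the 100-per-page default are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    # Hypothetical fields mirroring what a typical comment form collects.
    author: str
    email: str
    website: str
    body: str


def summary_line(comment: Comment, width: int = 60) -> str:
    """One line per comment: author, email and website, then the body's first line."""
    lines = comment.body.strip().splitlines()
    first_line = lines[0] if lines else ""
    if len(first_line) > width:
        # Truncate long first lines so every entry stays on one row.
        first_line = first_line[: width - 3] + "..."
    return f"{comment.author} <{comment.email}> {comment.website} | {first_line}"


def abbreviated_page(
    comments: list[Comment], per_page: int = 100, page: int = 0
) -> list[str]:
    """Return one page of summary lines; compact rows allow a larger page size."""
    start = page * per_page
    return [summary_line(c) for c in comments[start:start + per_page]]
```

Putting the website field early in each row means the dubious pharmacy link is usually visible before you even read the comment.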
Sorting by “spamminess” would be good too. I’m sure the spam blocker assigns a “probability of spamminess” to each comment. It would be nice if the “least spammy” comments were floated to the top of the page, so I could deal with them first and then kill the tedious stuff without looking.
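If the filter did expose a score, floating the least spammy comments to the top is just a sort. A minimal sketch, assuming each queued comment carries a `spam_probability` between 0 and 1 (the plugin may in fact only get a yes/no verdict). The `ScoredComment` type and the 0.99 “delete without looking” cut-off are hypothetical choices made for illustration.

```python
from typing import NamedTuple


class ScoredComment(NamedTuple):
    comment_id: int
    # Assumed to come from the spam filter: 0.0 = clean, 1.0 = certain spam.
    spam_probability: float


def least_spammy_first(queue: list[ScoredComment]) -> list[ScoredComment]:
    """Order the spam queue so borderline comments are reviewed first."""
    return sorted(queue, key=lambda c: c.spam_probability)


def split_for_review(
    queue: list[ScoredComment], threshold: float = 0.99
) -> tuple[list[ScoredComment], list[ScoredComment]]:
    """Split the ordered queue into 'worth a look' and 'delete unread'.

    The 0.99 threshold is an arbitrary illustrative value.
    """
    ordered = least_spammy_first(queue)
    review = [c for c in ordered if c.spam_probability < threshold]
    bulk_delete = [c for c in ordered if c.spam_probability >= threshold]
    return review, bulk_delete
```

The occasional genuine comment would then sit near the top of the review list, and everything above the cut-off could be binned in one go.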
I’ve submitted these suggestions to the developers. I would guess the first one (the abbreviated view) is more likely to be acted on than the second. I get the impression that the WordPress interface doesn’t actually know the probabilities; it only receives a thumbs up/down for each comment.