Electricmonk

Ferry Boender

Programmer, DevOpper, Open Source enthusiast.

Why spam filters are useless

Wednesday, September 5th, 2007

I never used to get spam, until I made a little mistake in my mail client and sent mail to a mailing list under my real email address. My address wound up on the big bad public internet, and a few hours later the first spam emails started to come in. I installed a quite sophisticated spam filter to get rid of them, but it doesn’t work at all.

One of the most frequent spam mails I get is the following:

From: Euro VIP Casino
Subject: ontvang 400 Euro GRATIS als u lid wordt

   Voel de unieke opwinding van het spelen bij Europa's beste on-line
   casino... en ontvang EUR 400 GRATIS als u lid wordt
   ...

(Translation of subject: “Receive 400 euros for FREE if you become a member”)

The sender and the subject are always the same. Always. The body is almost the same every time, too. A single word is enough to identify this as spam 100% of the time: Casino. In total, there are five words that would always mark this as spam without any false positives: Euro, VIP, Casino, 400 and GRATIS.

Yet the spam filter, which is really quite sophisticated, still lets one through every now and then!

I’m not sure exactly how the spam filter works, but I do know it involves at least a Bayesian filtering technique and some other clever tricks. But all I really need is manual control over a blacklist and a whitelist of words. I simply want to say: “Mark word Casino as spam” and be done with it.
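Something like the following is roughly what I have in mind. It is only a minimal sketch of a word blacklist and whitelist, not how any real spam filter does it. The names is_spam, BLACKLIST and WHITELIST are made up for illustration, and I'm assuming here that a whitelisted word should win over a blacklisted one.

```python
import re

# Hypothetical word lists. "casino" is the word from the mail above;
# the other four words (euro, vip, 400, gratis) could be added the same way.
BLACKLIST = {"casino"}
WHITELIST = set()


def words(text):
    """Return the set of lowercase words and numbers found in text."""
    return set(re.findall(r"\w+", text.lower()))


def is_spam(sender, subject, body="", blacklist=BLACKLIST, whitelist=WHITELIST):
    """Mark a mail as spam if it contains a blacklisted word.

    Assumption: a whitelisted word always lets the mail through,
    even if a blacklisted word is present as well.
    """
    found = words(" ".join((sender, subject, body)))
    if found & whitelist:
        return False
    return bool(found & blacklist)


if __name__ == "__main__":
    print(is_spam("Euro VIP Casino", "ontvang 400 Euro GRATIS als u lid wordt"))
    print(is_spam("A colleague", "Lunch tomorrow?"))
```

Matching on whole words rather than substrings keeps a word like "occasion" from tripping over a shorter blacklisted word hidden inside it, which is the kind of false positive I want to avoid.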

I guess I’ll have to write my own additional filter.

The text of all posts on this blog, unless specifically mentioned otherwise, is licensed under this license.