Why spam filters are useless
Wednesday, September 5th, 2007
I used to never get spam, until I made a little mistake in my mail client and sent a message to a mailing list from my real email address. My address wound up on the big bad public internet, and a few hours later the first spam emails started coming in. I installed a fairly sophisticated spam filter to get rid of them, but it doesn’t work at all.
One of the most frequent spam mails I get is the following:
From: Euro VIP Casino
Subject: ontvang 400 Euro GRATIS als u lid wordt

Voel de unieke opwinding van het spelen bij Europa's beste on-line casino... en ontvang EUR 400 GRATIS als u lid wordt ...
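(Translation of subject: “Receive 400 euros FREE when you become a member”. The body reads: “Feel the unique excitement of playing at Europe's best online casino... and receive EUR 400 FREE when you become a member ...”)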
The sender and the subject are always the same. Always. The body is also almost the same every time. A single word is enough to identify this as spam 100% of the time: Casino. In total, there are five words that would always mark this as spam without any false positives: Euro, VIP, Casino, 400 and GRATIS.
Yet the spam filter, which is really quite sophisticated, still lets one through every now and then!
I’m not sure exactly how the spam filter works, but I do know it involves at least Bayesian filtering and some other clever tricks. All I really need, though, is manual control over a blacklist and a whitelist of words. I simply want to say “Mark the word Casino as spam” and be done with it.
I guess I’ll have to write my own additional filter.
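A minimal sketch of what such a word-list filter could look like, in Python. Everything in it is an assumption made for illustration: the names `classify`, `BLACKLIST` and `WHITELIST`, the sample whitelist entry, and the choice to let the whitelist win over the blacklist. It is not how any particular spam filter works.

```python
import re

# Hypothetical word lists, purely illustrative. All matching is case-insensitive.
BLACKLIST = {"casino"}
WHITELIST = {"newsletter"}  # made-up example of a word I always want to let through


def words(text):
    """Return the set of lowercase words found in text."""
    return set(re.findall(r"\w+", text.lower()))


def classify(sender, subject, body=""):
    """Classify a message by word lists only.

    Returns "ham" if any whitelisted word is present, otherwise "spam" if any
    blacklisted word is present, otherwise "unknown" (meaning: hand the message
    on to whatever other filter is installed).
    """
    found = words(" ".join((sender, subject, body)))
    if found & WHITELIST:  # assumption: the whitelist takes precedence
        return "ham"
    if found & BLACKLIST:
        return "spam"
    return "unknown"


if __name__ == "__main__":
    print(classify("Euro VIP Casino", "ontvang 400 Euro GRATIS als u lid wordt"))
```

Run against the sender and subject quoted above, this marks the message as spam on the word Casino alone. Returning "unknown" for everything else keeps the word lists as an addition to the existing filter rather than a replacement for it.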