[ÇлçÇÐÀ§Ãëµæ] ÇлçÇÐÀ§¸¦ ÃëµæÇÒ¼ö ÀÖµµ·Ï µµ¿Íµå¸³´Ï´Ù! Jason 29 Dec 2005

I'm thrilled with SpamSieve — it's 99.9% accurate. Since I started using it on Feb 16, 2005, SpamSieve caught 130,423 spam emails. Only 24 emails were incorrectly labeled as spam. Very impressive.

However, lately I've been getting emails with completely scrambled titles such as "[ÇлçÇÐÀ§Ãëµæ] ÇлçÇÐÀ§¸¦ ÃëµæÇÒ¼ö ÀÖµµ·Ï µµ¿Íµå¸³´Ï´Ù!" or "°¨»çµå¸®°í 2006³â »õÇØ¿¡µµ º¹¸¹ÀÌ ¹ÞÀ¸¼¼¿ä." These always get through.

I find it interesting that emails that are obviously spam aren't marked as spam because the filter can't find any words or patterns to match up with the blacklist. It just shows the clear difference between man and machine. Just about any human could tell these are bogus emails, but my Powerbook with the latest G4, tons of RAM, and the most up-to-date software can't stop 'em.

Anyhow, just an observation. Nothing new.

Ryan Heneise 29 Dec 05

Thanks for that recommendation. I get an amazing amount of spam - although probably not as much as you. Still, a lot. I’m trying SpamSieve right now.

Peter Cooper 29 Dec 05

The sad thing is, it should be able to. Why isn’t detecting whether something is in English (or not) part of spam filters? I never want to read mail that isn’t in bona-fide English.

angelday true 29 Dec 05

I’ve never believed in client side filtering and dealing with virus/SPAM on my side, no way. I’ve discovered AlienCamel (http://aliencamel.com/) as an IMAP provider which has turned out to be an utter coolness.

Here is a shortlist for you guys to figure:

- unlimited (!) IMAP space
- SPAM and two level virus filtering
- webmail access

This is what the mail reading process looks like over there:

1. you get some mail
2. if sender is on your whitelist it IMMEDIATELY gets through
3. if sender is not on your whitelist it goes to the “pending mail” pool and gets evaluated whether it’s SPAM or not (the software just ranks shit)
4. you go in there and make your changes (if there are any; most of the time it guesses correctly because it uses the same Bayesian/SpamAssassin techniques everyone’s using)

Point is: shit never gets through. The worst that can happen is that it files some borderline shit as legit or vice versa. Either way, it never leaves your “pending mail” pool.

After a month or two most of your contacts will have been whitelisted and peace shall delve in the Universe once again.

PS: unlimited IMAP. And it’s cheap. No, I don’t work there, but they’re cool guys and I’ve been a customer for about a year.

PS2: sorry if this had been discussed here before

PS3: upon seeing this post’s subject in my newsreader I thought peace has left us — nice joke!

Fred 29 Dec 05

I set up a couple of two- and three-character filters (e.g. $%, Ï%$). They don’t catch them all, but they get 90% of them.

David Love 29 Dec 05

I noticed that once I had full Unicode support in both my spam filter and mail client (SpamAssassin and Mutt), those gibberish titles started showing up as Kanji characters. Since I don’t get any legitimate mail that’s not in English, it’s pretty easy for the filter to learn (once it can mark the patterns). Does SpamSieve support Unicode?

Dante 29 Dec 05

That looks a little like raw gzip encoded data. In any case, it is truly amazing the lengths that these spammers will go to.

Dan Boland 29 Dec 05

The obvious ones are indeed a giant pain in the ass, but not as much as having to explain to someone who’s bitching about how much spam they get why an obvious one doesn’t get snagged by the filters.

Dave Woodward 29 Dec 05

“Powerbook with the latest G4, tons of RAM”

There is your problem right there. If you had a G5 or a fast x86 machine, these e-mails wouldn’t get through! (I kid, I have a Powerbook with a G4. I like it, it just isn’t exactly lightning fast like it seemed a few years ago.)

David Russell 29 Dec 05

I almost didn’t read this article, because it looked like spam in my RSS reader. God forbid that ever actually happening.

Don Wilson 29 Dec 05

Just like the URL for this blog post, which reads “cccaaee_ccca_aeecooe_aoeie_iieu_” instead of the real characters that form the title.

Wendell 29 Dec 05

This article has totally jacked my webpage via the RSS feed.

Jake 29 Dec 05

I don’t know how easy SpamSieve is to configure, but couldn’t you set it to filter messages where, say, over 75% of the non-whitespace characters are not a-z/A-Z characters?
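As a rough illustration of the rule Jake describes, here is a minimal Python sketch. It is not a SpamSieve feature or configuration; the function names and the 0.75 default are hypothetical, taken only from the "over 75%" figure in the comment. Note that a rule like this would also flag legitimate mail written in any non-Latin script, so it only makes sense for someone who reads nothing but English mail.

```python
import string

# Plain a-z / A-Z, as in the comment; accented letters count as "non-letters".
ASCII_LETTERS = set(string.ascii_letters)


def non_letter_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are not a-z/A-Z."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0  # nothing to judge, so treat it as not garbled
    non_letters = sum(1 for ch in visible if ch not in ASCII_LETTERS)
    return non_letters / len(visible)


def looks_garbled(subject: str, threshold: float = 0.75) -> bool:
    """Hypothetical rule: flag subjects that are mostly non-ASCII-letter characters."""
    return non_letter_ratio(subject) > threshold


print(looks_garbled("µµ¿Íµå¸³´Ï´Ù"))               # fragment of the subject line in the post
print(looks_garbled("Meeting at 10:30 tomorrow"))  # ordinary English subject
```

Digits and punctuation count against the subject too, so short subjects such as an order number could trip the rule; a real filter would likely use this ratio as one score among several rather than a hard cutoff.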

Tim Almond 29 Dec 05

What’s the view here of things like challenge-response systems?

Anyone used one, or do you shun them because of the concern of missing vital customers?

I’d be interested to know.

Scott Evans 29 Dec 05

I’ve always avoided making work for the sender; I wanted to make it easy for people to get in touch with me, and I’d deal with the spam problem on my side. I’m finally starting to change my mind. I get about 10,000 spams per month, and more are starting to leak through my (fairly well-tuned) SpamAssassin installation.

And I just don’t have time to deal with it.

So some kind of dumb challenge/response thing might be the ticket, even though it’s depressing.

Vishi 29 Dec 05

Machines are dumb.
Groups of people aren’t.

Will 30 Dec 05

Possibly it’s mis-encoded non-English text?

We use SpamAssassin. Non English email is automatically given a high spam score. I’ve never gotten spam like what you describe since turning on that setting.
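The mis-encoding theory can be checked with a few lines of Python. The sketch below assumes the subject was written in a legacy double-byte encoding and then displayed as Latin-1; the choice of EUC-KR (Korean) as the default is my assumption based on the byte pattern, not something stated in the thread, and `undo_mojibake` is a hypothetical helper name.

```python
def undo_mojibake(garbled: str, real_encoding: str = "euc_kr") -> str:
    """Reverse a 'bytes shown as Latin-1' display error.

    Assumes every character in `garbled` is a Latin-1 character; text that
    was damaged further will raise UnicodeEncodeError at the encode step.
    """
    raw = garbled.encode("latin-1")  # recover the original bytes
    return raw.decode(real_encoding, errors="replace")  # reinterpret them


fragment = "µµ¿Íµå¸³´Ï´Ù"  # the tail of the subject line quoted in the post
recovered = undo_mojibake(fragment)
print(recovered)
```

Passing a different `real_encoding` (for example `"euc_jp"`) is an easy way to test the other guesses in this thread; a wrong guess typically produces replacement characters or text that is valid but meaningless.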

hannah 30 Dec 05

I think this application is rocking. It’ll really benefit me.

eh 30 Dec 05

let this be a lesson in blog post naming: wouldn’t “garbage characters bypassing spam filters” or something thereabouts have been better?

Tom T 30 Dec 05

Odds are it’s one of the wave of Japanese “hookup” spam mails that have been flooding the internet lately (sent from spam mail servers based in China) that encourage you, a typical salaryman, to have affairs with willing 18 - 20 year old Japanese women. The letters are your program trying to interpret Kanji without knowing it’s Kanji.

I would have thought that a group of web-savvy people such as yourselves would have figured that out by now.

A Noonie Moose 31 Dec 05

“Wendell 29 Dec 05
This article has totally jacked my webpage via the RSS feed.”


so blogosphere, what did we learn?

Wendell 31 Dec 05

A Noonie Moose said:
so blogosphere, what did we learn?

“We” learned to turn off the 37s RSS feed.

Lode 02 Jan 06

Tim Almond and Scott Evans asked about challenge/response systems:

“What’s the view here of things like challenge-response systems?”
“I’m (…) starting to change my mind.”
“So some kind of dumb challenge/response thing might be the ticket, even though it’s depressing.”

To say it in Bush lingo: if you use challenge/response systems, then the spammers have already won.

I never, ever reply to such messages. Besides, most of them get marked as spam themselves, and that’s how I treat them.

Peter 03 Jan 06

“SpamSieve caught 130,423 spam emails. Only 24 emails were incorrectly labeled as spam. Very impressive.”

Unless you indicate the number of e-mails that were not labeled as spam, your readers can’t judge whether the above numbers are impressive. For instance, if that number was 10, then of the 34 non-spam mails, 24 were considered spam. I would consider that bad.

OK, you probably had more than 34 non-spam mails.
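Peter's point is about the false positive rate: the share of legitimate mail that was wrongly flagged, which depends on how much legitimate mail there was in total. A small Python sketch of that arithmetic follows. The 24 comes from the post and the 10 from Peter's example; the 10,000 is a made-up figure for contrast, and the function name is hypothetical.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of legitimate mail that was wrongly labeled as spam."""
    legit_total = false_positives + true_negatives
    if legit_total == 0:
        raise ValueError("no legitimate mail to evaluate")
    return false_positives / legit_total


# Peter's extreme case: only 10 legitimate mails were passed correctly.
print(f"{false_positive_rate(24, 10):.1%}")      # 70.6%

# Hypothetical contrast: 10,000 legitimate mails were passed correctly.
print(f"{false_positive_rate(24, 10_000):.2%}")  # 0.24%
```

The same 24 mistakes look terrible or negligible depending on the denominator, which is why the count of correctly delivered mail matters.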

Peter 03 Jan 06

Oh - and how many spam mails do you estimate did get through?

BillSaysThis 05 Jan 06

This post is getting annoying, it keeps showing as updated in the RSS feed. Could you see why and fix? Thanks.

Graham 06 Jan 06

Change your text encoding to Japanese (EUC) and that text almost sort of makes sense.

“Multi bodily organ sleeve sleeve valuable grandchild existence”

Or maybe not.

victor 21 Jan 06

actually, we deserve some credit for it, as language is our most complex and refined technology so far, and the very base of culture. i guess the fact that a computer still can’t get it the way we do is perfectly understandable, given the circumstances. but a good bite for the brain nonetheless…

