
429 points pabs3 | 2 comments
lisper ◴[] No.43472957[source]
I've been running my own spam filter for many years now based on this super-simple heuristic: My filter looks at my outgoing mail, and any mail received from an address I've sent mail to, or with a subject that has appeared in my outgoing mail (possibly with a "re:" prefix) is marked as non-spam. Everything else goes in spam, and any spam message from an address I've never received mail from before is marked as unread. I get hundreds of spams per day, but only about a dozen from new addresses. It takes me about ten seconds to scan them for non-spam cold calls, which are extremely rare. The other source of false positives is things like subscription confirmations, but because I know to expect those, they are always at the top of the spam folder.

I put this initial system in place expecting to have to augment it later with a more traditional content-based filter, but this simple heuristic works so well I've never felt the need to implement that additional step.
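
A minimal sketch of the heuristic described above, in Python. The class name `OutgoingFilter`, its method names, and the three labels it returns are hypothetical, chosen only for illustration; the comment does not say how the real filter is implemented or how it normalizes subjects.

```python
import re
from dataclasses import dataclass, field

# Assumption: one or more leading "re:" prefixes are ignored when comparing subjects.
_RE_PREFIX = re.compile(r"^\s*(re\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip leading "Re:" prefixes, lowercase, and collapse whitespace."""
    return " ".join(_RE_PREFIX.sub("", subject).lower().split())


@dataclass
class OutgoingFilter:
    sent_to: set = field(default_factory=set)        # addresses I have mailed
    sent_subjects: set = field(default_factory=set)  # subjects I have used
    seen_senders: set = field(default_factory=set)   # addresses that have mailed me

    def record_outgoing(self, to_addr: str, subject: str) -> None:
        """Learn from one outgoing message."""
        self.sent_to.add(to_addr.strip().lower())
        normalized = normalize_subject(subject)
        if normalized:  # an empty subject would otherwise whitelist every blank one
            self.sent_subjects.add(normalized)

    def classify(self, from_addr: str, subject: str) -> str:
        """Return "inbox", "spam-unread" (new sender), or "spam-read"."""
        sender = from_addr.strip().lower()
        first_contact = sender not in self.seen_senders
        self.seen_senders.add(sender)
        if sender in self.sent_to or normalize_subject(subject) in self.sent_subjects:
            return "inbox"
        return "spam-unread" if first_contact else "spam-read"
```

Only the "spam-unread" messages need a manual scan, which matches the dozen or so per day mentioned above.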

replies(3): >>43473299 #>>43473466 #>>43473483 #
kees99 ◴[] No.43473483[source]
I'm using something very similar, except incoming messages from never-seen-before senders are greylisted instead:

https://en.wikipedia.org/wiki/Greylisting_(email)

95% of spammers never retry.
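
For readers unfamiliar with the technique, here is a rough sketch of conventional greylisting in Python. The `Greylist` class, the (client IP, sender, recipient) key, and the 300-second minimum delay are assumptions for illustration, not details of the setup described in this comment.

```python
import time


class Greylist:
    """Temporarily reject a sender's first attempt; accept a later retry."""

    def __init__(self, min_delay: float = 300.0, clock=time.time):
        self.min_delay = min_delay  # assumed value; real deployments vary
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}        # (ip, sender, recipient) -> first attempt time

    def check(self, client_ip: str, mail_from: str, rcpt_to: str) -> int:
        """Return an SMTP reply code: 451 (try again later) or 250 (accept)."""
        key = (client_ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return 451
        return 250
```

A legitimate mail server queues the message and retries after the 451, while a sender that never retries is never accepted.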

replies(1): >>43473502 #
lisper ◴[] No.43473502[source]
The problem with greylisting is that it delays subscription confirmation emails when you sign up for a new service. I found that to be more trouble than it was worth. YMMV.
replies(1): >>43473725 #
1. kees99 ◴[] No.43473725[source]
For greylisting that sends 451 before DATA, that is indeed a known problem.

My server sends 451 after DATA, and keeps a copy of the greylisted message as a marked-as-read entry in a separate folder. Those copies are deleted after a few hours, or moved out after a successful delivery retry.
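
A rough sketch of that 451-after-DATA approach in Python. Everything here is hypothetical: the `HoldingGreylist` class, the (sender, recipient) key, the four-hour hold window standing in for "a few hours", and the choice to deliver the retried message and drop the held copy. A minimum retry delay is omitted for brevity.

```python
import time
from dataclasses import dataclass


@dataclass
class Held:
    body: bytes
    received_at: float


class HoldingGreylist:
    """Greylist after DATA, keeping a readable copy of each deferred message."""

    def __init__(self, hold_seconds: float = 4 * 3600, clock=time.time):
        self.hold_seconds = hold_seconds  # assumed stand-in for "a few hours"
        self.clock = clock
        self.held = {}      # (sender, recipient) -> Held; the separate folder
        self.known = set()  # pairs that have completed a retry
        self.inbox = []     # delivered message bodies

    def expire(self) -> None:
        """Delete held copies older than the hold window."""
        cutoff = self.clock() - self.hold_seconds
        self.held = {k: v for k, v in self.held.items() if v.received_at >= cutoff}

    def on_data(self, mail_from: str, rcpt_to: str, body: bytes) -> int:
        """Call once the full message has arrived; return 250 or 451."""
        self.expire()
        key = (mail_from.lower(), rcpt_to.lower())
        if key in self.known or key in self.held:
            # Known pair or successful retry: deliver and clear the held copy.
            self.held.pop(key, None)
            self.known.add(key)
            self.inbox.append(body)
            return 250
        # First attempt: keep a copy so the user can still read it, then defer.
        self.held[key] = Held(body, self.clock())
        return 451
```

Because the body is already stored when the 451 goes out, a subscription confirmation can be read from the holding folder right away instead of waiting for the sender's retry.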

replies(1): >>43473851 #
2. lisper ◴[] No.43473851[source]
That's a good idea. I was using an off-the-shelf greylister that didn't work that way, but I might implement that strategy now that I'm doing everything myself.