
Re: [Sheflug] Fun With Regexp



On Sat, 2002-11-09 at 11:52, Richard Ibbotson wrote:
> The most common feature throughout the spam that I receive is that 
> there's a lot of HTML formatted mail.  Is there any way that I can 
> write a regex to cut this out ?

What tool do you intend to use? /<\/?[Hh][Tt][Mm][Ll][^>]*>/ would be an
obvious (mostly correct) test, but it depends on what tool you're using.
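
As a rough illustration, here is a minimal sketch of that test in Python.
Python is only an assumption here, since the tool hasn't been chosen yet,
and the function name looks_like_html is made up for the example. It uses
a case-insensitive flag instead of spelling out [Hh][Tt][Mm][Ll]:

```python
import re

# Same idea as the pattern above: an opening or closing <html ...> tag,
# matched case-insensitively.
HTML_TAG = re.compile(r"</?html[^>]*>", re.IGNORECASE)


def looks_like_html(body: str) -> bool:
    """Return True if the text contains an opening or closing <html> tag."""
    return HTML_TAG.search(body) is not None
```

It is only "mostly correct" because it also matches things like <htmlfoo>,
and it misses HTML mail that never includes an <html> tag at all.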

I personally use SpamAssassin, and find I don't get any false positives
(apart from spams people forward to me in the hope that I might find them
humorous :). It weights mails based on the presence of certain criteria
- like HTML mail - and if the total passes a threshold you set, the mail
gets marked as spam:

SPAM: -------------------- Start SpamAssassin results 
SPAM: This mail is probably spam.  The original message has been altered
SPAM: so you can recognise or block similar unwanted mail in future.
SPAM: See http://spamassassin.org/tag/ for more details.
SPAM: 
SPAM: Content analysis details:   (12 hits, 5 required)
SPAM: Hit! (2.4 points)  'Message-Id' was added by a relay (2)
SPAM: Hit! (1.3 points)  'Received:' has 'may be forged' warning
SPAM: Hit! (0.6 points)  From: does not include a real name
SPAM: Hit! (1.5 points)  BODY: Asks you to click below
SPAM: Hit! (3.0 points)  URI: Uses a dotted-decimal IP address in URL
SPAM: Hit! (0.0 points)  BODY: Includes a URL link to send an email
SPAM: Hit! (3.2 points)  HTML-only mail, with no text version
SPAM: 
SPAM: -------------------- End of SpamAssassin results 
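
The arithmetic behind that report can be sketched in a few lines of
Python. This is not SpamAssassin's own code, just an illustration of the
weighting idea: the points and descriptions are copied from the report
above, while the names HITS, total_score and is_spam are made up, and
treating a score equal to the threshold as spam is an assumption.

```python
# (points, description) pairs copied from the report above.
HITS = [
    (2.4, "'Message-Id' was added by a relay (2)"),
    (1.3, "'Received:' has 'may be forged' warning"),
    (0.6, "From: does not include a real name"),
    (1.5, "BODY: Asks you to click below"),
    (3.0, "URI: Uses a dotted-decimal IP address in URL"),
    (0.0, "BODY: Includes a URL link to send an email"),
    (3.2, "HTML-only mail, with no text version"),
]

REQUIRED = 5.0  # the "5 required" threshold from the report


def total_score(hits):
    """Add up the points of every test that fired, rounded to one decimal."""
    return round(sum(points for points, _ in hits), 1)


def is_spam(hits, required=REQUIRED):
    """Assumption: a total at or above the threshold counts as spam."""
    return total_score(hits) >= required
```

With the hits above, total_score(HITS) comes to 12.0, which matches the
"12 hits, 5 required" line in the report.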

SpamAssassin then adds "*****SPAM******" to the subject line, and I can
filter it out in Evolution. You can mark spam in any way you like though,
even invisibly with a mail header:

X-Spam-Status: Yes, hits=12.0 required=5.0
	tests=MSG_ID_ADDED_BY_MTA_2,MAY_BE_FORGED,NO_REAL_NAME,CLICK_BELOW,
	NORMAL_HTTP_TO_IP,MAILTO_LINK,CTYPE_JUST_HTML version=2.20

Which you can also filter out.
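
If your mail client can't filter on arbitrary headers, a small script can
do the check instead. Here is a minimal sketch in Python using the
standard email module; the function name flagged_as_spam and the sample
message are made up for illustration, and the only thing it relies on is
that the header value starts with "Yes" as in the example above.

```python
from email import message_from_string

# Hypothetical sample message carrying the header shown above.
SAMPLE = (
    "Subject: example\n"
    "X-Spam-Status: Yes, hits=12.0 required=5.0\n"
    "\ttests=MSG_ID_ADDED_BY_MTA_2,MAY_BE_FORGED,NO_REAL_NAME,CLICK_BELOW,\n"
    "\tNORMAL_HTTP_TO_IP,MAILTO_LINK,CTYPE_JUST_HTML version=2.20\n"
    "\n"
    "message body\n"
)


def flagged_as_spam(raw_message: str) -> bool:
    """Return True if the X-Spam-Status header value starts with 'Yes'."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    return status.strip().lower().startswith("yes")
```

Calling flagged_as_spam(SAMPLE) returns True, and a message without the
header is treated as not flagged.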

Cheers,

Alex.
