Web Hosting Forum | Lunarpages

Topic: How-to: Train SpamAssassin - Updated April 27, 2010

Offline pheared

  • Intergalactic Superstar
  • *****
  • Posts: 194
    • http://pheared.net
How-to: Train SpamAssassin
« Reply #30 on: September 22, 2004, 05:05:13 PM »
I suppose I can share.  Save the following text to a file and run it with the Python interpreter.  Note that Python cares about indentation, so if copying and pasting from here doesn't work, I can be persuaded to upload the file somewhere.  With all of that set up (and the file marked executable), on Linux I just type ./spamdrain.py, but YMMV.

Fill in your username, your password, and your domain as needed in the first couple of lines.

The code will skip mailboxes that are small (containing fewer than 30 messages) because they aren't a big deal and, in my experience, people who only get a couple of spams are more likely to have false positives.  It is set to purge mail that is more than 7 days old.  Both of these parameters can be changed.

Code:

#!/usr/bin/python
#
# spamdrain -- a script that purges old e-mail in spam boxes
#              given a maximum age and a threshold
#
# purgeimap was hacked to bits by Kevin Dwyer <kevin@pheared.net>
#   to create spamdrain.
# purgeimap was written by Justin R. Miller <justin@solidlinux.com>
#
# Written for Python 2 (it uses print statements).
#

import os, string, time, imaplib, sys

if __name__ == "__main__":
    server = "mail.yourdomain.com"
    port = 993
    username = "youruserid"
    password = "yourpassword"
    directory = "yourdomain.com/"
    folderMask = "*/spam"
    age = 7  # days
    purgeAmount = 30  # Skip boxes holding fewer than purgeAmount messages

    timestamp = time.localtime(time.time() - (age * 86400))
    purgedate = time.strftime('%d-%b-%Y', timestamp)
    print "Destroying spam older than %s at threshold %i" % (purgedate,
                                                             purgeAmount)

    m = imaplib.IMAP4_SSL(server, port)
    m.login(username, password)

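    spamboxesResp = m.list(directory, folderMask)
    # Take the 4th whitespace-separated field of each LIST response line as
    # the mailbox name; this depends on how the server formats its replies.
    spamboxes = map(lambda x:x.split()[3], spamboxesResp[1])
    #print spamboxes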

    total = 0
    totalOld = 0
    totalExp = 0

    for box in spamboxes:
        print "Selecting %s..." % box,
        numMsgs = m.select(box)
        numMsgs = int(numMsgs[1][0])
        print "%i messages." % numMsgs

        # Leave small boxes alone.
        if numMsgs < purgeAmount:
            print "Skipping."
            continue

        # Find messages older than the cutoff and flag them for deletion.
        typ, msgs = m.search(None, '(BEFORE ' + purgedate + ')')
        oldMsgs = msgs[0].split()
        for num in oldMsgs:
            m.store(num, '+FLAGS', r'(\Deleted)')

        #print typ, msgs
        totalOld += len(oldMsgs)
        total += numMsgs
        # Permanently remove everything flagged \Deleted in this box.
        expunged = m.expunge()
        if expunged[1] != [None]:
            print "Expunged %s messages." % len(expunged[1])
            totalExp += len(expunged[1])

    m.logout()

    if total > 0:
        print "%i/%i (%.2f%%) spams over %i days old." \
              % (totalOld, total, (totalOld*100.0)/total, age)
    if totalOld > 0:
        print "%i/%i (%.2f%%) spams expunged." % (totalExp, totalOld,
                                                  (totalExp*100.0)/totalOld)
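
For anyone who wants to check the two tunables without touching a live mailbox, here is a minimal sketch of just the cutoff-date and threshold logic. It is written for Python 3 and is not part of the script above. The helper names (imap_cutoff, before_criterion, should_purge) are made up for illustration. The month table is spelled out because strftime('%b') follows the local language settings, while IMAP expects English month abbreviations.

```python
import time

# IMAP search dates use English month abbreviations, whatever the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_cutoff(age_days, now=None):
    """Return the local date `age_days` before `now` as DD-Mon-YYYY."""
    if now is None:
        now = time.time()
    t = time.localtime(now - age_days * 86400)
    return "%02d-%s-%04d" % (t.tm_mday, MONTHS[t.tm_mon - 1], t.tm_year)


def before_criterion(age_days, now=None):
    """Build the search string the script passes to IMAP SEARCH."""
    return "(BEFORE %s)" % imap_cutoff(age_days, now)


def should_purge(num_msgs, purge_amount):
    """Mirror the script's rule: boxes below the threshold are skipped."""
    return num_msgs >= purge_amount


if __name__ == "__main__":
    print(before_criterion(7))
    print(should_purge(12, 30))
```

Run on its own, it prints the search criterion for a 7-day cutoff and whether a 12-message box would be purged at a threshold of 30.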


Offline Lopht

  • Intergalactic Cowboy
  • *****
  • Posts: 50
    • http://www.lopht.net
train script/function not seeing messages
« Reply #31 on: September 23, 2004, 03:47:07 PM »
Over the past couple of days I've noticed that regardless of how many messages are in the myham/myspam folders, when I run the script it says SA learned from zero out of one messages in each folder. I haven't modified the script since I first installed it when this thread began, and I haven't changed any permissions on any files. Anyone have any idea what might be going on?

Offline parish2

  • Spaceship Navigator
  • *****
  • Posts: 93
How-to: Train SpamAssassin
« Reply #32 on: September 25, 2004, 03:09:26 PM »
Forgive my ignorance, but how do you run a script in a Python interpreter?  Can I run it through Windows, or IE, or do I need some Python program to run it?

Offline kwdavids

  • Galactic Royalty
  • *****
  • Posts: 324
    • Netsmart Technologies
How-to: Train SpamAssassin
« Reply #33 on: October 11, 2004, 06:02:56 AM »
I find that over 90% of the spam we get scores 99% or higher on the SpamAssassin Bayesian filter.  Training is very effective and well worth the effort.
Kevin

Offline parish2

  • Spaceship Navigator
  • *****
  • Posts: 93
How-to: Train SpamAssassin
« Reply #34 on: October 13, 2004, 08:35:03 AM »
I haven't been able to get this Python script to run, and I'm wondering whether the CGI learning script that empties the myspam and myham folders could be adapted to also empty all the other spam folders.

Offline krick

  • Intergalactic Cowboy
  • *****
  • Posts: 50
How-to: Train SpamAssassin
« Reply #35 on: November 18, 2004, 02:17:03 PM »
Quote from: w98
I'm still trying to determine whether LunarPages even uses our individual Bayesian databases when forwarding email for our accounts. They DO use our personal user_prefs file (although custom rules seem to be ignored), but there are still a few unknowns as yet.


I'm not sure if we're talking about the same thing or not, but I've found that if you use cPanel->Mail->Forwarders to forward from one account on your domain to another account on your domain, the email slips through SpamAssassin entirely for some reason.  I suspect that it has something to do with headers being changed by the forwarding process.  Or possibly the mail is passing because it seems to come from the same domain.

If you instead use cPanel->Mail->E-mail Filtering to forward your mail, the mail WILL get checked by SpamAssassin.  However, if you have also set up the filter to automatically delete spam, forwarded email that is spam will not be deleted.  This seems to be because a given email can only match ONE rule in the E-mail Filter list.  Once it matches something, it's not checked again.
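
The "only ONE rule matches" behaviour described above can be modelled in a few lines of Python. This is only a sketch of the observation, not how cPanel or Exim actually implement filtering. The rule order, the action strings, and the use of the X-Spam-Flag header as the spam test are all assumptions for illustration.

```python
def first_match(rules, message):
    """Return the action of the first rule whose test matches, else None."""
    for test, action in rules:
        if test(message):
            return action  # stop here: later rules are never consulted
    return None


# Hypothetical filter list: the forwarding rule sits above the spam rule.
rules = [
    (lambda m: "me1@domain.com" in m["To"], "forward to me2@domain.com"),
    (lambda m: m.get("X-Spam-Flag") == "YES", "discard"),
]

spam_for_me1 = {"To": "me1@domain.com", "X-Spam-Flag": "YES"}

if __name__ == "__main__":
    # The message is spam, but the forwarding rule wins because it is first.
    print(first_match(rules, spam_for_me1))
```

If the filter list really behaves this way, a spam message addressed to the forwarded account takes the forward action and never reaches the delete rule.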

Offline parish2

  • Spaceship Navigator
  • *****
  • Posts: 93
How-to: Train SpamAssassin
« Reply #36 on: November 19, 2004, 12:11:16 AM »
I can't get the filter to forward my emails.  Is there some trick to it?  I entered To = (address to be forwarded), the forwarding address down below... but it doesn't work.

Offline krick

  • Intergalactic Cowboy
  • *****
  • Posts: 50
How-to: Train SpamAssassin
« Reply #37 on: November 19, 2004, 03:45:36 PM »
Quote from: parish2
I can't get the filter to forward my emails.  Is there some trick to it?  I entered To = (address to be forwarded), the forwarding address down below... but it doesn't work.



The filter screen should look something like this...

Filter - [TO] - that - [CONTAINS] - me1@domain.com
Destination - me2@domain.com


I'm pretty sure you have to use "contains" rather than "equals" because with equals, the whole header entry has to match exactly.
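
To illustrate the difference, here is a rough Python sketch of the two comparisons. The display-name form of the To header is an assumed example, and the helper names are hypothetical. The real filter may also treat case or whitespace differently.

```python
def header_equals(value, pattern):
    """True only when the whole header value is exactly the pattern."""
    return value == pattern


def header_contains(value, pattern):
    """True when the pattern appears anywhere in the header value."""
    return pattern in value


# Many mail clients send the address wrapped in a display name.
to_header = '"Me" <me1@domain.com>'

if __name__ == "__main__":
    print(header_equals(to_header, "me1@domain.com"))
    print(header_contains(to_header, "me1@domain.com"))
```

With a header like this, the "equals" test fails because of the extra name and angle brackets, while "contains" still finds the address.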

Offline parish2

  • Spaceship Navigator
  • *****
  • Posts: 93
How-to: Train SpamAssassin
« Reply #38 on: November 20, 2004, 02:42:46 AM »
Hmm.  Still doesn't work.

Offline krick

  • Intergalactic Cowboy
  • *****
  • Posts: 50
How-to: Train SpamAssassin
« Reply #39 on: November 22, 2004, 07:50:06 PM »
Quote from: parish2
Hmm.  Still doesn't work.



Do you also have normal email forwarders set up for the addresses in question?   I think you can only have forwarders or filters, not both.

Offline parish2

  • Spaceship Navigator
  • *****
  • Posts: 93
How-to: Train SpamAssassin
« Reply #40 on: November 22, 2004, 10:47:24 PM »
No, I deleted those first.

Offline parish2

  • Spaceship Navigator
  • *****
  • Posts: 93
How-to: Train SpamAssassin
« Reply #41 on: November 23, 2004, 08:55:03 AM »
I see where they went, though: they get sent to the master account inbox for some reason, and marked as "spam".

Offline simeon

  • Newbie
  • *
  • Posts: 3
How-to: Train SpamAssassin
« Reply #42 on: November 26, 2004, 08:58:50 AM »
I just set up SpamAssassin and its training as per this thread. I see no training happening at all.

I did not have bayes_toks or bayes_seen files. Running sa-learn did not create them, and even after I manually created blank placeholder files, they do not get populated.

Also, sa-learn always says it learned from zero (0) messages, and it does not report the number of messages in the myspam or myham mailbox correctly. It says either zero or one.

Any insight greatly appreciated.

Offline parish2

  • Spaceship Navigator
  • *****
  • Posts: 93
How-to: Train SpamAssassin
« Reply #43 on: December 02, 2004, 02:35:04 AM »
I removed the forwarder, added a "contains" filter, and tested it.  This is what I got:

  Filter Trace
 

Filter Trace Results:
Return-path copied from sender
Sender      = tellin2@quasor.lunarpages.com
Recipient   = tellin2@quasor.lunarpages.com
Testing Exim filter file "/etc/vfilters/tellingpictures.com"

Filtering did not set up a significant delivery.
Normal delivery will occur.

 


Quote from: krick
Quote from: parish2
Hmm.  Still doesn't work.



Do you also have normal email forwarders set up for the addresses in question?   I think you can only have forwarders or filters, not both.

Offline krick

  • Intergalactic Cowboy
  • *****
  • Posts: 50
How-to: Train SpamAssassin
« Reply #44 on: December 02, 2004, 09:57:30 AM »
Quote from: parish2
I removed the forwarder, added a "contains" filter, and tested it.  This is what I got:

  Filter Trace
 
<STUFF REMOVED>


How do you run a filter trace?

 
