Web Hosting Forum | Lunarpages

Learn More About Lunarpages Hosting => Web Hosting Tutorials, FAQs and Resources => Topic started by: w98 on April 08, 2004, 01:15:29 PM



Title: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: w98 on April 08, 2004, 01:15:29 PM
UPDATED: January 2014

v4.02 is released.

New in v4: much nicer HTML layout (bootstrap, nav menu with links to documentation, email and my new Google Helpout live support option)

I've built a "Build Your Own SpamAssassin Trainer" web app that you can use that will ask a few simple questions and generate everything you'll need for the Perl script. I'll be making other changes to the Perl script this spring that should improve performance.

Documentation: please visit http://iandouglas.com/spamassassin-trainer/

Build your trainer script here: http://iandouglas.com/sa-trainer/

Using the /sa-trainer/ link will let you configure your script in a web page using some simple prompts, and build a .zip file for you.


DISCLAIMERS:
Disclaimer #1: I wrote this script back in 2001, have been hacking at it ever since, and first posted it here in April 2004. It works amazingly well for me. Your mileage, of course, may vary.
Disclaimer #2: LunarPages has given me permission to post this information and quick start guide with the following notes:
Quote
please include a warning that it is the user's own responsibility to mess with it :)
and
Quote
(paraphrased) Please announce that all LunarPages users should consider this message thread as the primary source of support for sa-trainer.cgi
I fully intend to hang out here (since I'm also a LunarPages user myself) to support this script till the end of time, so I'm happy to comply.
Disclaimer #3: While I have tried very hard to document this as carefully as I can and use 'best practice' software development efforts, some errors are bound to happen, so there are NO guarantees on these instructions whatsoever. However, numerous LunarPages users use my script on a regular basis and have seen dramatic drops in the amount of spam in their Inbox.
Disclaimer #4: The new script (starting at v3.02) and full supporting documentation are located at iandouglas.com (linked later within the quick-start guide for the download link, and linked again at the bottom of this message). LunarPages support staff have been awesome about letting me move the script out of this message thread (since the script is too big to fit in a single message here now). Just be aware that viewing the full documentation and downloading the script itself will take you away from LunarPages and LunarForums.com. Please do return here for support if you are a LunarPages user -- I promised LP that I'd always be available to this forum for assistance.

THE NEW sa-trainer.cgi QUICK-START GUIDE
Here are some very general instructions for setting up SpamAssassin in CPanel, configuring the final details, downloading and installing the script, and getting it running. These instructions will teach you to do the following:
  • create an Email account called globalham@mydomain.com for your users to forward their non-spam messages to
  • enable the CPanel "spam box" option, and scan each individual user's spam mailbox

Assumptions I Need to Make about You
If you want to take the simplest approach, and use the default behavior of this script:
  • that you know how to log in to CPanel using your LunarPages or other hosting account details
  • that you know how to create a new mailbox for your primary domain in CPanel
  • that you can save a copy of my script on your local computer, change it in a text editor like Notepad or TextEdit (not a word processor like MS Word), and save the file
  • that you know how to use an FTP program to upload a copy of the script to your hosting account
  • if you have multiple users with mailboxes through your account, that you can communicate effectively with your users to clean up their own mailboxes once you've finished running this training script
If you want to use more advanced features of this script:
  • that you can do all of the above, and know how to search through the configuration settings within the script to make changes to suit your needs
  • if you download your Email through a third-party software (Outlook, Outlook Express, Thunderbird, Eudora, etc) that you are familiar enough with that software to add an IMAP account or profile
  • or, if you always use webmail such as Squirrelmail or Horde, that you are familiar enough with using the software to move or copy messages to other folders

Terminology You Need to Learn
SPAM: unsolicited Emails that you've received that want you to buy something or contain adult-themed references that you'd rather not get anymore.
HAM: non-spam, legitimate Emails from friends, family, newsletters, and so on
SA: short for SpamAssassin
False-Positive: this is a non-spam (HAM) message that SA flagged as SPAM and ended up in our spam box.
False-Negative: this is a SPAM message that SA flagged as non-spam (HAM) that ended up in our Inbox
IMAP: this is an Email protocol used to read and manage Email messages stored on your hosting account's mail server (sending mail uses a different protocol, SMTP). Generally, IMAP will leave messages on the server instead of downloading them to your computer and deleting the server's copy.
"spam/ham folder pair": this is a set of mail folders (which may actually be files instead of folders) that we will set up and use to store copies of messages to train SpamAssassin with.
primary domain: the first (or only) domain name configured for your CPanel account, not an add-on or parked domain added later.

NOTE: for all examples in the setup and the script itself, the account name I will use is myaccount. The primary domain for my CPanel account is mydomain.com. I'll do my best to keep these terms bolded throughout this text to highlight where you'll need to insert your own information.

Configure CPanel to turn SpamAssassin on

Log in to your CPanel interface, click on the 'Mail' icon, click on the link for 'SpamAssassin'.
Click on the 'Enable SpamAssassin' button, click on the 'go back' link.
Click on the 'Enable Spam Box', click on the 'go back' link.
Click on the 'go back' link again so you're back at the 'Mail' icon menu list where you clicked on 'SpamAssassin'
Click on 'Add/Remove/Manage Accounts'
Click on 'Add Account' link at the bottom
Set the Email account as 'globalham' at your primary domain name, set a password, and set a reasonable quota based on your usage, such as 100MB or 200MB. Click the 'Create' button
Click the 'Go Back' link

Create/Edit /home/myaccount/.spamassassin/user_prefs

In Cpanel, click on the File Manager icon
Click on the folder next to the ".spamassassin" folder link
If "user_prefs" doesn't already exist, click on the "Create New File" link, call the file "user_prefs" and specify that it is a Text Document, and click the Create button.
Click on the filename link for "user_prefs", and in the top-right corner of the screen, select to edit the file.
Replace the entire contents of the file with this text:
Quote
use_bayes   1
required_hits   3.5
rewrite_subject   1
subject_tag   {SPAM _SCORE(0)_}
bayes_path   /home/myaccount/.spamassassin/bayes
bayes_file_mode   0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
... be sure to replace "myaccount" with your actual CPanel username, and click the 'Save' button
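The only account-specific value in that file is the bayes_path line. As a minimal sketch (this is not part of sa-trainer.cgi, and the helper name build_user_prefs is made up for illustration), the same text could be generated in Python for any CPanel username:

```python
# Hypothetical helper, not part of sa-trainer.cgi: fills a CPanel username
# into the user_prefs text shown in the quick-start guide.
USER_PREFS_TEMPLATE = """\
use_bayes   1
required_hits   3.5
rewrite_subject   1
subject_tag   {SPAM _SCORE(0)_}
bayes_path   /home/myaccount/.spamassassin/bayes
bayes_file_mode   0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
"""


def build_user_prefs(account):
    """Return the user_prefs text with 'myaccount' swapped for the real username."""
    if not account or "/" in account:
        raise ValueError("account must be a plain CPanel username")
    return USER_PREFS_TEMPLATE.replace("myaccount", account)


if __name__ == "__main__":
    # "lopht01" is only an example username.
    print(build_user_prefs("lopht01"), end="")
```

Paste the printed text into user_prefs exactly as described above; nothing else in the file depends on the account name.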

Getting the sa-trainer.cgi Script
Build everything you need at http://iandouglas.com/sa-trainer/
This will take you away from LunarForums.com, but is the preferred method for getting a stable copy of the script. On that page, follow the instructions and download the .zip file it creates for you.

Upload sa-trainer.cgi to your hosting account
Use your favorite FTP program, upload it in ASCII mode into your /www/cgi-bin/ folder, and set the permission bits (chmod) to be 755. The script likely will not run without this.
*** I recommend renaming the script to something other than sa-trainer.cgi (like: my-spam-trainer.cgi or anything with a .cgi file extension) so that people can't easily tell you run this script, in case any exploitable bugs are ever found (though I haven't found any myself, nor have any been reported to me, in the past three years).

If you do not have an FTP program, you can open the script in Notepad or TextEdit again, copy the entire contents to your clipboard, and do the following:
in CPanel, click on the File Manager icon
click on the yellow folder beside "public_html" or "www" (they both go to the same place)
click on the yellow folder beside "cgi-bin"
click on the link to "create new file"
In the top right corner of the screen, specify to create a new text document called "sa-trainer.cgi" (or some other filename to avoid any security issues) and click the Create button
In the new window that pops up, paste the contents of the script into the space provided, and click the 'save' button at the bottom, then close the pop-up window
Back in the File Manager window, click on the filename (which is a link) for sa-trainer.cgi
Click on the link to set the permissions of the script, and select the 'execute' bit for all 3 columns so the permissions number reads '755' and click the 'change' button.
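For reference, '755' means read/write/execute for the owner and read/execute for everyone else. A small Python sketch of what that permission number stands for, assuming you have a copy of the file somewhere you can run Python (the function names are made up for illustration, and CPanel or your FTP program does the same job without any code):

```python
import os
import stat


def describe_mode(permission_bits):
    """Render permission bits such as 0o755 the way `ls -l` shows a regular file."""
    return stat.filemode(stat.S_IFREG | permission_bits)


def make_executable(path):
    """Set permissions to 755 on path and return the resulting mode string."""
    os.chmod(path, 0o755)
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    # 7 = rwx (owner), 5 = r-x (group), 5 = r-x (everyone else)
    print(describe_mode(0o755))
```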

Have some recent spam/ham available to train with
Once you have some spam and ham messages available in the mailboxes you configured, simply call your script in your web browser, like "http://www.mydomain.com/cgi-bin/sa-trainer.cgi" (or whatever you called your copy of the script).

Ongoing maintenance
1. Teach your users to forward non-spam messages to globalham@mydomain.com, with a disclaimer that no human eyes will ever see the mailbox (you could be found liable for reading their private messages, so be sure you're not secretly peeking in there...). Instruct them not to forward messages over 100kb or with file attachments, as these can confuse SpamAssassin and slow down the scanning.

2. Once scanning is complete, empty the Inbox for the globalham@mydomain.com account - the easiest and quickest way to avoid any legal/privacy concerns would be to completely delete the mailbox from CPanel and rebuild it.

3. You will also need to instruct your users to empty their spam boxes once scanning is complete. To do this, they can highlight/select all of their spam messages in the 'spam' folder, and use the delete function of their webmail/Email client software.
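The forwarding rule in item 1 (nothing over 100kb, nothing with attachments) can be expressed as a quick check. This is only an illustrative Python sketch: sa-trainer.cgi does not do this for you, the function name is invented, and "100kb" is read here as 100 x 1024 bytes, which is an assumption.

```python
from email import message_from_bytes, policy

# Assumption: "100kb" in the guide is taken to mean 100 * 1024 bytes.
MAX_FORWARD_BYTES = 100 * 1024


def ok_to_forward(raw_message):
    """Return True if a raw message is small enough and carries no attachments."""
    if len(raw_message) > MAX_FORWARD_BYTES:
        return False
    msg = message_from_bytes(raw_message, policy=policy.default)
    # iter_attachments() yields every part that is not the main body text.
    return not any(True for _ in msg.iter_attachments())
```

A user (or a mail-client rule) could apply the same two tests by eye: check the message size, and check for a paperclip icon.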

Did I forget anything?
Be sure to notify me if I've neglected to describe any step along the way.

Full Documentation
A MUCH larger version of this documentation is available at http://iandouglas.com/spamassassin-trainer/. You will probably need an hour or more to read through it (told you it was huge), but it goes much deeper into the configurable options of the script.

And as stated in a few other places here: if you are a LunarPages customer, this forum message thread that you're reading right now is your primary means of support for this script so please post messages here if you have questions or problems with the script.

Like it? Love it? Need extra help?
Check out my live support option on Google Helpouts now: https://helpouts.google.com/113763167140406107715/ls/bbcf42fd8de12842

Good luck, and happy spam fighting!


Title: How-to: Train SpamAssassin
Post by: Danielle on April 08, 2004, 01:18:04 PM
Hi Ian,

I changed the thread from a normal post to a sticky since I think this is great.  I may also place it in the how-to section, since again this is great.  :thumb:

Have a Blessed Day


Title: How-to: Train SpamAssassin
Post by: w98 on April 08, 2004, 01:20:32 PM
w00t, many thanks :thumb:

-id


Title: How-to: Train SpamAssassin
Post by: Lopht on April 09, 2004, 06:37:48 AM
this rocks, thanks Ian.


Title: How-to: Train SpamAssassin
Post by: Lopht on April 09, 2004, 06:55:21 AM
one question, the paths put into the sa-learn.cgi reflect the account I use to log into cpanel with, correct? Do I have to have entries in this file for each mail account in the domain?

print `$salearn -p /home/lopht01/.spamassassin/user_prefs --mbox --spam --showdots /home/lopht01/mail/domain/user/myspam`;

print `$salearn -p /home/lopht01/.spamassassin/user_prefs --mbox --ham --showdots /home/lopht01/mail/domain/user/myham`;


Title: How-to: Train SpamAssassin
Post by: w98 on April 09, 2004, 09:19:36 AM
Heya Lopht,

Yes, you have that set up right, to train SA for spam/ham for various other mailboxes, that would work just fine.
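With several mailboxes, the same pair of commands simply repeats once per user. Here is a minimal Python sketch (the script in this thread is Perl; the helper name and the user list are made up for illustration) that builds, without running, the sa-learn argument lists using only the flags and path layout from the post above. The /usr/bin/sa-learn location is an assumption about the server.

```python
def salearn_commands(account, domain, users, salearn="/usr/bin/sa-learn"):
    """Build one spam and one ham sa-learn command per mailbox user.

    Nothing is executed here; each command is returned as an argument list.
    """
    prefs = f"/home/{account}/.spamassassin/user_prefs"
    commands = []
    for user in users:
        for kind, folder in (("--spam", "myspam"), ("--ham", "myham")):
            mbox = f"/home/{account}/mail/{domain}/{user}/{folder}"
            commands.append(
                [salearn, "-p", prefs, "--mbox", kind, "--showdots", mbox]
            )
    return commands


if __name__ == "__main__":
    # Example values only; substitute your own account, domain and users.
    for cmd in salearn_commands("lopht01", "domain", ["user"]):
        print(" ".join(cmd))
```

Every command points at the same user_prefs file, so all of the training lands in one shared bayes database.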

As long as the user_prefs file points to your domain's bayes database files, it'll train everything into those databases. The down side with this, of course, is if one user thinks a message is SPAM while another thinks it's HAM, and they both try to train SA - SA will only remember the last way you trained it when it looked at a particular message.

So if user A and user B get a copy of message X from a newsletter, and user A trains it as SPAM and user B trains it as HAM, user A will continue to get the newsletter because user B re-trained SA using your overall domain bayesian database.

I don't know if we could set up SA for each individual Email account, that would take a lot of extra configuration, especially from LP's point of view, as well as using up a lot more of your disk quota for managing it on a per-user basis.

Just an extra $0.02. Thanks for the feedback.

-id


Title: How-to: Train SpamAssassin
Post by: Lopht on April 09, 2004, 09:38:32 AM
Fortunately I am the only 'user' on my domain. ;)

One thing I didn't see in the howto, was to give the cgi script execute permission.

One oddity, when I first ran it with 4 messages in the myspam folder and none in the myham folder, it said it learned from 5 messages in myspam and 1 in myham. I'll assume that's just the "this is administrative data for this folder, do not delete" message that some readers don't display and some do?

Is there a programmatic way to also empty those folders after processing them?


Title: How-to: Train SpamAssassin
Post by: w98 on April 09, 2004, 09:49:36 AM
Yeah, you *could* set the Perl script to remove the messages in there with a library like Mail::Box, that's how I did it on my system after processing the messages. The only 'gotcha' with that is that it's been known to totally erase the mailbox once it's deleted all of the messages, so you'd have to possibly find another Perl library to recreate the IMAP folder if it gets deleted. Although, just doing a 'touch' system call should create it ... hmm (pondering)

And yes, the extra message it counted is the message about "administrative data, do not delete"
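To see where the off-by-one comes from, a mailbox file can be counted with and without that administrative message. This is a rough Python sketch using the standard mailbox module, assuming the folder is a plain mbox file and that the internal message carries the usual "DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA" subject; the function name is made up.

```python
import mailbox

# Subject line the mail system uses for its internal bookkeeping message.
INTERNAL_SUBJECT = "DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA"


def count_messages(mbox_path):
    """Return (total, real): every message, and those that are not internal data."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        total = real = 0
        for msg in box:
            total += 1
            subject = str(msg["Subject"] or "").strip()
            if subject != INTERNAL_SUBJECT:
                real += 1
        return total, real
    finally:
        box.close()
```

For a folder holding 4 spam messages plus the internal one, this would report a total of 5 and a real count of 4, which matches the numbers seen above.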

I had edited the main message yesterday to include the 755 permission notes, but it's not there this morning, so I've added it again. :oops:

-id


Title: How-to: Train SpamAssassin
Post by: Lopht on April 09, 2004, 09:56:24 AM
heh, one last thing, the example URL you give to run it says sa-train.cgi, but everywhere else it says sa-learn.cgi.

A quick and dirty way to "empty" the mailbox would be

system ("echo \"\" > /home/lpaccount/mail/myham");
system ("echo \"\" > /home/lpaccount/mail/myspam");

Or if the admin data is really needed, just copy/paste it into the perl script as a multi-line string and echo that. No real need to bring in a whole module. ;)


Title: How-to: Train SpamAssassin
Post by: w98 on April 09, 2004, 09:58:28 AM
Oops, my bad. I started out calling it "train.cgi" then changed to "sa-learn.cgi", I'll update that now.

I tried the system call to echo a blank line into the mailboxes, but then it adds an empty message into each mailbox... SpamAssassin ignores it though, so I went ahead and added the two lines at the end of the script above... and gave credit where it's due :thumb:

-id


Title: How-to: Train SpamAssassin
Post by: w98 on April 09, 2004, 10:14:17 AM
Hmm... having the system call redirect a blank line to the mail folders seems to work despite a 600 permission on the mailbox file itself.

I got Perl to open the file and write nothing into it and then closing it ... by not using system("echo"), it doesn't write a blank message into the mailbox, effectively clearing it right out.
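The difference between the two approaches comes down to one byte: echo "" writes a newline, which the mail software then reads as an empty message, while opening the file for writing and closing it again leaves nothing behind. A minimal Python sketch of both (the script itself is Perl; the function names are made up for illustration):

```python
import os


def empty_with_echo(path):
    """Mimic `echo "" > path`: the file ends up holding a single newline."""
    with open(path, "w", newline="") as fh:
        fh.write("\n")


def empty_by_truncating(path):
    """Open for writing and close straight away: the file ends up 0 bytes long."""
    with open(path, "w"):
        pass


def size_of(path):
    """Return the size of path in bytes."""
    return os.path.getsize(path)
```

Running empty_with_echo on a mailbox leaves a 1-byte file; empty_by_truncating leaves a 0-byte file, which is the "effectively clearing it right out" behaviour.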

-id


Title: How-to: Train SpamAssassin
Post by: w98 on April 09, 2004, 10:24:00 AM
Okay, made a few edits to the script as a whole to use a $basepath variable and a $configfile variable to make the script quicker to edit for users.

-id


Title: How-to: Train SpamAssassin
Post by: w98 on April 10, 2004, 09:10:04 AM
One more thing to note if you feel that SpamAssassin is no better after following this how-to:

Bayesian filtering won't "kick in" until SpamAssassin sees that you've trained at least 200 spam and 200 ham messages.
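If you have shell access, `sa-learn --dump magic` prints the database counters, including how many spam (nspam) and ham (nham) messages have been learned. The Python sketch below checks those counters against the 200/200 figure. Treat it as an assumption-laden illustration: the function names are invented, and the column layout it parses (count in the third column, label at the end of the line) is what the output typically looks like, so verify it against your own server's output.

```python
import re

MIN_TRAINED = 200  # the figure quoted above for both spam and ham


def trained_counts(dump_magic_text):
    """Pull (nspam, nham) out of text shaped like `sa-learn --dump magic` output.

    Assumed line shape: '0.000  0  215  0  non-token data: nspam'
    """
    counts = {}
    for line in dump_magic_text.splitlines():
        match = re.search(r"non-token data: (nspam|nham)\s*$", line)
        if match:
            counts[match.group(1)] = int(line.split()[2])
    return counts.get("nspam", 0), counts.get("nham", 0)


def bayes_ready(nspam, nham, minimum=MIN_TRAINED):
    """True once both spam and ham counts have reached the minimum."""
    return nspam >= minimum and nham >= minimum
```

With 215 spam but only 48 ham learned, bayes_ready returns False, which is the usual reason training appears to have no effect at first.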

I'm still trying to determine whether LunarPages even uses our individual bayesian databases when forwarding Emails for our accounts. They DO use our personal user_prefs file (although custom rules seem to be ignored), but there are still a few unknowns as yet.

-id


Title: How-to: Train SpamAssassin
Post by: Lopht on April 11, 2004, 08:19:43 PM
I've noticed that since adding this, SA hasn't flagged a single message as spam, is that what you meant with the last note?  My bayes_toks file is almost 5 MB, so it is updating based on the spam/ham I'm giving it.


Title: How-to: Train SpamAssassin
Post by: w98 on April 11, 2004, 08:31:29 PM
Well, I'm waiting to hear back from Max now whether their server configurations tell SA to look at our personal bayes databases when delivering mail. But yes, that's what I meant. You have to train at least 200 ham and 200 spam for SA to kick in. Kind of a lot, I know, but WELL worth the effort once trained.

-id


Title: How-to: Train SpamAssassin
Post by: w98 on April 16, 2004, 10:29:34 AM
edited the how-to today to reflect the fact that everyone's user_prefs file is just full of comments and blank lines, and so set my instructions to replace the entire contents of their file with mine if their file is like that.

any line in user_prefs that starts with a '#' is just a comment marker - so the software will ignore everything after the '#' character if that's the first character it sees on a line.
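To check whether a user_prefs file really contains nothing but comments and blank lines, the same rule can be applied mechanically. A small Python sketch (the function name is made up; it also tolerates spaces before the '#', which is an assumption beyond what is stated above):

```python
def effective_settings(user_prefs_text):
    """Return (name, value) pairs for lines that are neither blank nor comments."""
    settings = []
    for line in user_prefs_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment: the software ignores it
        parts = stripped.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        settings.append((name, value))
    return settings
```

An empty list back means the file is safe to replace wholesale; anything else is a setting worth keeping a copy of first.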


Title: How-to: Train SpamAssassin
Post by: pheared on September 11, 2004, 09:29:30 AM
Here is my modified version.  It processes the "miss" and "ham" mailboxes of all of my users.  (I remember why I hate perl now. :P )
Code:

#!/usr/bin/perl

my $user = "username";
my $domain = "domainname";
my $salearn = "/usr/bin/sa-learn";
my $basepath = "/home/$user";
my $configfile = "$basepath/.spamassassin/user_prefs";
my $files = `find $basepath/mail/$domain/ -type f -name miss`;
my $hamfiles = `find $basepath/mail/$domain/ -type f -name ham`;
$| = 1;  # turn on autoflush for STDOUT (a bare "$|;" does nothing)

print "Content-type: text/plain\n\n";

print "Learning SPAM:\n";
print $files;
$files =~ s/\n/ /g;
print `$salearn -p $configfile --mbox --spam $files`;
print "\n\n";

print "Learning HAM:\n";
print $hamfiles;
$hamfiles =~ s/\n/ /g;
print `$salearn -p $configfile --mbox --ham $hamfiles`;
print "\n\n";

foreach $file (split(/ /, $files . $hamfiles)) {
        print "Cleaning out: $file\n";
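        # Skip this mailbox if it cannot be opened for writing
        open (SPAM, "> $file") or do { print "Could not open $file: $!\n"; next; };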
        print SPAM 'From MAILER-DAEMON Sat Sep 11 09:53:05 2004
Date: 11 Sep 2004 09:53:05 -0700
From: Mail System Internal Data <MAILER-DAEMON@taurus.lunarpages.com>
Subject: DON\'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA
Message-ID: <1094921585@taurus.lunarpages.com>
X-IMAP: 1094921532 0000000001
Status: RO

This text is part of the internal format of your mail folder, and is not
a real message.  It is created automatically by the mail system software.
If deleted, important folder data will be lost, and it will be re-created
with the data reset to initial values.

';
        close SPAM;
}
exit;



Title: How-to: Train SpamAssassin
Post by: parish2 on September 14, 2004, 10:39:32 AM
I'm at the very beginning of the instructions.  I've enabled SPAM Assassin and Spam Box in my control panel.  When I go to Squirrel Mail, I see lots of mailboxes, but only one of them has a /spam extension.   It looks like Spam Box has only been enabled for one of the mailboxes (which happens to be mine... hmm.) I clicked to subscribe to this mailbox, and it moved into my folder list.  But I'm wondering about how to filter the other mailboxes on my domain for spam.


Title: How-to: Train SpamAssassin
Post by: pheared on September 14, 2004, 10:47:24 AM
The spam boxes are created as needed.  The accounts haven't received spam yet.


Title: How-to: Train SpamAssassin
Post by: parish2 on September 14, 2004, 10:24:28 PM
So this new IMAP folder I've got in my Outlook - I just want to be sure I understand what it is.  It seems to have all the incoming mail for all the mailboxes at my domain.  Do I assume that as long as they show up in this inbox folder, the various intended recipients will still be able to get them using their Outlook?


Title: How-to: Train SpamAssassin
Post by: TranzNDance on September 14, 2004, 10:33:07 PM
Oh, wow. I didn't know that setting the default account with IMAP would give it access to all the other accounts. :shock:

Anyway, to answer your question, parish2, as long as you do not delete the messages and purge, the recipients will have access to their mail.


Title: How-to: Train SpamAssassin
Post by: parish2 on September 14, 2004, 11:02:50 PM
OK, everything seems to be working.  I moved a bunch of spam into myspam.  They still show up in the IMAP inbox with a strikeout through them.  I ran the cgi, and that emptied the myspam folder, but what about the spam (with the strikeout) that is in the inbox?  Do I have to manually delete these?


Title: How-to: Train SpamAssassin
Post by: TranzNDance on September 15, 2004, 04:23:48 AM
You need to purge, which is like emptying the trash. It's under the Edit menu.


Title: How-to: Train SpamAssassin
Post by: parish2 on September 15, 2004, 04:30:28 AM
Excellent! Thank you.    :thumb:


Title: How-to: Train SpamAssassin
Post by: parish2 on September 17, 2004, 06:26:42 AM
When subscribing to spam mailboxes, per these instructions:

<<
- on the right side, you should see mailboxes for each Email account on your domain, like this: mydomain.com /joesmith/spam (assuming you have a valid mailbox for joesmith@mydomain.com)
- click on a spam box to subscribe to, and click the 'subscribe' button at the bottom of the list
>>

What about the mailbox simply called "spam" with no user mailbox prefix?  Do I subscribe to this as well?


Title: How-to: Train SpamAssassin
Post by: parish2 on September 18, 2004, 10:59:17 AM
Does the spam that is correctly identified and which Spam Assassin puts in the spam folder get purged automatically eventually, or should I be doing this?


Title: How-to: Train SpamAssassin
Post by: TranzNDance on September 18, 2004, 11:17:27 AM
No, it does not automatically get purged.


Title: How-to: Train SpamAssassin
Post by: parish2 on September 22, 2004, 08:41:11 AM
So are there any scripts that empty spam folders, or does the spam really just build up until someone empties it or it bursts?  :-?


Title: How-to: Train SpamAssassin
Post by: pheared on September 22, 2004, 11:03:37 AM
I've written one that goes through and finds all of my users' spam folders, marking messages older than X days for deletion and expunging them, all through IMAP.


Title: How-to: Train SpamAssassin
Post by: parish2 on September 22, 2004, 04:49:42 PM
Sounds very useful.  Is it something you can share?  (And instruct me how to run?)


Title: How-to: Train SpamAssassin
Post by: pheared on September 22, 2004, 05:05:13 PM
I suppose I can share.  You stick the following text in a file and run it in the python interpreter.  Note that python cares about indentation, so if copy and paste from here doesn't work, I can be persuaded to upload the file somewhere.  With all of that set up, on Linux I just type ./spamdrain.py but YMMV.

Fill in your username, your password, and your domain as needed in the first couple of lines.

The code will skip mailboxes that are small (containing fewer than 30 messages) because they aren't a big deal and people who only get a couple of spams are more likely to have false positives in my experience.  It is set to purge mail that is more than 7 days old.  Both of these parameters are alterable.

Code:

#!/usr/bin/python                                                              
#                                                                              
# spamdrain -- a script that purges old e-mail in spam boxes                    
#              given a maximum age and a threshold                              
#                                                                              
# purgeimap was hacked to bits by Kevin Dwyer <kevin@pheared.net>
#   to create spamdrain.
# purgeimap was written By Justin R. Miller <justin@solidlinux.com>            
#                                                                              

import os, string, time, imaplib, sys

if __name__ == "__main__":
    server = "mail.yourdomain.com"
    port = 993
    username = "youruserid"
    password = "yourpassword"
    directory = "yourdomain.com/"
    folderMask = "*/spam"
    age = 7  # days                                                            
    purgeAmount = 30  # Only expunge a box if > purgeAmount                    

    timestamp = time.localtime(time.time() - (age * 86400))
    purgedate = time.strftime('%d-%b-%Y', timestamp)
    print "Destroying spam older than %s at threshold %i" % (purgedate,
                                                             purgeAmount)

    m = imaplib.IMAP4_SSL(server, port)
    m.login(username, password)

    spamboxesResp = m.list(directory, folderMask)
    spamboxes = map(lambda x:x.split()[3], spamboxesResp[1])
    #print spamboxes                                                            

    total = 0
    totalOld = 0
    totalExp = 0

    for box in spamboxes:
        print "Selecting %s..." % box,
        numMsgs = m.select(box)
        numMsgs = int(numMsgs[1][0])
        print "%i messages." % numMsgs
        typ, msgs = m.search(None, '(BEFORE ' + purgedate + ')')

        if numMsgs < purgeAmount:
            print "Skipping."
            continue

        for num in string.split(msgs[0]):
            m.store(num, '+FLAGS', '(\Deleted)')

        #print typ, msgs                                                        
        totalOld += len(msgs[0].split())
        total += numMsgs
        expunged = m.expunge()
        if expunged[1] != [None]:
            print "Expunged %s messages." % len(expunged[1])
            totalExp += len(expunged[1])

    m.logout()

    if total > 0:
        print "%i/%i (%.2f%%) spams over %i days old." \
              % (totalOld, total, (totalOld*100.0)/total, age)
    if totalOld > 0:
        print "%i/%i (%.2f%%) spams expunged." % (totalExp, totalOld,
                                                  (totalExp*100.0)/totalOld)
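For anyone on a newer Python, the two tunable pieces of the script above (the cutoff date and the skip threshold) can be sketched on their own. This is an illustrative Python 3 fragment, not a port of spamdrain: the function names are invented, and the month names are spelled out so the date string does not depend on the machine's locale.

```python
from datetime import datetime, timedelta

# IMAP search dates use English month abbreviations regardless of locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_before_date(age_days, now=None):
    """Return the dd-Mon-yyyy date to use in an IMAP 'BEFORE' search."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=age_days)
    return f"{cutoff.day:02d}-{MONTHS[cutoff.month - 1]}-{cutoff.year}"


def should_purge(num_msgs, purge_amount=30):
    """Mirror the script's rule: boxes with fewer than purge_amount messages are skipped."""
    return num_msgs >= purge_amount


if __name__ == "__main__":
    print("(BEFORE " + imap_before_date(7) + ")")
```

Changing age or purgeAmount in the original script corresponds to changing the age_days and purge_amount arguments here.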



Title: train script/function not seeing messages
Post by: Lopht on September 23, 2004, 03:47:07 PM
Over the past couple of days I've noticed that regardless of how many messages are in the myham/myspam folders, when I run the script it says SA learned from zero out of one messages in each folder. I haven't modified the script since I first installed it when this thread began, and I haven't changed any permissions on any files. Anyone have any idea what might be going on?


Title: How-to: Train SpamAssassin
Post by: parish2 on September 25, 2004, 03:09:26 PM
Forgive my ignorance, but how do you run a script in a Python interpreter?  Can I run it through Windows, or IE, or do I need some Python program to run it?


Title: How-to: Train SpamAssassin
Post by: kwdavids on October 11, 2004, 06:02:56 AM
I find that over 90% of the spam we get scores 99% or higher on the Spam Assassin Bayesian filter.  This is very effective and well worth the effort.


Title: How-to: Train SpamAssassin
Post by: parish2 on October 13, 2004, 08:35:03 AM
I haven't been able to get this Python script to run, and I'm wondering if the cgi learning script that empties the myspam and myham folders can't be adapted to also empty all the other spam folders.


Title: How-to: Train SpamAssassin
Post by: krick on November 18, 2004, 02:17:03 PM
Quote from: w98
I'm still trying to determine whether LunarPages even uses our individual bayesian databases when forwarding Emails for our accounts. They DO use our personal user_prefs file (although custom rules seem to be ignored), but there are still a few unknowns as yet.


I'm not sure if we're talking about the same thing or not but I've found that if you use cPanel->Mail->Forwarders to forward from one account on your domain to another account on your domain, the email slips through SpamAssassin entirely for some reason.  I suspect that it has something to do with headers being changed by the forwarding process.  Or possibly the mail is passing because it seems to come from the same domain.

If you instead use cPanel->Mail->E-mail Filtering to forward your mail, the mail WILL get checked by SpamAssassin.  However, if you have also set up a filter to automatically delete spam, forwarded email that is spam will not be deleted.  This seems to be because a given email can only match ONE rule in the E-mail Filter list.  Once it matches something, it's not checked again.
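The "only ONE rule" behaviour described above can be pictured as first-match-wins. The Python sketch below is a toy model of that observation only, with invented names and made-up rules; it is not how cPanel or the mail server actually implements filtering.

```python
def first_matching_action(message, rules):
    """Return the action of the first rule that matches; later rules are never tried."""
    for matches, action in rules:
        if matches(message):
            return action
    return "deliver normally"


# Two made-up rules: forward mail for me1, and delete anything tagged as spam.
def forward_rule(msg):
    return "me1@domain.com" in msg["To"]


def delete_spam_rule(msg):
    return msg["Subject"].startswith("{SPAM")


if __name__ == "__main__":
    spam_for_me1 = {"To": "me1@domain.com", "Subject": "{SPAM 9.1} cheap pills"}
    rules = [(forward_rule, "forward to me2@domain.com"),
             (delete_spam_rule, "delete")]
    print(first_matching_action(spam_for_me1, rules))
```

Under this model the order of the rules decides the outcome, so a spam-deleting rule listed after the forwarding rule never sees the forwarded message.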


Title: How-to: Train SpamAssassin
Post by: parish2 on November 19, 2004, 12:11:16 AM
I can't get the filter to forward my emails.  Is there some trick to it?  I entered To = (address to be forwarded), the forwarding address down below... but it doesn't work.


Title: How-to: Train SpamAssassin
Post by: krick on November 19, 2004, 03:45:36 PM
Quote from: parish2
I can't get the filter to forward my emails.  Is there some trick to it?  I entered To = (address to be forwarded), the forwarding address down below... but it doesn't work.



The filter screen should look something like this...

Filter - [TO] - that - [CONTAINS] - me1@domain.com
Destination - me2@domain.com


I'm pretty sure you have to use "contains" rather than "equals" because with equals, the whole header entry has to match exactly.
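A short Python illustration of why that matters, assuming the To header arrives with a display name wrapped around the address (the header value and function names here are made up for the example):

```python
def header_equals(header_value, address):
    """'equals': the whole header entry must match the address exactly."""
    return header_value == address


def header_contains(header_value, address):
    """'contains': the address only has to appear somewhere in the header."""
    return address in header_value


if __name__ == "__main__":
    to_header = '"Me One" <me1@domain.com>'  # made-up header value
    print(header_equals(to_header, "me1@domain.com"))
    print(header_contains(to_header, "me1@domain.com"))
```

With a header like that, the "equals" test fails while "contains" succeeds, even though both are aimed at the same address.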


Title: How-to: Train SpamAssassin
Post by: parish2 on November 20, 2004, 02:42:46 AM
Hmm.  Still doesn't work.


Title: How-to: Train SpamAssassin
Post by: krick on November 22, 2004, 07:50:06 PM
Quote from: parish2
Hmm.  Still doesn't work.



Do you also have normal email forwarders set up for the addresses in question?   I think you can only have forwarders or filters, not both.


Title: How-to: Train SpamAssassin
Post by: parish2 on November 22, 2004, 10:47:24 PM
No, I deleted those first.


Title: How-to: Train SpamAssassin
Post by: parish2 on November 23, 2004, 08:55:03 AM
I see where they went, though: they get sent to the master account inbox for some reason, and marked as "spam".


Title: How-to: Train SpamAssassin
Post by: simeon on November 26, 2004, 08:58:50 AM
I just set up use and training of spam assassin as per this thread. I see no training happening at all.

I did not have bayes_toks or bayes_seen files. running sa-learn did not create them, and even after manually creating blank place holder files, they do not get populated.

also sa-learn always says it learned from zero (0) messages and furthermore does not report the number of messages from the myspam or myham mailbox correctly. it either says zero or one.

any insight greatly appreciated..


Title: How-to: Train SpamAssassin
Post by: parish2 on December 02, 2004, 02:35:04 AM
I removed the forwarder, added a "contains" filter, and tested it.  This is what I got:

  Filter Trace
 

Filter Trace Results:
Return-path copied from sender
Sender      = tellin2@quasor.lunarpages.com
Recipient   = tellin2@quasor.lunarpages.com
Testing Exim filter file "/etc/vfilters/tellingpictures.com"

Filtering did not set up a significant delivery.
Normal delivery will occur.

 


Quote from: krick
Quote from: parish2
Hmm.  Still doesn't work.



Do you also have normal email forwarders set up for the addresses in question?   I think you can only have forwarders or filters, not both.


Title: How-to: Train SpamAssassin
Post by: krick on December 02, 2004, 09:57:30 AM
Quote from: parish2
I removed the forwarder, added a "contains" filter, and tested it.  This is what I got:

  Filter Trace
 
<STUFF REMOVED>


How do you run a filter trace?


Title: How-to: Train SpamAssassin
Post by: parish2 on December 02, 2004, 01:25:05 PM
There's a "test filter" window on the filter screen with instructions.


Title: How-to: Train SpamAssassin
Post by: krick on December 03, 2004, 01:59:46 PM
Quote from: parish2
There's a "test filter" window on the filter screen with instructions.


You know, I never even paid attention to that.  DUH!

Anyway, I'm not sure what the error message means.

What does your filter look like?


Title: How-to: Train SpamAssassin
Post by: parish2 on December 04, 2004, 10:52:38 AM
header "contains" "jf@tellingpix.com" gets delivered to "jf@tellingpictures.com"


Title: Re: How-to: Train SpamAssassin
Post by: krick on December 05, 2004, 12:26:20 PM
Quote from: w98


1. Set up IMAP folders to hold spam and ham messages

...

- click on the 'webmail' icon
- click on the 'squirrelmail' link
- click on the 'folders' link



I've gotten up to this point.  However, when I click on the "squirrelmail" link, I get an error page "ERROR  Unknown user or password incorrect"

I'm stuck at this page and cannot get any further.


Title: How-to: Train SpamAssassin
Post by: krick on December 05, 2004, 12:33:03 PM
Quote from: parish2
header "contains" "jf@tellingpix.com" gets delivered to "jf@tellingpictures.com"


Seems like it should work to me.  I don't understand why it doesn't.

However, you should be aware that I just found a problem with using "contains" instead of "equals".

It seems that when someone sends an email that includes multiple people on the TO line, and all the people are on the same domain, all copies will be caught by the filter.

for example...

given this filter:

TO contains "user1@domain.com" destination "user8@domain.com"

and an email sent with user1@domain.com, user2@domain.com, and user3@domain.com all on the TO line.

Three copies are sent to the domain.com mail server, one for each person on the domain.  But all three copies match the filter and end up going to user8@domain.com
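
The effect described above can be reproduced with a toy model. This is only a sketch: the `matches` function is a hypothetical stand-in for the cPanel filter, not Exim's actual matching code, and it assumes every delivered copy carries the same full To line.

```python
# Toy model of a "contains" vs. "equals" filter on the To header.
# Not Exim's matching code; it only mimics the behaviour described in the post.

def matches(to_header: str, needle: str, mode: str) -> bool:
    """Return True if the filter would catch a message with this To header."""
    if mode == "contains":
        return needle in to_header
    if mode == "equals":
        return to_header == needle
    raise ValueError(f"unknown mode: {mode}")

to_header = "user1@domain.com, user2@domain.com, user3@domain.com"

# One copy is delivered per local recipient, but each copy has the same To line.
copies = ["user1@domain.com", "user2@domain.com", "user3@domain.com"]
caught = [rcpt for rcpt in copies
          if matches(to_header, "user1@domain.com", "contains")]

print(caught)  # every recipient's copy is caught, not just user1's
```

With "equals" the same message would match nothing at all, because the whole To line is compared against the single address.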


Title: generic inbox
Post by: parish2 on December 06, 2004, 03:52:30 AM
I've subscribed to the IMAP accounts for each of the mailboxes I've set up, as well as the spam mailboxes.  But my account has a generic "inbox" which seems to be full of spam.  Is this mail that was sent to my domain without a specific address?  If so, how do I avoid receiving these?


Title: How-to: Train SpamAssassin
Post by: TranzNDance on December 06, 2004, 12:34:40 PM
In your default address settings in cpanel under Mail, you can change it to go to :blackhole:


Title: How-to: Train SpamAssassin
Post by: parish2 on December 06, 2004, 12:56:11 PM
That's great, thank you.  :yey:


Title: How-to: Train SpamAssassin
Post by: TranzNDance on December 06, 2004, 12:57:54 PM
You're welcome. :)


Title: Re: How-to: Train SpamAssassin
Post by: krick on July 07, 2005, 10:47:03 AM
Can someone help me modify the sa-learn.cgi so that it takes the account name as a parameter?

I installed Ilohamail in my account and I'm adding a link in the Ilohamail toolbar that will call the sa-learn.cgi with the user's account name.  That way, any user can move something into their spam folder and then make SpamAssassin learn from it.

I'm not that familiar with Perl and I'm not sure how to go about modifying the cgi.

Any help will be appreciated.  Thanks


Title: Re: How-to: Train SpamAssassin
Post by: w98 on July 07, 2005, 11:09:20 AM
Quote from: krick
Can someone help me modify the sa-learn.cgi so that it takes the account name as a parameter?

Code:
#!/usr/bin/perl
# modified code based on ian's sa-learn.cgi
# freely distributed to anyone, by ian douglas, id@w98.us
# no warranty whatsoever on this, don't blame me for using this

use strict ;
use warnings ;
use CGI::Carp qw(fatalsToBrowser);
use CGI ;
my $q = CGI->new ;

my $LPaccount = "blah3" ; #replace this with your actual account name

my $basepath = "/home/".$LPaccount ;
my $domain = "mydomain.com" ; # replace this with your actual domain name
my $emaillogin = $q->param('loginname') ;

my $salearn = `which sa-learn` ;
chomp($salearn) ; # strip the trailing newline from the `which` output
my $configfile = "$basepath/.spamassassin/user_prefs" ;
$| = 1 ; # unbuffered output, so each step prints as it happens

print "Content-type: text/plain\n\n" ;

if (!$emaillogin) {
  print "You must supply a login id in the 'loginname' parameter!\n" ;
  exit ;
}

# the login id is passed to a shell command below, so only accept
# plain mailbox names (letters, digits, '_', '-', and single dots inside)
if ($emaillogin !~ /\A[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*\z/) {
  print "The 'loginname' parameter contains invalid characters!\n" ;
  exit ;
}

print "Learning SPAM:\n" ;
print `$salearn -p $configfile --mbox --spam $basepath/mail/$domain/$emaillogin/myspam` ;
print "\n\n" ;

print "Learning HAM:\n" ;
print `$salearn -p $configfile --mbox --ham $basepath/mail/$domain/$emaillogin/myham` ;
print "\n\n" ;

  # opening a file for writing and closing it again truncates it to zero bytes
  if ( -e "$basepath/mail/$domain/$emaillogin/myspam")
  {
    print "Emptying $basepath/mail/$domain/$emaillogin/myspam\n" ;
    open (my $spam, ">", "$basepath/mail/$domain/$emaillogin/myspam")
      or die "Cannot empty myspam: $!" ;
    close $spam ;
  }
  if ( -e "$basepath/mail/$domain/$emaillogin/myham")
  {
    print "Emptying $basepath/mail/$domain/$emaillogin/myham\n" ;
    open (my $ham, ">", "$basepath/mail/$domain/$emaillogin/myham")
      or die "Cannot empty myham: $!" ;
    close $ham ;
  }

I haven't tested this, but it *should* work okay. My perl installation doesn't report any syntax problems.

Use:
http://www.yourdomain.com/cgi-bin/sa-learn.cgi?loginname=joesmith

Note that this will require you to have a 'myspam' and 'myham' folder for the 'joesmith' user. Remember that SpamAssassin needs a balance of non-spam (aka 'ham') messages to learn from as well.

Contact me if you need any more help with this.
--
Ian Douglas
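
Because the 'loginname' value ends up inside a shell command, it is worth checking it before use. The following is a minimal Python sketch of that idea, not part of the script above: `build_salearn_command` and `SAFE_LOGIN` are hypothetical names, and the only sa-learn options used are the ones that already appear in the Perl code (`-p`, `--mbox`, `--spam`, `--ham`).

```python
import re

# Hypothetical rule: letters, digits, '_' and '-', with single dots inside.
SAFE_LOGIN = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)*")

def build_salearn_command(basepath, domain, login, kind):
    """Build the sa-learn call as an argument list (no shell involved)."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"kind must be 'spam' or 'ham', got {kind!r}")
    if not SAFE_LOGIN.fullmatch(login):
        raise ValueError(f"unsafe login name: {login!r}")
    mailbox = f"{basepath}/mail/{domain}/{login}/my{kind}"
    return [
        "sa-learn",
        "-p", f"{basepath}/.spamassassin/user_prefs",
        "--mbox",
        f"--{kind}",
        mailbox,
    ]

print(build_salearn_command("/home/blah3", "mydomain.com", "joesmith", "spam"))
```

Passing the list to something like `subprocess.run` avoids shell interpretation of the login name entirely; the pattern check additionally keeps values such as `..` out of the mailbox path.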


Title: Re: How-to: Train SpamAssassin
Post by: krick on July 07, 2005, 11:15:20 AM
Quote from: w98

Note that this will require you to have a 'myspam' and 'myham' folder for the 'joesmith' user.


It would be convenient if it could create those folders if they didn't exist.



Quote from: w98
Remember that SpamAssassin needs a balance of non-spam (aka 'ham') messages to learn from as well.


Really?  It can't learn from just the spam messages?  I've been doing this for a while with just spam (no ham) and it seemed like it was working.  What happens if you don't feed it any ham?



Title: Re: How-to: Train SpamAssassin
Post by: w98 on July 07, 2005, 11:33:17 AM
Quote from: krick
It would be convenient if it could create those folders if they didn't exist.

Yes it would ... although that's already covered in the instructions on page 1 and somewhat outside the scope of this script -- this script is to learn from spam/ham folders that *already* exist.

Plus I've never looked to see if LP has the necessary libraries installed to create the folders via IMAP instructions for a given user account.

Quote from: krick
It can't learn from just the spam messages?  What happens if you don't feed it any ham?

You'll end up with fewer "false positives" if you teach SpamAssassin what you consider HAM to be. If you go to work at a bank, they'll train you on what a *real* dollar bill looks like so you can better detect a counterfeit... SpamAssassin needs a balance so it learns more intelligently.

Also, try not to stockpile spam/ham as the learning algorithm in SpamAssassin can miscount the number of messages in the mailbox it's learning from. It's not uncommon for me to load up a mailbox with 200 messages and have my script report that it scanned 30 messages. I've found that keeping the myham/myspam mailboxes under 60 messages works best (and faster obviously) which means you'll also have more accurate scanning.
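
One way to picture the "keep it under 60 messages" habit is to split a backlog into small batches and train one batch at a time. A minimal sketch follows; the `batches` helper and the batch size of 50 are illustrative assumptions, not something the script does for you.

```python
def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

backlog = [f"message-{n}" for n in range(200)]  # stand-in for 200 stockpiled messages

for number, batch in enumerate(batches(backlog), start=1):
    # In practice: copy this batch into myspam or myham, run the script, repeat.
    print(f"batch {number}: {len(batch)} messages")
```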

The only hitch with training ham/spam from individual Email mailboxes though, is that if one user trains a message as spam, and the next user who also got the same message trains it as ham, it will confuse SpamAssassin since each LP account only gets a single SpamAssassin database. If I recall from old documentation, if SA learns a message as spam, and then relearns the message as ham, it will then classify subsequent messages like it as whatever it learned *last* ... so if all 50 of your users get a message and the first 49 train it as spam and the last user trains it as ham, everyone else's "spam" designation was overridden.
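
The "last training wins" behaviour recalled above can be illustrated with a toy model. This is a sketch of the behaviour as described in the post, not of SpamAssassin's real Bayes storage; `train`, `shared_db`, and the Message-ID are all hypothetical.

```python
def train(db, message_id, label):
    """Record the latest label for a message, replacing any earlier one."""
    if label not in ("spam", "ham"):
        raise ValueError(f"unknown label: {label}")
    db[message_id] = label

shared_db = {}  # one database per LP account, shared by every mailbox
message_id = "<promo-123@example.com>"  # hypothetical Message-ID

# 49 users train the message as spam...
for _ in range(49):
    train(shared_db, message_id, "spam")

# ...and the 50th trains the same message as ham.
train(shared_db, message_id, "ham")

print(shared_db[message_id])  # the last label is the one that sticks
```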

ian


Title: Re: How-to: Train SpamAssassin
Post by: krick on July 07, 2005, 11:43:50 AM
Quote from: w98

It can't learn from just the spam messages?  What happens if you don't feed it any ham?

You'll end up with fewer "false positives" if you teach SpamAssassin what you consider HAM to be. If you go to work at a bank, they'll train you on what a *real* dollar bill looks like so you can better detect a counterfeit... SpamAssassin needs a balance so it learns more intelligently.

Also, try not to stockpile spam/ham as the learning algorithm in SpamAssassin can miscount the number of messages in the mailbox it's learning from. It's not uncommon for me to load up a mailbox with 200 messages and have my script report that it scanned 30 messages. I've found that keeping the myham/myspam mailboxes under 60 messages works best (and faster obviously) which means you'll also have more accurate scanning.


Hmm...  Would it hurt to have the script scan the user's inbox as "ham"?  Obviously, I'd have to modify the script so it doesn't empty their inbox, of course.

The way I use it now, I only have one special folder, "spam".  When I get spam, I move it into that folder, then make SpamAssassin learn from it.  As you say, I need to balance that out with "ham" but the only "ham" I have is the contents of my inbox.  I'd rather not have to move messages into another folder to have it learn, then move them back.

Note that I'm NOT using IMAP.  I use webmail during the day at work and download my email at night using my POP3 client.  I usually have fewer than 100 messages in my inbox at any given time.  I only train SpamAssassin during the day at work from my webmail client.

Also note that when I say "user" I'm only talking about 5 people, my family members who are using webmail accounts on my domain.  I doubt my user base will ever grow past 10 people.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on July 07, 2005, 11:48:34 AM
Quote from: krick
Hmm...  Would it hurt to have the script scan the user's inbox as "ham"?

... unless they have spam in their inbox, no, it won't hurt anything at all.

In the past, I've made a spam/ham folder setup for each client Email address I hosted, and told them to *copy* any messages that they considered 'ham' into the 'ham' folder and assured them that only my script would ever see what was in there.

Quote
I'd rather not have to move messages into another folder to have it learn, then move them back.

... which is why you just *copy* the messages there :)

ian


Title: Re: How-to: Train SpamAssassin
Post by: krick on July 07, 2005, 12:19:27 PM
Quote from: w98
It would be convenient if it could create those folders if they didn't exist.

Yes it would ... although that's already covered in the instructions on page 1 and somewhat outside the scope of this script -- this script is to learn from spam/ham folders that *already* exist.

Plus I've never looked to see if LP has the necessary libraries installed to create the folders via IMAP instructions for a given user account.


I just removed the folder check...

if ( -e "$basepath/mail/$domain/$emaillogin/myspam")

...and it creates the folder if it doesn't exist.


Thanks for all your help.  It all works like a charm now.  You da man!    :thumb:


Title: Re: How-to: Train SpamAssassin
Post by: w98 on July 14, 2005, 11:56:02 AM
Update July 14, 2005:

I've modified the script back on page 1 quite a bit and updated the notes a little.

The new script will scan your /mail/ directory and recursively scan each mailbox in there to detect ham/spam mailboxes and delete any messages in your ham/spam folder pair after scanning.

Therefore, running:
http://www.yourdomain.com/cgi-bin/sa-learn.cgi
will now scan multiple mailboxes on the fly, and erase the messages in ham/spam after scanning.

The URL has three possible parameters:
account
xham
xspam

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?account=joesmith
When 'account' is added to the URL line, the script will only scan joesmith's mailbox for ham/spam folders; the absence of the 'account' parameter will scan ALL mailboxes on the domain by default

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?xham=0
By default, xham gets internally set to '1' if not set on the URL, which flags the script to empty the ham folders of any messages after scanning. By setting this flag to a 0, you can ensure that the ham messages will remain intact after scanning.

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?xspam=0
Likewise, xspam gets internally set to '1' if not set on the URL, which flags the script to empty the spam folders of any messages after scanning. I can't imagine too many scenarios where you'd want to set this to a 0, but I provide for it just in case.

Here's an example of scanning joe smith's mailbox and keeping his ham:
http://www.yourdomain.com/cgi-bin/sa-learn.cgi?account=joesmith&xham=0
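
The defaults described above can be sketched as follows. This is an illustration in Python rather than the script's Perl, and it assumes that any value other than "0" leaves the flag switched on; `parse_options` and the `clear_ham`/`clear_spam` keys are hypothetical names.

```python
from urllib.parse import urlparse, parse_qs

def parse_options(url):
    """Read account, xham and xspam from a sa-learn.cgi URL, applying the defaults."""
    query = parse_qs(urlparse(url).query)

    def first(name, default):
        # parse_qs returns a list per key; take the first value or the default
        return query.get(name, [default])[0]

    return {
        "account": first("account", "all"),        # no account means every mailbox
        "clear_ham": first("xham", "1") != "0",    # empty ham folders unless xham=0
        "clear_spam": first("xspam", "1") != "0",  # empty spam folders unless xspam=0
    }

print(parse_options("http://www.yourdomain.com/cgi-bin/sa-learn.cgi?account=joesmith&xham=0"))
```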

Feel free to comment, make improvements, or report bugs by replying here in the forums.



Title: Re: How-to: Train SpamAssassin
Post by: agesixracer on August 16, 2005, 10:32:35 AM

when i run the sa-learn.cgi file it doesn't seem to be checking each mailbox.  instead of getting

Code:
SpamAssassin version 3.0.4
using /usr/bin/sa-learn in /home/lpaccount/mail/mydomain.com (login: all) to learn about spam/ham
Checking /home/lpaccount/mail/mydomain.com/jim.smith for spam/ham:
Learning SPAM:
Learned from 14 message(s) (18 message(s) examined).
Learning HAM:
Learned from 9 message(s) (5 message(s) examined).


Checking /home/lpaccount/mail/mydomain.com/john.doe for spam/ham:
Learning SPAM:
Learned from 23 message(s) (29 message(s) examined).
Learning HAM:
Learned from 35 message(s) (48 message(s) examined).



i get:



Code:
SpamAssassin version 3.0.3
using /usr/bin/sa-learn in /home/lpaccount/mail/mydomain.com (login: all) to learn about spam/ham
Checking /home/lpaccount/mail/mydomain.com/all for spam/myham:
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myham
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myspam


Checking /home/inside16/mail/mydomain.com/all for spam/myham:
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myham
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myspam


Title: Re: How-to: Train SpamAssassin
Post by: w98 on August 16, 2005, 10:58:10 AM
Quote from: agesixracer
when i run the sa-learn.cgi file it doesn't seem to be checking each mailbox.
i get:

Code:
SpamAssassin version 3.0.3
using /usr/bin/sa-learn in /home/lpaccount/mail/mydomain.com (login: all) to learn about spam/ham
Checking /home/inside16/mail/mydomain.com/all for spam/myham:
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myham
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myspam


Checking /home/inside16/mail/mydomain.com/all for spam/myham:
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myham
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myspam

How exactly are you running the script? Copy the full URL into a message here, masking out your domain name, like this:

   http://www.********.com/cgi-bin/sa-learn.cgi ...

and be sure to list all of the parameters you're passing. Obviously without seeing the exact code of the script you've uploaded to your server, I cannot guarantee that this will operate as you expect it to.

The script defaults to scanning all mailboxes by looking at the $emaillogin variable, which is initialized with the word "all". If you've specified an "account" parameter when running the script (ie: http://blah.com/cgi-bin/sa-learn.cgi?account=joeuser) then $emaillogin contains "joeuser" instead of "all" (it gets overwritten).

Later in the code, there's a logic check to see if $emaillogin equals "all" or not. If not, it calls the "dospam" routine on the single mailbox. Otherwise, it goes through other code to detect every mailbox on the domain, and scan each of them for ham/spam.

This script works just fine for me on 4 different sites; the only modification is the name of the domain and LP username. If you want me to look at it for you, contact me in a private message and I'll be happy to take a peek at it.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on August 16, 2005, 12:11:09 PM
Okay, we found a bug.

I made a modification to the script back on page one to alter the code from this:
Code:
foreach my $login (sort @logins) {
  my $newpath = "$basepath/$emaillogin" ;
  &dospam($newpath,$config,$clearham,$clearspam) ;
}
to this:
Code:
foreach my $login (sort @logins) {
  my $newpath = "$basepath/$login" ;
  &dospam($newpath,$config,$clearham,$clearspam) ;
}

The $newpath variable was getting loaded up with $emaillogin, which in the case of dealing with "all" mailboxes, was putting the word "all" in the $newpath, instead of the $login variable parsed from the foreach() call. Sorry about that.
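
The same mistake is easy to reproduce in any language. Here is a small Python sketch of the before and after; the function names are made up for illustration and the paths mirror the sample output earlier in the thread.

```python
def mailbox_paths_buggy(basepath, emaillogin, logins):
    # Bug: the loop variable is ignored, so every path ends in emaillogin ("all").
    return [f"{basepath}/{emaillogin}" for login in sorted(logins)]

def mailbox_paths_fixed(basepath, logins):
    # Fix: build each path from the login currently being visited.
    return [f"{basepath}/{login}" for login in sorted(logins)]

basepath = "/home/lpaccount/mail/mydomain.com"
logins = ["john.doe", "jim.smith"]

print(mailbox_paths_buggy(basepath, "all", logins))
print(mailbox_paths_fixed(basepath, logins))
```

The buggy version visits ".../all" once per mailbox, which matches the repeated "Checking .../all" lines reported above.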


Title: Re: How-to: Train SpamAssassin
Post by: purefusion on September 06, 2005, 06:07:02 AM
By default, I am missing my bayes_toks and bayes_seen files. If I just create them, will everything work as normal?

As of now, when scanning myham/myspam, it claims no files are being learned from, and no messages examined, though there are clearly spam messages in the myspam folder that got past the spam filter.

:(


Title: Re: How-to: Train SpamAssassin
Post by: krick on September 29, 2005, 10:02:37 AM
Quote from: w98
Update July 14, 2005:

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?account=joesmith
When 'account' is added to the URL line, the script will only scan joesmith's mailbox for ham/spam folders; the absence of the 'account' parameter will scan ALL mailboxes on the domain by default

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?xham=0
By default, xham gets internally set to '1' if not set on the URL, which flags the script to empty the ham folders of any messages after scanning. By setting this flag to a 0, you can ensure that the ham messages will remain intact after scanning.

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?xspam=0
Likewise, xspam gets internally set to '1' if not set on the URL, which flags the script to empty the spam folders of any messages after scanning. I can't imagine too many scenarios where you'd want to set this to a 0, but I provide for it just in case.

Feel free to comment, make improvements, or report bugs by replying here in the forums.

I think it should always default to *not* deleting anything unless explicitly instructed to for safety reasons.

"account" should have to be set to "all" in the URL in order to check all folders and check no folders by default.

then have two parameters... "deletespam" and "deleteham" that have to be set to 1 or "true" in the URL to delete files (defaults to 0 or "false" if not present)

The reason is that if while setting things up, someone screws up while configuring their paths somewhere, a lot of stuff could get deleted and this would be very bad.

Just my $.02
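
A rough sketch of the safer defaults proposed above, written in Python for illustration. The parameter names "account", "deletespam" and "deleteham" come from the suggestion itself; `parse_safe_options` and the returned keys are hypothetical, and nothing here reflects how the real script behaves.

```python
from urllib.parse import urlparse, parse_qs

TRUTHY = {"1", "true"}

def parse_safe_options(url):
    """Nothing is scanned or deleted unless the URL asks for it explicitly."""
    query = parse_qs(urlparse(url).query)

    def first(name):
        return query.get(name, [""])[0]

    return {
        "account": first("account") or None,  # None: check no folders at all
        "delete_spam": first("deletespam").lower() in TRUTHY,
        "delete_ham": first("deleteham").lower() in TRUTHY,
    }

print(parse_safe_options("http://www.yourdomain.com/cgi-bin/sa-learn.cgi"))
print(parse_safe_options("http://www.yourdomain.com/cgi-bin/sa-learn.cgi?account=all&deletespam=1"))
```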



Incidentally, I'm still using the earlier version of the sa-learn-user.cgi where it only takes a "user" parameter.  I modified ilohamail to include a "learn-spam" link at the top.  That way, when I'm logged into my webmail, I can select spam messages, move them to the spam folder, then hit "learn spam".  This learns the inbox as the "ham" folder, learns the spam folder, and deletes the contents of the spam folder.

Using this method, each user can learn their own spam and ham and train SpamAssassin.

If anybody is interested, here's the code I added to Ilohamail 0.8.14 RC3 in
/source/tool.php  .....

Code:
$links[] = array("prefs.php?user=$user", "list2", $toolStrings["prefs"], $div);

// BEGIN ADDED CODE
$LPtemp = explode("@", $loginID);
$LPuser = $LPtemp[0];
$links[] = array("http://domain.com/cgi-bin/sa-learn-user.cgi?user=$LPuser",
"_blank", "Learn Spam", $div);
// END ADDED CODE

echo "\n<form method=POST action=\"main.php\" target=\"list2\">\n";


It works good but I have two little problems:

1) I have to hard code my domain into the URL.  I'm not sure how to get my domain or my IP address at that spot via PHP.

2) the way I get the user name is kinda hacky.  There must be a better way.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on October 13, 2005, 03:14:03 PM
You could use $_SERVER['HTTP_HOST'] to get your domain name in PHP (the key is case-sensitive, so it has to be upper case).


Title: Re: How-to: Train SpamAssassin
Post by: bxb13 on October 23, 2005, 02:22:39 PM
Hi. I'm a newbie at LP, but I just set up IMAP email w/ Spam Assassin. I also used this guide to set up the CGI script to train SA. I'm running into a problem, though, where SA training script says it only looked at 1 message and learned from 0 messages. This happens for both the SPAM and HAM folders, and it makes no difference how many messages got copied to these folders. I tried both the original script on the 1st page, and the w98 modified version, but I get the same results. Are there any logs that I can look at to see if the script is running into problems? Or is there any other way to debug the script?

Any help would be greatly appreciated.
Thanks


Title: Re: How-to: Train SpamAssassin
Post by: krick on October 24, 2005, 10:29:32 AM
Quote from: w98
You could use $_SERVER['HTTP_HOST'] to get your domain name in PHP.

Actually, I ended up using $SERVER_ADDR to get the ip address and it worked like a charm.



Title: Re: How-to: Train SpamAssassin
Post by: spatters1000 on October 31, 2005, 01:15:50 PM
Quote from: bxb13
Hi. I'm a newbie at LP, but I just set up IMAP email w/ Spam Assassin. I also used this guide to set up the CGI script to train SA. I'm running into a problem, though, where SA training script says it only looked at 1 message and learned from 0 messages. This happens for both the SPAM and HAM folders, and it makes no difference how many messages got copied to these folders. I tried both the original script on the 1st page, and the w98 modified version, but I get the same results. Are there any logs that I can look at to see if the script is running into problems? Or is there any other way to debug the script?

Any help would be greatly appreciated.
Thanks

I'm a newbie too and I just set up SA and the training script. I had a similar problem until I changed the permissions on the new sa-learn.cgi file. The instructions on page one of this thread state to change the permission to 755. I tried typing in 755 but it wouldn't take it. I finally realized that you need to check the three "execute" boxes and it changes the values for you. To do this, highlight the sa-learn.cgi file and click Change Permissions. (Initially, the permission value is 644. When you check the three "execute" boxes, the value changes to 755.) Click "Change" to save the new value. After I did this it worked. I realized the executable permission was the problem when I checked the error log. Access it via cPanel. There's an icon labeled "Error Log".
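
The jump from 644 to 755 is just the three execute bits being switched on. A short Python sketch of the arithmetic, using only the standard `stat` module:

```python
import stat

before = 0o644  # rw-r--r--: the value the file manager showed initially

# The three "execute" boxes: owner, group and world (0o100 + 0o010 + 0o001).
execute_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH

after = before | execute_bits

# S_IFREG marks a regular file so filemode() prints the familiar ls-style string.
print(oct(before), stat.filemode(stat.S_IFREG | before))
print(oct(after), stat.filemode(stat.S_IFREG | after))
```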

Another quirk I see is when I run the cgi script (http://www.domain.com/cgi-bin/sa-learn.cgi). If I open a browser window, enter the URL for the script, run it, make a change to the number of email messages in the spam/ham folder (for example), and then try to run the script again, it only repeats the values shown in the first run. This was also the case when I was getting no results before I changed the permission value. I found that if I close the browser window and then open a new one and run the script it will reflect the updated values. I couldn't get it to update if I merely refreshed the page, or if I clicked "Go" on the address bar.

One other puzzling thing -- I set up an IMAP account in Outlook. I can see Inbox, Inbox.Drafts, Inbox.Sent, Inbox.Trash, and Junk E-mail. I don't see the "myham" and "myspam" folders, even though they exist. I can see them when I access the account via Webmail. Also, I'm assuming Outlook created the Junk Email folder since that one doesn't exist in the Webmail view.

Anyone have thoughts/suggestions on the Outlook issue? Thanks!


Title: Re: How-to: Train SpamAssassin
Post by: grof on November 05, 2005, 04:21:28 PM
This is a great resource for SA !  Thanks W98.

One question though - When moving email to the myspam folder, do you:
a) feed it all emails that are spam (including those already correctly marked as spam by SA), or
b) only feed it emails that SA did not mark as spam ?

If the preferred option is a), is there any reason why you couldn't use a mail filter to automatically move the marked spam to the myspam folder (and then only have to manually move the un-marked spam) ?


Title: Re: How-to: Train SpamAssassin
Post by: bomalley on December 05, 2005, 06:31:00 AM
I've been using this script for a few days now and no matter how many emails are in myspam or myham I get the following response.

Checking /home/XXX/mail/XXX.com/USER for spam/myham:
Learning SPAM:
Learned tokens from 0 message(s) (1 message(s) examined)
Learning HAM:
Learned tokens from 0 message(s) (1 message(s) examined)
Emptying/Creating /home/XXX/mail/XXX.com/USER/myham
Emptying/Creating /home/XXX/mail/XXX.com/USER/myspam
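
When comparing runs, it can help to pull the two counts out of the summary line instead of eyeballing them. A minimal sketch, assuming the line looks like one of the two variants quoted in this thread ("Learned from ..." and "Learned tokens from ..."); `parse_summary` is a hypothetical helper, not part of sa-learn or the CGI script.

```python
import re

SUMMARY = re.compile(
    r"Learned (?:tokens )?from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)

def parse_summary(line):
    """Return (learned, examined) from a summary line, or None if it does not match."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

line = "Learned tokens from 0 message(s) (1 message(s) examined)"
learned, examined = parse_summary(line)
if examined <= 1:
    print("sa-learn only saw one message; check the mailbox path and format")
```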


Title: Re: How-to: Train SpamAssassin
Post by: labyrinthian on December 13, 2005, 07:49:53 PM
Ditto here - only getting one message examined as well here for me.  Same scenario.


Title: Re: How-to: Train SpamAssassin
Post by: purefusion on January 23, 2006, 07:46:54 AM
To you who are only getting it to pick up 1 message in the folder, make sure the user prefs file contains:

required_hits   5
rewrite_subject 1
subject_tag {SPAM}
bayes_path /home/lpaccount/.spamassassin/bayes
bayes_file_mode 0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information

Most importantly:

bayes_path /home/lpaccount/.spamassassin/bayes
(where lpaccount is your username!)

Chances are, you had a typo in this line somewhere if it is already in the user_prefs file. Search the line high and low for spelling errors, typos, and dyslexia effects.
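
Hunting for a typo by eye is error-prone, so here is a small sketch that checks the line mechanically. It assumes user_prefs uses the "option value" layout shown above, with "#" starting a comment; `find_bayes_path` and `bayes_path_ok` are hypothetical helpers.

```python
def find_bayes_path(prefs_text):
    """Return the value of the first bayes_path line, or None if there is none."""
    for raw in prefs_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "bayes_path":
            return parts[1].strip()
    return None

def bayes_path_ok(prefs_text, account):
    """True if bayes_path points into the given account's .spamassassin folder."""
    return find_bayes_path(prefs_text) == f"/home/{account}/.spamassassin/bayes"

prefs = """required_hits   5
bayes_path /home/lpaccount/.spamassassin/bayes
bayes_file_mode 0600
"""

print(bayes_path_ok(prefs, "lpaccount"))
```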

Also, make sure you change the permissions, so that the .spamassassin folder, and all files within are chmodded to 777. Voila! Problems solved... I hope!


Title: Re: How-to: Train SpamAssassin
Post by: purefusion on January 23, 2006, 07:49:11 AM
Oh, and backup your user prefs file periodically, and DON'T use the Spamassassin configuration links within Cpanel to edit the SA prefs file, edit it directly with the file manager in Cpanel. (File: .spamassassin/user_prefs)


Title: Re: How-to: Train SpamAssassin
Post by: w98 on January 26, 2006, 11:12:27 AM
Quote from: grof
This is a great resource for SA !  Thanks W98.

My pleasure! Sorry for my silence lately, and not watching this thread more closely.

Quote from: grof
One question though - When moving email to the myspam folder, do you:
a) feed it all emails that are spam (including those already correctly marked as spam by SA), or
b) only feed it emails that SA did not mark as spam ?
If the preferred option is a), is there any reason why you couldn't use a mail filter to automatically move the marked spam to the myspam folder (and then only have to manually move the un-marked spam) ?

I move all messages, including those already flagged as spam into the spam folder. I do this for a few reasons:

1. SpamAssassin has a time-limit on what it's learned from you. These 'artifacts' that it learns (ie: keywords, email addresses it came from or were addressed to, subject lines, etc.) all expire at some point. Every time SpamAssassin runs on a message that comes in, it decides whether to expire old artifacts or not. I *believe* the threshold is 3 to 6 months, but don't quote me on that. So, by retraining, you're ensuring that current messages are indeed being flagged as spam.

2. Some spam messages are ranked lower. For example, you may notice a line including BAYES_80 in the determination of spamminess. So even though it's flagged as spam, SpamAssassin can only determine with 80%-90% certainty that it is indeed spam. By training SpamAssassin on the message again, it'll bump up that value because it will relearn from the artifacts.

3. Finally: http://www.w98.us/spam/
I started counting how many spam messages I get. I don't know why, maybe just to exert a little extra geekiness for my web site. I have a script that checks my spam mailbox every 15 minutes, and inserts unique message-ids into a MySQL table including the spam score from SpamAssassin. Eventually I'll add a second line to the chart showing the daily average spam score, and not just the number of spam messages I get. I'm also considering, since I have a number of domains parked here at LP, tracking which domains the spams are for.
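
The "insert unique message-ids" part of such a counter can be sketched like this. To keep the example self-contained it uses Python's built-in SQLite instead of MySQL, and the table layout, function names and sample Message-ID are all assumptions; it is not the script behind the page linked above.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create the table if needed; message_id is the primary key, so it stays unique."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spam (message_id TEXT PRIMARY KEY, score REAL)"
    )
    return conn

def record(conn, message_id, score):
    """Insert a message once; return True if it was new, False if already counted."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO spam (message_id, score) VALUES (?, ?)",
        (message_id, score),
    )
    conn.commit()
    return cur.rowcount == 1

conn = open_db()
record(conn, "<abc123@example.com>", 7.4)  # first sighting: counted
record(conn, "<abc123@example.com>", 7.4)  # seen again 15 minutes later: ignored
print(conn.execute("SELECT COUNT(*) FROM spam").fetchone()[0])
```

Because the mailbox is rescanned every 15 minutes, the same message is seen many times; keying on the Message-ID is what keeps the count honest.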

Again, sorry for my silence lately, I'll try to reply to these more frequently in the future.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on January 26, 2006, 11:31:43 AM
Quote from: purefusion
Oh, and backup your user prefs file periodically, and DON'T use the Spamassassin configuration links within Cpanel to edit the SA prefs file, edit it directly with the file manager in Cpanel. (File: .spamassassin/user_prefs)

Thanks, purefusion, great advice.

A few other things to keep in mind with SpamAssassin, and sa-learn.cgi:

1. It's not perfect. SpamAssassin will report false-positives because of third-party problems. I recently joined a Yahoo group and someone in the group sent out a 'reminder' to everyone on the list of an upcoming event, and the advertisement that Yahoo put in the HTML-based Email was a URL that was blacklisted on 4 different blacklists. So, despite training SpamAssassin, and seeing a BAYES_00 score, *and* seeing another -6 points for an automatic whitelist, it still exceeded my spam threshold score of 3.5 and got flagged as spam.

2. I recommend everyone drop their spam threshold to 3.5 points in user_prefs. Spammers are getting more clever and finding ways around SpamAssassin's ruleset. Dropping your threshold *can* create more false-positives (reporting a message 'positive' for spam when it's really not). BE SURE TO CHECK YOUR SPAM FOLDERS on a regular basis *before* running the sa-learn.cgi script to make sure that you're not accidentally training SpamAssassin that a friend's Email is spam - otherwise future Emails from your friend will get scored higher.

3. This script isn't perfect either. There are a few extra security precautions that I'll likely build into this script at some point, such as checking for a password before proceeding. This will help alleviate any issues with someone figuring out you're an LP customer using this script and running it in a way that clears your ham/spam folders before you have a chance to review them.

4. I *always* run my scripts with xham=0 and xspam=0 before running them as =1. This is because the sa-learn script isn't perfect when reading mailboxes. It sometimes treats messages with file attachments poorly, and may report, for example, that you only have 50 messages in a mailbox when you can see in your Email client that there are 100 or more messages. If I copy 100 messages in and the sa-learn reports some other number, I run the script again with xham/xspam set at 0 to rescan a second time. Usually, after the second scan, the correct number of messages shows up. The other way to avoid this is to remove file attachments from messages if possible, so you're only scanning the message itself.

5. SpamAssassin can be slow ... be patient, and when in doubt, scan smaller amounts of messages. I personally have a 'global' set of spam/ham folders that my script cannot see. I move all messages from each of my mailboxes' ham/spam folders into those, and gradually put messages back in chunks of 50 or 100 messages, or if I don't scan daily, I may only copy messages from a single day back into the folder, then run the sa-learn.cgi script.

6. I may rewrite this script and my spam-counter into a single application if there is enough interest from the rest of you. It's purely for "look at me I'm a geek" purposes, but it's also a good reason to get other people to host with great companies like LunarPages when I can tell my friends and clients "Yeah, I get about 100-150 legitimate Emails per day, and if I wasn't hosting with LunarPages, I'd have gotten 200+ spam messages as well."

7. Kudos to the LP crew for making SpamAssassin available to us and for letting us use our geekiness to run sa-learn from a Perl CGI script to keep spam out of our mailboxes! Any plans to ever let us write our own custom spam rules? :wink:
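
On point 4, one rough way to cross-check the number sa-learn reports is to count the "From " separator lines in the mbox file yourself. This is only a sketch: count_mbox_messages is a made-up helper name, and it assumes the mail server escapes body lines that begin with "From " (most do), so treat the result as approximate.

```shell
# Sketch: count messages in an mbox file by counting "From " separator lines.
count_mbox_messages() {
    # grep -c prints 0 but exits non-zero when nothing matches, so ignore its status
    grep -c '^From ' "$1" || true
}
```

If this prints 100 for your myspam file but sa-learn says it examined 50, that is the situation described in point 4 and a second scan with xham=0 and xspam=0 is worth trying.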


Title: Re: How-to: Train SpamAssassin
Post by: w98 on February 03, 2006, 02:59:21 PM
Update Feb 3 2006: You can add a field of _HITS_ into the subject rewrite string, so I added a note about that in the original message.

subject_tag {SPAM  _HITS_}

Now, messages come in with subject like "{SPAM 41.2} Cheap drugs" or "{SPAM 17.1} free gift card from some store"


Title: Re: How-to: Train SpamAssassin
Post by: MontrealPaul on March 10, 2006, 12:52:41 PM
<clip>Chances are, you had a typo in this line somewhere<clip>

Also, make sure you change the permissions, so that the .spamassassin folder, and all files within are chmodded to 777 <clip>

Chmodding to 777 seems to me not only unnecessary, but also dangerous, as it would allow anyone (i.e. "world") to simply delete all your bayes stuff, and more. IMO (but do correct me if I'm wrong), all that's necessary is 755 for the cgi script, and no more than 711 for the files in .spamassassin; probably less, perhaps 600.

I, too, am having the same problem as bomalley, i.e.
"Learned tokens from 0 message(s) (1 message(s) examined)", etc.

I have tried with 0, 5, and 200 emails in the folders with the same results.

For the sake of testing, I have cut-and-pasted all text, both for the script and prefs, and carefully modified only my personal info, i.e. account, domain, etc., and even set everything for 777 for a few tests, but still the same results. I've read and re-read everything multiple times, even flipping overlaid windows to confirm there are no typos.

I have tried different e-mail accounts, some with and some without anything in the ham/spam directories, and the only difference is that for those with contents in the ham/spam directories show "1 message(s) examined", and those without show 0.

I only came upon this EXCELLENT thread today, and I have high hopes for it, but not having seen what the script was like before (I see that Ian has been making the changes "live"), I don't know what may have changed since it was working properly.

Update, March 29: Can't anyone comment on this "Learned tokens from 0 message(s) (1 message(s) examined)" problem??  :?:

Salutations,
  -Paul


Title: Re: How-to: Train SpamAssassin
Post by: w98 on March 30, 2006, 12:16:20 PM
Okay, let's take this one at a time:

1. For starters, it was another user that suggested a 777 permission, but only for the files inside your /home/ftpusername/.spamassassin/ folder, not the folder itself. My folder has permissions of 700 set, and all files within are 600 except for lock files that the system makes when training which are 666.

2. If you're getting 0/1 messages examined then the script is not seeing the spam folders in the right place. By default, this script would make a spam/ham folder for each Email account, so you may need to 'subscribe' to the new spam/ham folders for where to put the messages. If you have the global 'spam box' enabled through cpanel, that just makes a single spam box for the whole domain.

3. By all means, contact me via private message if you need assistance, I'm always happy to help.
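
For anyone with shell access, the permissions from point 1 (700 on the folder, 600 on the files inside) could be applied with something like the sketch below. The function name lock_down_sa_dir is hypothetical, and the sketch also resets any lock files it finds to 600, so it is best not run while a training pass is in progress.

```shell
# Sketch: folder readable only by its owner, files inside read/write for the owner only.
lock_down_sa_dir() {
    dir="${1:-$HOME/.spamassassin}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    chmod 700 "$dir" &&
    find "$dir" -type f -exec chmod 600 {} +
}
```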


Title: Re: How-to: Train SpamAssassin
Post by: MontrealPaul on March 30, 2006, 01:11:01 PM
Thanks for replying, w98, and for your kind offer to help via PM. I'll try to keep it "public" for now, so others may benefit.

I was going to wait for the results of my latest findings, but I'll give y'all a preview: I modded the script to display errors (by appending "2>&1" and -D to the sa-learn command), and the following message was among the copious output:
DB_File module not installed, cannot use bayes

I searched the forum and found a few references (back in 2004) to the DB_File module, and in all cases the good LP folks loaded the module (it's supposed to be there, but it seems that when new servers are implemented, as in my case, it's not loaded by default). So, I wrote them (yesterday) to ask them to load the module. The dispatch said he'd escalate it, and get back to me. I'm still waiting...
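
A quick way to test for the module directly, sketched here on the assumption that the perl on your PATH is the same one sa-learn uses (perl_has_module is a made-up helper name):

```shell
# Sketch: succeed only if the named Perl module can be loaded.
perl_has_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}
```

For example, `perl_has_module DB_File || echo "DB_File is missing"`. To see the same debug message described above, something along the lines of `sa-learn -D --spam --mbox /path/to/myspam 2>&1 | grep -i db_file` should surface it, where the mailbox path is a placeholder.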

Regarding 777: I'd consider this 'dangerous', as it would allow anyone to do some serious damage, but I'll first wait and see if the above solution helps. (As noted previously, I did try the 777 approach, but with the same results).

I know that the script is "seeing the right place", because other operations, like cleanup, work fine. Also, if I empty the directory, I get a 0/0 message, and get a 0/1 message irrespective of whether it has 1 or 100 messages.

BTW, rather than create and check separate myham/myspam directories for each account (which I did initially), I do the following:

I created a separate account (let's call it "sa"), with myham/myspam directories, and created a filter to send all detected spam to this account; the filter is:
$h_X-Spam-Status: begins "Yes" >send to "sa".

Then, I can just process that account. Seeing lots of messages together with the same subject and/or sender helps me confirm its spaminess. I still, however, have to go to the separate accounts to see if any spam slipped through, and to harvest ham.

Note that this approach is good for me only because:
1) e-mail to my "clients" is actually forwarded to their other e-mail accounts (i.e. hotmail, etc); they do not log onto my domain (i.e. via Horde or Squirrel) to get their mail. The hosted accounts are only to 'catch' the e-mail so I can do the SPAM admin.
2) I only have about a dozen "clients"
3) Mine is a simple, non-profit org, and there is no confidential info.

FWIW. Just thought someone might be interested...

Salutations,
  -Paul


Title: Re: How-to: Train SpamAssassin
Post by: MontrealPaul on April 02, 2006, 07:01:34 AM
Success!! Looks like the "DB_File module" was the problem. LP installed it, and everything now works as it should - no more 0/1 errors, and my .spamassassin files are now updating (I can see the dates and file sizes changing).

I whittled down the rights to the files in .spamassassin - and could go no lower than 600 (-rw-------) without hindering my own rights - and still the sa-train program worked fine. This is actually a little surprising, and will warrant a closer inspection from the security perspective, but in any case shows that 777 is not necessary (at least, not in my case).

Thanks, w98, for writing this great script!

Salutations,
  -Paul

P.S.: If at first you don't succeed....
The first time LP got back to me, saying the DB_File module had been installed (three days after my request, but I think maybe they might do these sort of things on weekends), it still didn't work. I wrote back, suggesting they check that the service/daemon/module had been actually started. Just six hours later, they wrote back, saying that they had re-installed, and re-started the server, and then everything was hunky-dory! :yey:


Title: Re: How-to: Train SpamAssassin
Post by: w98 on April 07, 2006, 09:20:43 PM
Hi Paul, I'm happy you got it working for your domain and mailboxes. :yey:

Remember to train it with a minimum of 200 spam AND ham messages, and it'll be FAR more effective for you. On average, training it about once a week should be more than sufficient, and only takes a few minutes.

I'm actually going to put this script over on my open source site this weekend, at http://www.iandouglas.com/ where I'll be putting up all kinds of scripts and goodies I've written along the way, most of which I use here at LP on w98.us, such as a stock tracker, a spam tracker (which makes a lovely chart showing how many messages SpamAssassin has kept out of my Inbox: http://www.w98.us/spam/), and others. I've got Mantis running over there for bug tracking and feature requests, as well as discussion forums I'll be setting up for each project.

I've also got a SpamAssassin training script that is more suited to those of us who have only a handful of mailboxes that they all check themselves ... it's a throwback to an early edition of THIS training script where all mailboxes share a single ham/spam mailbox pair. And based on the great feedback I've had from people here in this thread (which I'll continue to support by all means), I'll be writing large amounts of documentation on why training SpamAssassin is such a good idea (and in fact necessary), and other tweaks etc.

It won't cover nearly as much detail as I'd hope since LP (still) doesn't let us write our own SpamAssassin rules lol

ian


Title: Re: How-to: Train SpamAssassin
Post by: w98 on April 10, 2006, 04:45:05 PM
BTW, thanks to MontrealPaul for Emailing me his patched version - if anyone has made any revisions to this code or configuration, I'd love to hear from you ... lunarforums@w98.us ... just attach your modified script to an Email and send it on, I'll try to make the modifications a configurable option for users, and give you full credit for it in the next release from my web site.

This weekend turned out to be a busy one, so I'm a little delayed on getting my new article written, let alone getting the software set for download, etc. but I hope to get everything wrapped up this week, depending on how many new patched versions you guys send me..


Title: Re: How-to: Train SpamAssassin
Post by: WithoutAPaddle on April 23, 2006, 08:58:44 PM
Hi!  I'm just trying to figure out how to do this.  A few questions.  At first I couldn't find a .spamassassin directory.  I had to activate SpamAssassin in my cPanel first.  Then when I looked in that directory there weren't 3 files like you say.  I only see two (auto-whitelist and user_prefs).  Well, I figured I would move ahead anyway since there is a user_prefs at least.  I opened user_prefs and I found these two lines.

Quote
required_score 5
rewrite_header subject _HITS_

These seem similar to what you had.  Should I use what you listed or keep these?

Quote
required_hits   5
rewrite_subject 1
subject_tag {SPAM}
bayes_path /home/lpaccount/.spamassassin/bayes
bayes_file_mode 0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information

required_score or required_hits?


EDIT - ok it took a little while but those other files have shown up in my .spamassassin directory but there's still an auto-whitelist.  What does this do?


Title: Re: How-to: Train SpamAssassin
Post by: w98 on April 26, 2006, 11:49:28 AM
The auto-whitelist is a database keyed on 'From' Email addresses in which SpamAssassin tracks how mail from each sender has scored in the past. The next time that address sends a message, SA nudges the score toward that sender's historical average, so a sender with a history of non-spam is more likely to be treated as legit. And, since spammers like to change their Email address faster than a politician can flip-flop in a political debate, I believe it mostly tends to benefit legitimate users.


Title: Re: How-to: Train SpamAssassin
Post by: WithoutAPaddle on April 26, 2006, 09:17:59 PM
How do I know if my cgi script is working?  I don't have any emails yet because I've just set these accounts up.  Should I see actual myham and myspam folder in each email account? 

EDIT - I see now.  I had put the www in front of mydomain.com but when I removed it the myham and myspam folders were created. 

Should we be using the blacklist on Horde as well?


Title: Re: How-to: Train SpamAssassin
Post by: MontrealPaul on April 27, 2006, 06:55:29 AM
How do I know if my cgi script is working? 
<snip>
Should we be using the blacklist on Horde as well?

Normally, Spam Assassin won't really start to do its magic till you have 200 messages or so, but YMMV. Don't forget to feed both the myHam and the mySpam folders.

Horde's blacklist could supplement your spam defence, but I do not believe that SA makes direct use of it.

Salutations,
  -Paul


Title: Re: How-to: Train SpamAssassin
Post by: krick on June 01, 2006, 08:44:16 AM
Update Feb 3 2006: You can add a field of _HITS_ into the subject rewrite string, so I added a note about that in the original message.

subject_tag {SPAM  _HITS_}

Now, messages come in with subject like "{SPAM 41.2} Cheap drugs" or "{SPAM 17.1} free gift card from some store"

subject_tag has been deprecated and "_HITS_" is now "_SCORE_"

more info here...
http://wiki.apache.org/spamassassin/SubjectRewrite


Note that there are other useful tags too like _REQD_
Check out the docs...
http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html


This is what I use in my user prefs...


rewrite_header subject [SPAM] Score: _SCORE(0)_/_REQD_ -


The (0) after SCORE causes it to pad the scores out to at least two digits so that they sort properly.  Since I've never seen a score over 99, this works well.  You can alternatively pad with spaces instead of zeroes.

Sample subjects would look something like this....

[SPAM] Score: 07.5/5 - blah blah blah hot stock pick
[SPAM] Score: 09.1/5 - blah blah blah mortgage
[SPAM] Score: 46.6/5 - blah blah blah viagra


Title: Re: How-to: Train SpamAssassin
Post by: GMTurner on June 01, 2006, 10:40:28 AM
I get an email newsletter from PC world that consistently gets scores around 115-120... had to add it to the whitelist to let it through...


Title: Re: How-to: Train SpamAssassin
Post by: Mike McCollister on July 15, 2006, 04:46:19 AM
I have had some spam get through.  Is there a way to set up an e-mail address that I can forward spam to so that SA learns from it?

Mike


Title: Re: How-to: Train SpamAssassin
Post by: w98 on August 08, 2006, 11:54:46 AM
You can but if you're only going to train it with spam, then SA isn't going to be as effective. SA needs to learn what you consider non-spam (aka 'ham') as well, which is why I took this approach of scanning two mailboxes (called 'ham' and 'spam').

If you really do just want to unbalance SpamAssassin and only teach it spam, then set up a new mailbox, and alter the Perl script (contact me via private message if you need help) to only look at that mailbox, and only with the --spam option to sa-learn.

Then, you can use IMAP to copy/move the mail into that mailbox.

Unfortunately, literally forwarding messages to the new mailbox will train SpamAssassin that "any incoming Email from Mike's address with a subject line that starts with 'Fwd:' should be considered spam", so you'll want to use IMAP to move the messages.


Title: Re: How-to: Train SpamAssassin
Post by: imthduke on September 12, 2006, 05:05:57 AM
This might be of interest to spam fighters. I am using 3 antispam programs: Norton, SpamAssassin, and MX Logic's Email Defense. I have trained SA with about 2000 emails as per the instructions above. After several weeks of checking the spam folder to see which ones are catching spam and which ones are giving false positives, here is what I am concluding..........

Norton is most reliable to catch spam set to the recommended level, however because it will not allow cut and paste for white and black list, it is harder to use.

Email Defense is next effective.......a surprise to me.

And then SA is a disappointing third.

Of course this is not scientifically accurate, just my experience. Any other experiences?


Title: Re: How-to: Train SpamAssassin
Post by: w98 on September 12, 2006, 06:30:50 AM
I think it would be more interesting to know the numeric values of how many Emails they each caught as false positives (how many legit Emails were flagged as spam, and why) or false negatives (how many spam Emails still made it into your mailbox, and why).

Personally, I get the occasional false positive (legit Email in my spam folder) because the legit Email went through a mailing list service that was blacklisted on an RBL. That's why I like SpamAssassin: it can show me that "hey, this message scored BAYES_00, it's not spam at all, but since it was blacklisted by 3 or 4 different RBLs, its total score is 10+ points". So I can use filtering to move messages that are BAYES_00 *back* into my inbox.
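
As a rough illustration of that kind of check, the sketch below pulls the X-Spam-Status header out of a single message file and looks for BAYES_00 in it. The helper names spam_status and is_bayes_00 are made up, and it assumes the server lists the rule names in X-Spam-Status, which depends on how SpamAssassin is configured.

```shell
# Sketch: print the (unfolded) X-Spam-Status header of one message file.
spam_status() {
    awk '
        /^$/     { exit }                                # blank line ends the headers
        /^[ \t]/ { if (grab) line = line " " $0; next }  # folded continuation line
                 { grab = 0 }
        tolower($0) ~ /^x-spam-status:/ { grab = 1; line = $0 }
        END      { if (line != "") print line }
    ' "$1"
}

# Succeeds when the message was given the BAYES_00 rule.
is_bayes_00() {
    spam_status "$1" | grep -q 'BAYES_00'
}
```

A mail filter that matches the same header would do the equivalent job on the server; the shell version is only meant to show what is being matched.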


Title: Re: How-to: Train SpamAssassin
Post by: wkeith01 on October 08, 2006, 01:02:59 AM
Hello, I'm a 55 yr old ex-long distance lorry driver until I had an accident that nearly killed me, how very un-interesting.

The point I am making is this: there are hundreds of thousands of people out there just like me who think it fantastic that they were able to make a website using a program, and to be able to make the family tree, edit photos and write letters once again. Not all of us know what IMAP is and so on, yet strangely enough we still get filthy spam and junk mail, and we would all like to be able to stop it. I was directed here by Lunarpages Support. Although the whiz-kid programming of SA is very good, can you not explain it in layman's terms: how do you get to SA from Outlook, how do you set up IMAP, and in fact what is IMAP? You have a heading called "assumption" which is rather patronizing,

-that you assume how to log on to cpanel using your lunarpages account details.

I think we can just about manage that

-that you know how to use Outlook, or know how to configure your e-mail based on my descriptions using Outlook. Yes, I can even do that. You go on to say to add an IMAP account.

Now we are cooking: what is an IMAP account, how do you set it up, and why do you need it anyway?
The rest of it means absolutely nothing to a layperson like myself.

Is there not some clever person out there who can actually talk old people like me through the procedure of stopping spam?

Best Wishes
wkeith01
sorry if I have offended anybody, it is not my intention.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on October 11, 2006, 11:32:56 AM
wkeith01, your message raises some good points about speaking in layman's terms.

Unfortunately, most of this message thread is technical in nature, and layman's terms wouldn't be accurate enough. That's why I made assumptions about certain key points, so that readers would know ahead of time that they would need to understand key terminology and how to perform certain tasks. This message thread is about training SpamAssassin, not a how-to on setting up IMAP, for example.

If you want to send me a private message, or Email me at id@w98.us, I'll be more than happy to describe things a little more clearly for you, but I don't want this message thread to go way off-topic and start containing irrelevant discussions that don't have anything to do with training SpamAssassin.

Cheers,
Ian


Title: Re: How-to: Train SpamAssassin
Post by: w98 on October 12, 2006, 11:43:25 AM
It was reported to me this morning that my script does not work on some LunarPages servers. This is because LP has started implementing a new Email storage system called "maildir" where each message is stored in a separate file inside a folder, instead of all messages being stored in a single file in "mbox" format.

I'll work on a new copy of the script that will try to autodetect the storage mechanism and set it to scan individual files instead of a single file. I'm not sure yet whether this will throw off the counter or not.
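
The detection step could look roughly like the sketch below, which is not the actual script: a maildir folder is a directory containing cur, new and tmp subdirectories, while an mbox folder is a single regular file. The function name mail_store_format is made up for illustration.

```shell
# Sketch: report whether a mail folder path looks like maildir or mbox.
mail_store_format() {
    path="$1"
    if [ -d "$path/cur" ] && [ -d "$path/new" ] && [ -d "$path/tmp" ]; then
        echo maildir
    elif [ -f "$path" ]; then
        echo mbox
    else
        echo unknown
    fi
}
```

A script could then add --mbox to the sa-learn command only when the answer is mbox, and point sa-learn at the message directories otherwise.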

Thanks, Tomas, for your suggestion.

-id


Title: Re: How-to: Train SpamAssassin
Post by: chiphayes on October 12, 2006, 05:13:27 PM
Ian (or whomever can answer this):

I've just read through this thread as, after a year of only minor spam problems, the evil kudzu of the internet is really starting to irritate me.

I use three of my domains here at LP exclusively for email, and would love to get SA working for me via your tutorial.  From your latest post, it looks like I may need to wait for a new fix from you (I'm on the omicron server, I think, if that's one of those affected), but regardless, before I attempt your tutorial, I do have one question.

I use POP3 now from Outlook to dl my emails.  You state in your tutorial that:

"If your users use POP3 to download their mail, you'll need to teach them how to set up IMAP as well to copy ham/spam into their ham/spam folder pair (generated by the new script)."

Does this mean that, as a user, I would need to delete the account now set up for POP3 and recreate it as an IMAP account?  I.e., just delete the POP3 account, and set up Outlook to read the two new IMAP folders that your script creates, and then handle the copying/moving for SPAM/HAM as you describe?

Sorry if this sounds a bit thick, but while I grasp the difference in IMAP and POP3 protocols, I've never peered carefully under the hood of either Outlook or SA, and want to make sure I know what I'm doing before launching your script (or, as it may be, whatever new one you put out now.)

Regardless, thanks for helping the community like this...

Chip Hayes


Title: Re: How-to: Train SpamAssassin
Post by: w98 on October 24, 2006, 09:22:24 PM
No, leave the Email account there. The creation of an Email account at LunarPages (or any hosting company for that matter) has nothing to do with POP3 or IMAP. POP3 or IMAP are just ways to retrieve the messages from the server.

In a nutshell, POP3 typically connects to the server, and retrieves a copy of all new messages in one shot and then deletes them from the server. IMAP, on the other hand, connects to the server and retrieves a copy of messages one at a time as you request them, and leaves the original copy on the server. Webmail programs, like Horde or Squirrelmail, use IMAP so that copies of messages remain on the server to retrieve later via POP3 (typical usage).

But because SpamAssassin can only train on messages you've got on the server, in your case omicron, you need to set up a way to get those messages BACK to the server if you use POP3. The easiest way to do this is to add a new account setup within Outlook (or Thunderbird, etc), using the exact SAME login/password as you set up for the POP3 retrieval, but set the protocol type to IMAP instead of POP3.

This way, you'll have access to the ham/spam mailboxes that get created by the script, so you can send COPIES of ham to the server, MOVE spam messages to the server, and then let my script scan those folders.

I have a few other things to take care of, but will try to get a new script available in the near future.

-id


Title: Re: How-to: Train SpamAssassin
Post by: chiphayes on October 25, 2006, 09:07:27 AM
Ian...

Ah, the light dawns.

I get it now.  Thanks for that explanation.

I'll wait until you can get a new script up to give it a try.

Thanks!

Chip


Title: Re: How-to: Train SpamAssassin
Post by: wkeith01 on October 31, 2006, 04:02:25 PM
Thank you, w98, for your offer of assistance, but I will decline on this occasion. I have been receiving spam, and for several weeks my e-mail was also down: I would make up an address, send an e-mail to myself, and within 10 minutes that address would be down, even though only LunarPages and I would know the address. I sent a really nasty support ticket to LunarPages for the attention of Rod. Somebody must have read it, because within two minutes of LunarPages receiving that request all my e-mail accounts came back online and have not been down since. That is strange. But what is stranger: I forgot my password for the forum, so I had it e-mailed to me. In the e-mail was the IP address the e-mail came from, and guess what!! It is the same IP address that is sending me spam. I have the IP address in the blocked folder and could not understand why it was in the good folder as well; from memory, I believe that address had sent over 200 good e-mails and over 100 spam e-mails. So what is the point of setting up SpamAssassin to stop spam when the spam is coming from within the hosting company that you are setting up SpamAssassin with?
Regards and thanks, but no thanks
WKEITH01


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on October 31, 2006, 04:06:11 PM
Hello:
If you are seeing spam coming from a Lunarpages' server, please forward it to abuse@lunarpages.com.


Title: Re: How-to: Train SpamAssassin
Post by: kenwarren on November 20, 2006, 01:42:16 PM
Hi! I'm wondering if there's been any progress on getting an update for maildir format? I can figure out some of the changes, but I'm neither a Unix nor a Perl expert, so I don't seem to be able to completely figure this out. Here's what I've got so far...

I think all the changes are in dospam.
Code:
sub dospam
{
  my ($basepath,$config,$clearham,$clearspam) = @_ ;

  print "Checking $basepath for spam/myham:\n" ;
  if ( -e "$basepath/.myspam" ) {
    print "Learning SPAM:\n" ;
    print `$salearn --showdots -p $config --spam $basepath/.myspam/cur/` ;
  }

  if ( -e "$basepath/.myham" ) {
    print "Learning HAM:\n" ;
    print `$salearn --showdots -p $config --ham $basepath/.myham/cur/` ;
  }

  # if the flag is set to clear the ham folder, or if it just plain
  # doesn't exist, do that work here:
  if ($clearham || !(-e "$basepath/.myham" )) {
    print "Emptying $basepath/.myham\n" ;
    # note: more than this is going to be required, as this only empties a folder,
    # it doesn't create one. Not sure what...
    print `rm -f $basepath/.myham/cur/*`;
  }
  # and do the same work for a spam folder
  if ($clearspam || !(-e "$basepath/.myspam" )) {
    print "Emptying $basepath/.myspam\n" ;
    # note: more than this is going to be required, as this only empties a folder,
    # it doesn't create one. Not sure what...
    print `rm -f $basepath/.myspam/cur/*`;
  }
  print "\n\n" ;
}
Can anyone offer suggestions as to how to finish getting this working with maildir?

Thanks!

Ken Warren


Title: Re: How-to: Train SpamAssassin
Post by: w98 on November 21, 2006, 07:22:59 AM
Hi! I'm wondering if there's been any progress on getting an update for maildir format? Can anyone offer suggestions as to how to finish getting this working with maildir?

Getting it to work with maildir is the easy part ... what you've written is pretty close, but you'll want to include a * wildcard on the end:
Code:
print `$salearn --showdots -p $config --spam $basepath/.myspam/cur/*` ;
That way, it will learn from all files within the directory.

The tricky things that I need to build into the script are:
1. auto-detection of the old mbox format or this new maildir format so I don't have to maintain two different scripts, and
2. that building the subfolders the first time through will work correctly - from what I'm seeing, there's more to it than just running:
Code:
mkdir .myspam
mkdir .myspam/cur
mkdir .myspam/tmp
mkdir .myspam/new
... that there might be some files in those folders to create the first pass through. Once the folders are made, then yes, just removing the files as you've done should work just fine. But I'd probably want sa-learn to look at both 'cur' and 'new' since the maildir format may have messages in both.
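
A sketch of both pieces, under stated assumptions: make_maildir_folder and learn_maildir_folder are made-up names, and the empty maildirfolder marker file is what Courier-style servers usually place in subfolders, which may or may not be what LP's mail server expects. sa-learn accepts a directory as a target and reads the message files inside it, so passing cur and new directly is an alternative to the wildcard.

```shell
# Sketch: create an empty maildir-style folder such as .myspam
make_maildir_folder() {
    folder="$1"
    mkdir -p "$folder/cur" "$folder/new" "$folder/tmp" &&
    chmod 700 "$folder" "$folder/cur" "$folder/new" "$folder/tmp" &&
    : > "$folder/maildirfolder"   # assumption: Courier-style subfolder marker
}

# Sketch: train from both cur and new; kind is "spam" or "ham"
learn_maildir_folder() {
    kind="$1"; folder="$2"; prefs="$3"
    for sub in cur new; do
        sa-learn -p "$prefs" "--$kind" "$folder/$sub"
    done
}
```

For example, `learn_maildir_folder spam "$basepath/.myspam" "$config"`, with $basepath and $config standing in for the values used in the Perl script. The folder may still need to be subscribed to in the mail client before it shows up.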

I've been pretty busy lately, so I'm sorry for the delay for everyone, but I really hope to get a working script up soon.

ian


Title: Re: How-to: Train SpamAssassin
Post by: jsanderspc on November 30, 2006, 08:45:20 AM
Just starting to configure this and all seems well so far, but I have a question.

If an employee were to make a mistake and place an e-mail that was ham in the spam folder, is there a way to re-allow that sender to pass through SA?  In other words, if that sender is now identified as spam, can I undo it?

Would I simply add that senders address to the whitelist in Horde or the LP Mail Control Panel under SA?

Thanks


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on November 30, 2006, 09:46:17 AM
I don't believe it would block it completely, you could move it back to a ham folder and have it learn as ham.
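
For anyone doing this by hand rather than through the CGI script, a minimal sketch follows. When sa-learn is given a message it previously learned as the opposite type, it forgets the earlier classification and learns the new one, so no separate undo step is needed. The wrapper name relearn_as_ham and the SA_LEARN override variable are made up for illustration.

```shell
# Sketch: relearn one or more message files as ham.
# SA_LEARN can point at a specific sa-learn binary; it defaults to the one on PATH.
relearn_as_ham() {
    [ "$#" -ge 2 ] || { echo "usage: relearn_as_ham PREFS MESSAGE..." >&2; return 1; }
    prefs="$1"; shift
    "${SA_LEARN:-sa-learn}" -p "$prefs" --ham "$@"
}
```

Note that this only corrects the Bayes training. It does not whitelist the sender, so a message that trips other rules can still score above the threshold.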


Title: Re: How-to: Train SpamAssassin
Post by: jsanderspc on November 30, 2006, 01:57:32 PM
Thanks Ryan,

Also, I have another question.  What if my users do not get into the habit of copying "good" messages into the "ham" folder?

I'm sure I can convince them to copy spam into the "spam" folder, but I'm not confident they will make use of the "ham" folder.

Could this cause an issue with false detections?

Thanks again,

Jim


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on November 30, 2006, 01:59:59 PM
It can. You want to train for both spam and ham; otherwise the filter gets biased towards spam and you may end up with false positives. SpamAssassin also does its own learning when a message scores high enough or low enough to be confidently spam or ham, so it depends on how you want to handle it. You can just run it on the inbox, but you may catch some spam in there as well.


Title: Re: How-to: Train SpamAssassin
Post by: KenTen on November 30, 2006, 08:58:48 PM
Hi,

I followed all the instructions (I think) and I'm getting this error (repeatedly) in my cpanel error log:

Quote
bayes: cannot open bayes databases /home/studio8/.spamassassin/bayes_* R/W: tie failed: Inappropriate ioctl for device
bayes: cannot open bayes databases /home/studio8/.spamassassin/bayes_* R/O: tie failed:

I asked Lunarpages to make sure the DB_File module was installed on my server and they said that it was.

No matter how many spam or ham messages I have in the imap folders I get this result in the browser:

Quote
Checking /home/studio8/mail/studioreport.com/ken for spam/myham:
Learning SPAM:
Learned tokens from 0 message(s) (1 message(s) examined)
Learning HAM:
Learned tokens from 0 message(s) (1 message(s) examined)
Emptying/Creating /home/studio8/mail/studioreport.com/ken/myham
Emptying/Creating /home/studio8/mail/studioreport.com/ken/myspam

The folders get emptied but it seems I get no result.  Here is my code:

Code:
#!/usr/bin/perl
use CGI::Carp qw(fatalsToBrowser);
print "Content-type: text/plain\n\n" ;

use CGI ;
my $q = new CGI ;

#if 'account' is set in the URL, you can specify just a single Email account to check
my $account = $q->param('account') ;
# if you want to clear the ham/spam mailboxes after scanning, set these to a non-zero value
# by default, these flags will be set on, and will remove all ham/spam
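# test with defined() so that an explicit xham=0 / xspam=0 is honored ("0" is false in Perl)
my $clearham = (defined($q->param('xham')) ? $q->param('xham') : 1) ;
my $clearspam = (defined($q->param('xspam')) ? $q->param('xspam') : 1) ;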

# example: scan only Jim Smith's account, but don't empty the 'ham' folder
# http://www.yourdomain.com/cgi-bin/sa-learn.cgi?account=jimsmith&xham=0

# example: scan all accounts for our domain, and by omitting the xham/xspam flags,
# automatically delete all ham/spam messages after scanning:
# http://www.yourdomain.com/cgi-bin/sa-learn.cgi


# by default, look at ALL login accounts
my $emaillogin = "all" ;
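if (defined $account && length $account) {
  # only allow plain mailbox names, since $account ends up in file paths and shell commands
  die "Invalid account name\n" unless $account =~ /^\w[\w.-]*$/ ;
  $emaillogin = $account ;
}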

my $basepath = "/home/studio8/mail/studioreport.com" ;

my $salearn = `which sa-learn` ;
chomp($salearn) ;
print `$salearn --version` ;

my $config = "/home/studio8/.spamassassin/user_prefs" ;

# turn on autoflush so output reaches the browser as it is produced
$| = 1 ;

if ($salearn) {
  print "using $salearn in $basepath (login: $emaillogin) to learn about spam/ham\n" ;
  if ($emaillogin ne "all") {
      # check a single account
      my $newpath = "$basepath/$emaillogin" ;
      &dospam($newpath,$config,$clearham,$clearspam) ;
  } else {
    # scan the /mail/ directory for user accounts, and then detect ham/spam folders in there
    my @logins ;
    opendir(DH,$basepath) or die "Cannot open $basepath: $!" ;
    while (my $file = readdir(DH)) {
      if (substr($file,0,1) ne ".") {
        if ( -d "$basepath/$file" ) {
          push (@logins,$file) ;
        }
      }
    }
    closedir(DH) ;
    foreach my $login (sort @logins) {
      my $newpath = "$basepath/$login" ;
      &dospam($newpath,$config,$clearham,$clearspam) ;
    }
  }
} else {
  print "Could not locate sa-learn !!" ;
}
exit ;


sub dospam
{
  my ($basepath,$config,$clearham,$clearspam) = @_ ;

  print "Checking $basepath for spam/myham:\n" ;
  if ( -e "$basepath/myspam" ) {
    print "Learning SPAM:\n" ;
    print `$salearn --showdots -p $config --mbox --spam $basepath/myspam` ;
  }

  if ( -e "$basepath/myham" ) {
    print "Learning HAM:\n" ;
    print `$salearn --showdots -p $config --mbox --ham $basepath/myham` ;
  }

  # if the flag is set to clear the ham folder, or if it just plain doesn't exist, do that work here:
  if ($clearham || !(-e "$basepath/myham" )) {
    print "Emptying/Creating $basepath/myham\n" ;
    open (HAM, "> $basepath/myham") ;
    print HAM "" ;
    close HAM ;
  }
  # and do the same work for a spam folder
  if ($clearspam || !(-e "$basepath/myspam" )) {
    print "Emptying/Creating $basepath/myspam\n" ;
    open (SPAM, "> $basepath/myspam") ;
    print SPAM "" ;
    close SPAM ;
  }
  print "\n\n" ;
}

#eof ;

Any help would greatly be appreciated.  I'm lost.

Thanks,
Ken
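
The comments in the script say that xham=0 should leave the ham folder alone while a missing flag defaults to "clear". A minimal sketch of that flag handling, written in Python rather than Perl, could look like the following; the function name flag_from_query is made up for illustration.

```python
from urllib.parse import parse_qs


def flag_from_query(query: str, name: str, default: int = 1) -> int:
    """Read an integer flag such as xham/xspam from a URL query string.

    A missing or non-numeric value falls back to the default, while an
    explicit 0 is kept as 0.
    """
    values = parse_qs(query, keep_blank_values=True).get(name)
    if not values:
        return default
    try:
        return int(values[0])
    except ValueError:
        return default
```

The point of the sketch is the distinction between "flag absent" and "flag present but zero", which a plain truthiness test cannot make.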


Title: Re: How-to: Train SpamAssassin
Post by: chiquita9896 on December 04, 2006, 01:48:43 PM
Hello all,

Hope you can help me out with this one.  I have 'enabled' SpamAssassin from the control panel, but nothing appears to be happening.  Nothing appears in the Spam box; I've been checking it on and off for the last couple of months.

Is there more to this process than meets the eye? 

PS. Not overly technical, so any replies in layman's terms only please.




 :-?


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on December 04, 2006, 01:55:42 PM
Hello:
If you check the headers of the e-mail (depending on which e-mail client you are using, you may need to click on an option to view all headers), there should be entries in there from SpamAssassin. They will look like:
X-Spam-Checker-Version:
X-Spam-Status:

If you do see those, your e-mail is being sent through SpamAssassin. You may need to set up your local e-mail client to junk any e-mails that are considered spam by SpamAssassin (you can do this in Thunderbird, for example, under the Spam Filter option).

Enabling SpamAssassin alone won't send spam to the global Spam Box. You will also need to enable the Spam Box option in Cpanel for that function to work.

Let us know if you have any additional questions.
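
For anyone who would rather check the headers programmatically, here is a small Python sketch. The sample message is made up for illustration, and the check assumes the X-Spam-Status value begins with "Yes" or "No", as it does in typical SpamAssassin output.

```python
from email import message_from_string


def spamassassin_verdict(raw_message: str):
    """Return (was_scanned, flagged_as_spam) based on the X-Spam-* headers."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    version = msg.get("X-Spam-Checker-Version")
    was_scanned = status is not None or version is not None
    flagged = bool(status) and status.strip().lower().startswith("yes")
    return was_scanned, flagged


# Made-up sample message, for illustration only.
SAMPLE = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "X-Spam-Checker-Version: SpamAssassin 3.1.7\n"
    "X-Spam-Status: No, score=1.2 required=5.0\n"
    "\n"
    "body text\n"
)
```

If was_scanned comes back False for mail that arrived after enabling SpamAssassin, the message probably never passed through the filter at all.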


Title: Re: How-to: Train SpamAssassin
Post by: Thomas52 on December 05, 2006, 01:02:08 PM
running the "http://www.adkins9.net/cgi-bin/sa-learn.cgi" command, I get an error message:
"Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
--------------------------------------------------------------------------------
Apache/1.3.37 Server at www.adkins9.net Port 80"
What have I done wrong?  Any help appreciated.


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on December 05, 2006, 01:25:19 PM
Check to make sure that the sa-learn.cgi file is in the correct location. The error you are receiving is a 404 not found error which means the file could not be found.


Title: Re: How-to: Train SpamAssassin
Post by: Thomas52 on December 05, 2006, 01:31:44 PM
I'm showing it at:
/public_html/cgi-bin/sa-learn.cgi
Help?


Title: Re: How-to: Train SpamAssassin
Post by: Thomas52 on December 05, 2006, 01:33:58 PM
It's showing:
"/home/adkins92/public_html/cgi-bin/sa-learn.cgi File Saved "
I assume this is correct?


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on December 05, 2006, 01:41:04 PM
Okay, it looks like it's a 500 error rather than a 404 error. Make sure that the file is chmodded correctly (755).
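
Most people will set the permissions in the file manager or an FTP client, but as a rough sketch of what "chmodded to 755" means, the following Python does the same thing and reports the resulting mode. The helper name is made up for illustration.

```python
import os
import stat


def make_cgi_executable(path: str) -> str:
    """Set rwxr-xr-x (755) on a script and return the resulting mode as octal text."""
    os.chmod(path, 0o755)
    return oct(stat.S_IMODE(os.stat(path).st_mode))
```

Mode 755 lets the owner read, write and execute the file, and lets everyone else read and execute it, which is what the web server needs in order to run a CGI script.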


Title: Re: How-to: Train SpamAssassin
Post by: Thomas52 on December 05, 2006, 01:44:35 PM
Bingo.  Thank you for putting up with novices. :clap:


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on December 05, 2006, 02:00:13 PM
Glad to hear it's working for you and it's no problem. If you have any additional questions please don't hesitate to ask.


Title: Re: How-to: Train SpamAssassin
Post by: KenTen on December 06, 2006, 03:23:47 PM
OK, I fixed the 0-1 problem described in my prior post by deleting my old bayes_toks and bayes_seen files because they were apparently the wrong version.  Training seems to be working well.  Thanks, W98, for this thread.

Maybe I can get an answer to this one.  I don't like using the IMAP email scheme.  I normally POP several accounts into one inbox.  Does it make a difference if I just copy all my spam and all my ham into one spam or ham folder or do I have to copy it into email address-specific folders?  For instance, if I have spam from ken, kenhome, info, webmaster, etc., can I just copy it all into my ken account's spam folder for analysis and get the same effectiveness in my training?

Thanks in advance for any help or consideration,
Ken
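
A cautious variant of the fix described above is to move the old Bayes files aside instead of deleting them outright. The Python sketch below assumes the files live in the account's .spamassassin directory and are named bayes_*, as in the error message earlier in the thread; the backup folder name is an arbitrary choice. Either way, the existing training data is no longer used afterwards.

```python
import shutil
import time
from pathlib import Path


def set_aside_bayes_files(sa_dir: str) -> list:
    """Move bayes_* files into a timestamped backup folder and list what moved."""
    source = Path(sa_dir)
    backup = source / time.strftime("bayes-backup-%Y%m%d-%H%M%S")
    moved = []
    for db_file in sorted(source.glob("bayes_*")):
        if db_file.is_file():
            backup.mkdir(exist_ok=True)
            shutil.move(str(db_file), str(backup / db_file.name))
            moved.append(db_file.name)
    return moved
```

Other files in the directory, such as user_prefs, are left untouched.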


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on December 06, 2006, 03:59:32 PM
Hello:
You just need to copy it to one spam and one ham folder and it will apply to all of your e-mail addresses.


Title: Re: How-to: Train SpamAssassin
Post by: Rapunzl on December 09, 2006, 02:30:44 PM
It looks as if nothing is being learned here. I copied and pasted your script and prefs into my folder and just get this:

Quote
Checking /home/username/mail/domain.com/webmaster for spam/myham:
Learning SPAM:
Learned tokens from 0 message(s) (0 message(s) examined)
Learning HAM:
Learned tokens from 0 message(s) (0 message(s) examined)
Emptying/Creating /home/username/mail/domain.com/webmaster/myham
Emptying/Creating /home/username/mail/domain.com/webmaster/myspam

Of course the username and domain are my specific usernames and domain.

Is this related to the change to separate mail files, in which case I just need to wait for the new script or am I doing something wrong?

Also, how do I set up the script for addon domains and subdomains. They aren't scanning at all right now.


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on December 09, 2006, 02:41:49 PM
Hello:
That can be caused by the new mail format, if you are on a server using the maildir format.


Title: Re: How-to: Train SpamAssassin
Post by: Rapunzl on December 09, 2006, 10:03:41 PM
That's what I thought. Does that mean I need to change servers to get it to work? Is there a workaround?

Also, I have an addon domain it's not picking up. Is this part of the same problem?


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on December 10, 2006, 01:31:59 PM
We don't allow server moves from maildir to the mbox format (because eventually you're going to get converted again).

You can run these scripts that I posted to run sa-learn on the spam folder:
http://www.lunarforums.com/forum/index.php?topic=36589.msg274304#msg274304

It just runs sa-learn and does not clear out the mailbox.

The script originally posted only checks one inbox. You would need to change the script, or make multiple scripts, to have it check each folder.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on December 12, 2006, 06:12:24 PM
Quote
That's what I thought. Does that mean I need to change servers to get it to work? Is there a workaround?
Also, I have an addon domain it's not picking up. Is this part of the same problem?

To track your add-on domain's Email, you'll need to tweak the script to look at multiple domains. Right now, it looks at a single domain.

As for a maildir workaround, I promise, I'm getting around to it :o) I'm a full-time independent freelance worker with more work to do than I have hours in the day, but I have two weeks of vacation time coming up. I could use a guinea pig / volunteer to let me test the new script on their maildir setup ... any takers?
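
As a rough idea of what "look at multiple domains" involves, here is a Python sketch (not the Perl script itself) that walks several domain directories under the mail root and reports every login that has a training folder. It assumes the mail/domain/login layout shown in the paths earlier in the thread; the function name and the folder names passed in are illustrative.

```python
from pathlib import Path


def find_training_folders(mail_root: str, domains: list, names=("myspam", "myham")):
    """Yield (domain, login, folder_path) for each training folder that exists."""
    for domain in domains:
        domain_dir = Path(mail_root) / domain
        if not domain_dir.is_dir():
            continue  # skip domains that have no mail directory
        for login_dir in sorted(p for p in domain_dir.iterdir() if p.is_dir()):
            if login_dir.name.startswith("."):
                continue  # ignore hidden directories, as the Perl loop does
            for name in names:
                folder = login_dir / name
                if folder.exists():
                    yield domain, login_dir.name, str(folder)
```

Each path it yields could then be handed to sa-learn in the same way the single-domain script does.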


Title: Re: How-to: Train SpamAssassin
Post by: Thomas52 on December 21, 2006, 09:07:53 AM
Okay, I've turned SpamAssassin on (no 'box'), installed the 'sa-learn' script, got a couple of hundred messages in the 'myham' and 'myspam' folders, and have run the 'sa-learn' file.  I've set SpamAssassin at 5.0.  I've got four primary email addresses (each with its own 'myham' and 'myspam' box), using three different domains (or subdomains). Only one is really a problem.
I've looked at many of the discussions, but it's still not clear to me. I'm still getting a ton of trash, and I've placed a number of addresses on the whitelist and/or blacklist. I've not set any 'filters' because I'm not clear on them. Can you give me a "Filters for Dummies", or tell me whatever else it is that I'm doing wrong or not doing right?
I've looked at galana and am not sure that's the best solution.  Does it allow me to 'approve' certain 'unverified' addresses, or do they go to email never-never land?
:(


Title: Re: How-to: Train SpamAssassin
Post by: Rapunzl on December 27, 2006, 05:27:52 PM
Ryan:

I tried your script and still getting:

Quote
Learning Spam... Learned tokens from 0 message(s) (0 message(s) examined)
Learning Spam... Learned tokens from 0 message(s) (0 message(s) examined)
Learning Spam... Learned tokens from 0 message(s) (0 message(s) examined)
Learning Spam... Learned tokens from 0 message(s) (0 message(s) examined)
Learning Spam... Learned tokens from 0 message(s) (0 message(s) examined)
Learning Spam... Learned tokens from 0 message(s) (0 message(s) examined)
?>

I'm thinking I missed something in class? I changed my user_prefs to the cgi stuff in http://www.lunarforums.com/forum/viewtopic.php?t=13958. Do I need to go back to the default and set the prefs from there? I haven't seen the same problem from anyone else, so I'm thinking it's something I did wrong.

 :?


w98: I totally understand. I'm working 12+ hours a day, 6 days a week myself. I'm excited to see what you come up with since your original seemed to work so well, but I'll try to get this working myself until then. 
:smiling:.


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on December 27, 2006, 05:30:22 PM
Do the folder paths match the ones on your account? The spam folder on my e-mail addresses is named Junk, but on some accounts it may be Spam or something else.


Title: Re: How-to: Train SpamAssassin
Post by: Rapunzl on December 27, 2006, 05:39:15 PM
My main account is {Account} with subfolder "inbox" with sub subfolder "spam":

passthru('/usr/bin/sa-learn --showdots -p /home/{Username}/.spamassassin/user_prefs --mbox --spam /home/{username}/mail/rapunzl.com/{account}/spam');

didn't work, so i tried:

passthru('/usr/bin/sa-learn --showdots -p /home/{Username}/.spamassassin/user_prefs --mbox --spam /home/{username}/mail/rapunzl.com/{account}/inbox/spam');

same results  :-?





Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on December 27, 2006, 05:41:37 PM
Do you know if you're on a server with the mbox or the maildir format? If the latter, you would need to remove the --mbox part of the command.
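
To make the difference concrete, the following Python sketch builds the sa-learn argument list for either format, using only the options already shown in this thread (--showdots, -p, --mbox, --spam, --ham). The function name and the default sa-learn path are assumptions for illustration.

```python
def sa_learn_command(kind: str, folder: str, prefs: str, mail_format: str,
                     sa_learn: str = "/usr/bin/sa-learn") -> list:
    """Build an sa-learn argument list for one spam or ham folder."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    if mail_format not in ("mbox", "maildir"):
        raise ValueError("mail_format must be 'mbox' or 'maildir'")
    command = [sa_learn, "--showdots", "-p", prefs]
    if mail_format == "mbox":
        command.append("--mbox")  # only mbox files need this switch
    command += ["--" + kind, folder]
    return command
```

Passing the resulting list to something like subprocess.run, rather than pasting the paths into a shell string, also avoids quoting problems with unusual folder names.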


Title: Re: How-to: Train SpamAssassin
Post by: Rapunzl on December 27, 2006, 05:59:50 PM
Quote
Do you know if you're on a server with the mbox or the maildir format? If the latter, you would need to remove the --mbox part of the command.

I tried taking it out... still nothing. I looked on my FTP to see the directory of my mail folder, and it's different than the imap structure in Outlook Express:

mail/rapunzl.com/ACCOUNT/
and its subfiles include inbox, myham, myspam, spam

In other words, I haven't a clue. I appreciate you being patient with me.


Title: Re: How-to: Train SpamAssassin
Post by: Paul D on December 28, 2006, 10:55:24 AM
Hey all, new forum member here.  I am not a website maintenance person or a tech wizard, just a guy with a couple of email accounts which like most get a lot of spam.  I had Edefense enabled, which worked great but seemed to hang up the account periodically, so at the suggestion of LP techs I disabled it and enabled Spam Assassin.  It doesn't work very well, though, especially against image spam, so I checked out this thread and it seems promising but intimidating to a relative non-tech guy like me.  So here's my question:  the thread started back in '04.  Are these techniques effective against the latest flood of image spam?  If so, I might dive in.  TIA for any and all feedback.

Edited later that day:  OK, I did it and it seems to be working OK so far.  Thanks!  One more question and I'm sure this is obvious to everybody but me:  why is it necessary to move the messages to the myham & myspam folders?  Why can't SA be trained to learn from the "inbox" and "spam" folders?


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on January 07, 2007, 06:11:20 PM
Rapunzl,
What is the exact code you are using?

Paul D,
They don't have to be moved. You can change the folder names in the learning paths to whatever you want them to be, in your case the Spam and inbox folders.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on February 05, 2007, 01:28:17 PM
Paul D,
Keep in mind that my training script will erase any mailboxes you tell it to look at.

Running sa-learn from a PHP page with passthru() will let you point to any folders you like. However, if your mailboxes are in Mbox format, a mailbox may still contain the spam messages you moved out of it, so you may need to expunge those messages before running the trainer. If you use Thunderbird, just right-click on the folder and choose 'Compact This Folder'.
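
One way to see whether a folder still needs compacting is to count the messages that are marked deleted but not yet expunged. The Python sketch below uses the standard mailbox module and assumes the mail server records the deleted mark as a "D" flag in the mbox status headers, which is common but not guaranteed.

```python
import mailbox


def count_pending_deletes(mbox_path: str):
    """Return (total, flagged_deleted) for an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        total = flagged = 0
        for message in box:
            total += 1
            if "D" in message.get_flags():
                flagged += 1
        return total, flagged
    finally:
        box.close()
```

If the second number is not zero, compacting the folder before training would keep those leftover messages out of the scan.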

For those in Maildir format, I'll definitely have the maildir version this week - LP converted the server I host on to Maildir and I'm already seeing spam creep in. If you're calling sa-learn directly and having trouble with pathing, keep in mind that the spam/ham mailboxes may have a dot in front of them, like:

/home/(account)/mail/.scan-spam
/home/(account)/mail/.scan-ham

or in the older version of my script:

/home/(account)/mail/mydomain.com/myemail/.spam
/home/(account)/mail/mydomain.com/myemail/.ham
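
A small Python sketch of that pathing check: try the folder name as given, then with a leading dot, and use whichever exists. The function name is made up for illustration.

```python
from pathlib import Path


def resolve_folder(parent: str, name: str):
    """Return the first existing path among 'name' and '.name' under parent, else None."""
    for candidate in (name, "." + name.lstrip(".")):
        path = Path(parent) / candidate
        if path.exists():
            return str(path)
    return None
```

For example, resolve_folder("/home/(account)/mail", "scan-spam") would return the ".scan-spam" path on a Maildir server and the plain "scan-spam" path where that exists instead.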



Title: Re: How-to: Train SpamAssassin
Post by: soarmi2 on February 16, 2007, 04:13:20 PM
I am getting the following message when I run the script after configuring everything. What am I doing wrong? The folders aren't being emptied either.

SpamAssassin version 3.1.7
using /usr/bin/sa-learn in /home/soarmi2/mail/soarministries.org (login: soarmi2) to learn about spam/ham
Checking /home/soarmi2/mail/soarministries.org/soarmi2 for spam/myham:
Emptying/Creating /home/soarmi2/mail/soarministries.org/soarmi2/myham
Emptying/Creating /home/soarmi2/mail/soarministries.org/soarmi2/myspam



Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on February 17, 2007, 06:49:03 AM
Do you know what server you are on and if it's still on the old mbox format or on the new maildir format? Also check to make sure the paths to the specific myham/myspam boxes are correct.


Title: Re: How-to: Train SpamAssassin
Post by: Mike Fleming on February 22, 2007, 02:53:34 PM

I have a question about having SpamAssassin learn from spam that was already caught by the program and sent to my spam box.  Since SA replaces the body of these emails with text of its own, including the breakdown of scoring, is there a chance of the program learning incorrect tokens from these emails? Would it be better to only give it spams that got by?  Or will it be able to learn correct tokens from these emails?

Thanks,

Mike


Title: Re: How-to: Train SpamAssassin
Post by: rvicker on February 26, 2007, 06:05:39 PM
Ian,

Earlier you stated that forwarding to special email addresses rather than using IMAP would not work because of the "forwarding information".

However, a link on the spamassassin site to http://gtmp.org/pub/sa-postfix.en.html details a setup with postfix to do exactly that.

We don't have the access the article uses but can't something similar be done on lunarpages servers?

I don't want to use IMAP and I think I really need to improve spamassassin's ratings because a lot more obvious spam is getting through lately with negative scores and autolearn=ham tags.


Title: Re: How-to: Train SpamAssassin
Post by: Lupine1647 on February 27, 2007, 02:05:00 PM
Hello:

You would be able to set something up similar to that on our VPS and dedicated solutions as you would have access to those configuration files.


Title: Re: How-to: Train SpamAssassin
Post by: spatters1000 on March 21, 2007, 05:37:07 PM
My Lunarpages mail server was recently converted to maildir format and all my spam and ham folders have disappeared. Also, I used to log in to the admin account and, using Horde, would clean out the spam folder that SA had dumped all the spam into.

Now that the changes have been made I no longer see any of the old spam folders, etc. I'm at a complete loss as to what I need to do now to ensure that SA continues to work properly.  :-?


Title: Re: How-to: Train SpamAssassin
Post by: dugawug on March 22, 2007, 01:57:57 PM
just seeing this post now...are all the instructions on the first post still as valid as they were several years ago?   :D

i think i need some extra help from SA


Title: Re: How-to: Train SpamAssassin
Post by: w98 on March 28, 2007, 10:01:39 AM
Quote
is there a chance of the program learning incorrect tokens from these emails? Would it be better to only give it spams that got by?  Or will it be able to learn correct tokens from these emails?

Hi Mike,

SA will ignore its own tokens, and only re-evaluate the message itself. This is useful in cases where a message might have scored as BAYES_50 (a 50/50 chance of being spam in its opinion): by re-training SA with the message, it will lean more towards "this is spam" the next time it sees a similar message.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on March 28, 2007, 10:13:58 AM
Quote
We don't have the access the article uses but can't something similar be done on lunarpages servers?

What I understood from the URL you posted was that the user was simply forwarding a message directly to sa-learn without stripping the forwarding information. I would worry about sending false-negatives (spam that made it to your Inbox) to sa-learn since it's a little unclear as to how sa-learn will determine the tokens from the message as to their spam-iness. This would certainly work for reporting false-positives (non-spam that were placed in the spam folder), since we know for sure that SA will ignore its own headers and tokens.

Quote
I don't want to use IMAP and I think I really need to improve spamassassin's ratings because a lot more obvious spam is getting through lately with negative scores and autolearn=ham tags.

Well, there's nothing bad about setting up a separate IMAP profile in Outlook Express or Thunderbird or whatever mail client you use -- and since doing this is the easiest way to train SA, I'm curious why you're unable/unwilling to set up IMAP.

I'll check into SA a little more, see if there's an alternate way around this.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on March 28, 2007, 10:18:44 AM
Quote
My Lunarpages mail server was recently converted to maildir format and all my spam and ham folders have disappeared. Also, I used to log in to the admin account and, using Horde, would clean out the spam folder that SA had dumped all the spam into.

In Maildir format, the 'admin' account you logged into with your FTP details will no longer have access to the individual users' accounts to access their separate spam/ham folders. This is one of the consequences to using Maildir, but the benefits of Maildir seem to outweigh this inconvenience.

I'm actually working on a separate Perl script which, if successful, I can merge into the SA training script that will move all spam/ham contents from the individual user accounts into a main spam/ham folder pair and learn from that. I should have an updated script pretty soon which I will post back on page 1.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on March 28, 2007, 10:21:09 AM
Quote
are all the instructions on the first post still as valid as they were several years ago?

They should be, provided that your server is saving Email in the older Mbox format.

I'll be updating the SA learning script soon to allow it to work with the Maildir format that the servers are being converted to; the only thing I seem unable to do with the training script is empty the folders once the messages are scanned. I want to enhance the script to detect Mbox vs Maildir format as well to see if that can alleviate any problems.
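
The detection idea can be sketched in a few lines. This is only a guess at one workable approach, written in Python rather than Perl, and not the logic of the actual script: it treats a directory with cur, new and tmp subdirectories as Maildir, and a plain file that is empty or starts with "From " as Mbox.

```python
import os


def detect_mail_format(path: str) -> str:
    """Classify a folder path as 'maildir', 'mbox', or 'unknown'."""
    if os.path.isdir(path):
        subdirs = ("cur", "new", "tmp")
        if all(os.path.isdir(os.path.join(path, s)) for s in subdirs):
            return "maildir"
        return "unknown"
    if os.path.isfile(path):
        with open(path, "rb") as handle:
            first = handle.read(5)
        # An empty file is a valid (empty) mbox; otherwise expect a "From " line.
        if first == b"" or first == b"From ":
            return "mbox"
    return "unknown"
```

Knowing the format up front tells the trainer whether to pass --mbox to sa-learn or to point it at the folder directory instead.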


Title: Re: How-to: Train SpamAssassin
Post by: spatters1000 on March 29, 2007, 03:07:48 PM
Quote
I'll be updating the SA learning script soon . . .

To w98 -- Just in case no one has said "Thank You" lately, I want to so you'll know those of us who use Lunarpages -- and who aren't as programmatically gifted as you  --truly appreciate what you are doing in supporting Spam Assassin. Your efforts do not go un-noticed! :clap:


Title: Re: How-to: Train SpamAssassin
Post by: w98 on March 30, 2007, 12:17:08 PM
Thanks Spatters! Much appreciated!

I'm currently rewriting the whole tutorial and script stuff as an article series at a non-LP hosted domain. I'll need to contact LP to see if it's okay to post that URL as message #1 in this thread to direct users over to my other site for the expanded version of the instructions. I have 4 technical reviewers checking out what I've redone so far, to see where I need to clarify instructions, etc., and I've based much of the new text on the 10 pages (so far) of followup messages from this thread.

I also need to rewrite parts of the training script, which will actually split into three separate scripts to choose from, two for Mbox users, and one for Maildir users.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on March 30, 2007, 02:51:03 PM
Update, March 30, 2007, 4pm (give or take)

I'm working on a new set of instructions, and after writing a LOT of text about detecting your own Mbox or Maildir setup, I decided 'what the heck' and just wrote some logic into the script to detect which mail format your mail is currently stored in, instead of making you jump through hoops to figure it out.

Part of the configuration at the top of the new script will require you to choose to scan for individual spam/ham folders, or to use a global spam/ham folder pair. Also, I'm likely to completely do away with emptying the folders after scanning, since it can cause too much confusion and result in data loss that I don't want to be liable for... :shades:

Keep your eye on message #1 in this thread for more details soon! :happy:

note: because the new instructions are HUGE (many many pages long), I'm actually going to host the instructions and script at my web site. I've already received permission from LunarPages to redirect people from this message thread to my tech site to find the latest updates. I should have it all wrapped up around Easter. Stay tuned! :thumb:


Title: Re: How-to: Train SpamAssassin
Post by: telling on March 31, 2007, 06:47:53 PM
Hi - I've enabled Spam Assassin and Spam Box. I have two folders in my main account mail directory called myspam and myham. I would like to check the spam periodically and move to ham if necessary, but nothing gets sent to myspam. It seems like I've missed a step somewhere. Is it in the filters? If so, how do I tell it to send the spam to myspam?


Title: Re: How-to: Train SpamAssassin
Post by: w98 on March 31, 2007, 11:43:02 PM
Hi Telling,
In your IMAP setup in Step 1, you also have to subscribe to the 'spam' folder that CPanel creates when spam shows up for your Email accounts. Then, you will need to manually move messages from the 'spam' folder into the 'myspam' folder for scanning.

This was on purpose, in case you had users who never cleared out their spam folder, otherwise they would cause your training script to slow down significantly every time you ran it. By making your users move spam into a new folder, you alleviate that problem, and also ensure that your users are not accidentally scanning false-positives (ham messages that accidentally got flagged as spam) as actual spam.


Title: Re: How-to: Train SpamAssassin
Post by: w98 on April 01, 2007, 12:42:02 AM
Just wanted to give everyone an update on the new script, which I'm now calling sa-training.cgi, version 3.0a:

484 lines and counting, it's a beast ;o) There are about 100 lines of comments in the configuration section though, and you'll have a lot of new choices for running the script:

You can pick one of these three options for scanning HAM:
- have your users forward ham messages to a single mailbox (that way they can't all read each other's ham messages by having them all subscribe to a global IMAP folder)
- have your users' Inboxes scanned for ham instead of a separate 'ham' folder, or specify the name of a 'ham' folder (ie: "myham" or "scan-ham" etc ** this is the preferred method)
- have a global mailbox folder where you manually move all of your ham messages

You can pick one of these two options for scanning SPAM:
- have a global mailbox where you manually move all of your spam messages
- have your users' "spam" folders (as created by CPanel) scanned for spam, or specify an alternate spam mailbox name that each user must have set up (ie: "myspam" or "scan-spam" ** this is the preferred method)

This new flexibility means you could have all of your users forward ham messages to globalham@yourdomain.com, but still have the script check their individual spambox for spam scanning. Or maybe you have some nifty filtering set up where all spam on your domain dumps into one big spambox, but you still want your users to have their privacy about their hambox. Etc.

Other new features and changes:
- automatic Mbox / Maildir detection :9: :yey: :clap:
- support for scanning mailboxes for add-on domains  :happy:
- customized path for SpamAssassin's "sa-learn" utility if my script cannot autodetect it
- customized path for base mail installation, in case someday LP changes "/home/username/mail" to "/home/username/Maildir" or whatever...
- removal of all code that deletes messages after scanning ... clean-up will be your own responsibility (and, ahem, liability :shades:)
- I'm also adding a Y/N flag for doing a callback to my site to check for a newer version of the script; those with privacy/paranoia concerns can turn this off :notme: even though I don't send any personal data whatsoever. I just download a version number from my site and compare it to the version number within the script itself. That's the beauty of open source: y'all can peek at what my script does to know I'm not doing anything shady
- for those that like statistics, I also peek into the bayesian database and display how many ham/spam messages you've scanned over time... this will help people with low Email quantities know when they've passed their 200 minimum to really get SpamAssassin sizzling.
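
For the statistics item, one way to read those totals is from the output of "sa-learn --dump magic". The Python sketch below assumes that output lists nspam and nham on "non-token data" lines with the count in the third column; check it against your own server's output before relying on it. It is an illustration, not the code used in the script.

```python
import re

MIN_MESSAGES = 200  # the minimum per class mentioned in the list above


def parse_bayes_counts(dump_text: str) -> dict:
    """Extract nspam/nham counts from `sa-learn --dump magic` style output."""
    pattern = r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$"
    counts = {}
    for line in dump_text.splitlines():
        match = re.match(pattern, line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_ready(counts: dict) -> bool:
    """True once both spam and ham totals have reached the minimum."""
    return all(counts.get(key, 0) >= MIN_MESSAGES for key in ("nspam", "nham"))
```

Until both totals pass the minimum, the Bayes results are not expected to carry much weight, which is why low-volume accounts can take a while to see an improvement.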

New documentation and quick start guide is coming in the next week or so -- the new quick start guide will be posted on page one of this thread with instructions for new readers to skip over here to page 11 of the comments since everything between my quick start guide and page 11 will refer to an older, outdated version of the script.

This forum thread will still serve as the primary support area for the script, though because of the size, I may host the script on my web site for download since we're only allowed 20,000 characters per forum message (script so far is about 23kb in size).

So far, I have the new script running for myself at w98.us using a global ham/spam mailbox set called "scan-ham" and "scan-spam", and you're all welcome to take a peek: http://www.w98.us/cgi-bin/train.cgi

(that URL may become inactive at some point, or simply show static data and not actually scan messages)

I'd love a few guinea pigs, er, volunteers ... to help beta test this a little more. Send me a private message or send an Email to ian.douglas@iandouglas.com


Title: Re: How-to: Train SpamAssassin
Post by: w98 on April 01, 2007, 12:54:40 AM
Oh, and given today's date (April Fools Day), I just wanted to add a follow-up that the version of the script I just announced *is* actually the next release. No April Fools Day joke here about things like Maildir/Mbox autodetection, and all that good stuff. This is for real.

Honest.
Cross my heart.


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: w98 on April 05, 2007, 01:58:13 PM
For anyone subscribed to this forum thread, I'm replying to the thread so you'll get an update message:

I've finally gotten to a comfortable place with the software to announce version 3.02 of my script is available for download. Please check back to page 1 of this message thread for download instructions and a quick-start guide. Please note that the download link for the script will take you away from LunarForums.com, and that you should return back here to this forum thread for any help, questions, etc.

Edit: found a spelling mistake ... I broke a finger the other day, so typing is tough :cry:


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: MontrealPaul on April 05, 2007, 05:25:22 PM
Quote
For anyone subscribed to this forum thread, I'm replying to the thread so you'll get an update message...
... found a spelling mistake ... I broke a finger the other day, so typing is tough :cry:

Yo, Ian, thanks for the updates, and of course thanks for all of your hard work - it's VERY much appreciated! :beer:  Cheers, bro!

Must be doubly hard (well - 10% harder, anyway) with a broken digit! Here's to a speedy recovery!  :thumb:


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: w98 on April 05, 2007, 09:31:20 PM
... along with a keyboard that has 4 keys broken - the lower left 4 letters that follow Y,W,B,U in the alphabet, yeah, it's been an interesting week lol

thanks tho, 3 weeks with my finger taped up really suks :)


Title: Re: How-to: Train SpamAssassin
Post by: telling on April 06, 2007, 10:39:43 AM
I don't have a 'spam' folder - I have two folders, one called 'myham' and one called 'myspam' - but nothing ever gets sent to 'myspam'.

Quote
Hi Telling,
In your IMAP setup in Step 1, you also have to subscribe to the 'spam' folder that CPanel creates when spam shows up for your Email accounts. Then, you will need to manually move messages from the 'spam' folder into the 'myspam' folder for scanning.

This was on purpose, in case you had users who never cleared out their spam folder, otherwise they would cause your training script to slow down significantly every time you ran it. By making your users move spam into a new folder, you alleviate that problem, and also ensure that your users are not accidentally scanning false-positives (ham messages that accidentally got flagged as spam) as actual spam.


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: w98 on April 06, 2007, 02:07:37 PM
Hi Telling,

If you enable the 'spam box' in the SpamAssassin configuration in CPanel, it will create an individual spam box for any mail account when the first spam message comes through SpamAssassin. You can always create it on your own to begin with, but unless you have the 'spam box' enabled in CPanel, messages that get flagged as spam, I believe, just get dropped permanently without you ever seeing them.

There's no way to configure CPanel to tell it to put your incoming spam into 'myspam' -- that's a step you need to manually perform before running the trainer: move the spam from the 'spam' folder into your 'myspam' folder, then run the trainer, then delete the messages in 'myspam'.

If you DO have the 'spam box' enabled in CPanel, then you either (a) never get any spam at your Email address, or (b) there's something technically wrong with your spam box setup that LP support should be able to help you with.


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: telling on April 07, 2007, 11:15:55 AM
OK, I think I'm making progress, but slowly.

- I have a spam box, I can see it in Horde, and it has spam in it

- I have three other folders, 'ham', 'myham', 'myspam' which I can see in my file manager, but I can't see in my Horde folders

- I have a mailbox for globalham at my domain

And now I'm not sure what's supposed to happen next:

1. Do I need the myham and myspam mailboxes?

2. How do I get the ham or myham mailbox to show up in Horde?

3. Then do I manually move ham from the spam mailbox to the ham mailbox and vice-versa? Or should I be using the myham and myspam mailboxes? (where did these folders come from, anyway? did I make them at some point?)

4. When I run the cgi script, does it check the spam/ham mailboxes by default? Or the myham/myspam mailboxes?

5. Do I need to be a professional programmer to figure this out?

Thanks for your help, I appreciate it.

- Jeffrey  :-?


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: w98 on April 07, 2007, 07:27:04 PM
Telling, and everyone else:

Looking back on the previous two weeks, I realize that all of my documentation assumes you're brand new to the sa-trainer.cgi script and have never used it before. I neglected to write up sufficient documentation for previous users of the script who are upgrading to the v3 script, so it's entirely my fault for the confusion that Telling (and others) may be dealing with. My apologies for causing any grief by upgrading to the new script, which by default acts entirely differently than the old v2 script. I've added a note at the bottom of this message where I outline thoughts for a web page that will act as a wizard/walk-through sort of interface and build you a custom copy of the script. Hopefully that will alleviate many of the hurdles you may be facing with suddenly retraining your users to use a new method to deal with ham/spam.

On to Telling's questions:

- I have three other folders, 'ham', 'myham', 'myspam' which I can see in my file manager, but I can't see in my Horde folders
Which directory path do these folders live in, and which login username (don't tell us the password!) do you use to login to Horde? I'm curious if you're logging into Horde as the same user that the file manager path would suggest?

For example, if you login to CPanel as 'telling23' but login to Horde as something like 'jeffrey@mydomain.com' then no, the spam folder you probably see under /home/telling23/mail/spam or /home/telling23/mail/spam/cur/ will not be available under the jeffrey@mydomain.com individual user account.
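
To make the difference between the two locations concrete, here is a minimal Perl sketch. The helper name mail_folder_path is hypothetical, and the per-account layout (/home/USER/mail/DOMAIN/ACCOUNT/FOLDER) is an assumption taken from the trainer output quoted later in this thread, so check it against your own File Manager before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: build the on-disk path of a mail folder.
# Without a domain and account it returns the main CPanel account's folder;
# with them it returns the folder of an individual Email account.
sub mail_folder_path {
    my ($cpanel_user, $folder, $domain, $account) = @_;
    my $base = "/home/$cpanel_user/mail";
    return "$base/$folder" unless defined $domain && defined $account;
    return "$base/$domain/$account/$folder";
}

# Main CPanel account (what you see when logged into Horde as 'telling23')
print mail_folder_path('telling23', 'spam'), "\n";

# Individual account (what you see when logged in as jeffrey@mydomain.com)
print mail_folder_path('telling23', 'spam', 'mydomain.com', 'jeffrey'), "\n";
```

The two calls return different directories, which is why a folder that is visible for one login does not show up for the other.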

1. Do I need the myham and myspam mailboxes?
Since you've set up a separate Email address to forward your non-spam messages to, the new v3 script will scan that user's Inbox for ham messages, so it will not scan or need the 'ham'/'myham' folders. However, I suggest you keep the 'myspam' folder, and once we figure out why your spam box isn't showing up in Horde, which I think is just a difference in login details, we can address that issue.

Try this: login to Horde using the exact same username/password that you use to login to CPanel. Is that where you see the ham/myham/myspam folders? If you use a different login name for Horde than you do for CPanel, then we'll just need to copy your spam messages elsewhere, likely using IMAP.

2. How do I get the ham or myham mailbox to show up in Horde?
By default, Horde will show you all mailboxes created for that account -- you do not have to 'subscribe' to them like you do in SquirrelMail. I just tested this myself by creating a new folder in Thunderbird and then logging into my Horde webmail, and the new folder name was automatically found. This is why I believe the 'missing' spam box from the account you're logging into in Horde may not be the same username/password you use for CPanel.

3a. Then do I manually move ham from the spam mailbox to the ham mailbox and vice-versa?
3b. Or should I be using the myham and myspam mailboxes?
3c. where did these folders come from, anyway? did I make them at some point?
3a. Since you've set up a separate Email address to forward your non-spam messages to, if you do find non-spam messages accidentally flagged as spam, forward those messages to your globalham@yourdomain.com account, then delete them from your spam box. Then in Horde, click on the 'purge deleted' link at the top right. If you find spam messages in your Inbox, you'll need to move those to your 'spam' folder, which we'll figure out after we get more information from you.
3b. The new v3 script won't use myspam/myham by default. If all you changed in the script was your CPanel username and domain name, then you can delete the ham/myham folders completely, once you've moved out any messages you need to keep.
3c. Depending on how long you've been using my script, the ham/myham/myspam folders may have been created by my script. The 'spam' folder will be created automatically by the system whenever that Email address gets a message that SpamAssassin flagged as spam... if you delete the whole folder, it will just get recreated again the next time you get spam.

4. When I run the cgi script, does it check the spam/ham mailboxes by default? Or the myham/myspam mailboxes?
If you downloaded the new v3.x script, and only changed your CPanel account name and your domain name, then the only place the script will check for ham messages is the Inbox of your globalham@yourdomain.com account and nowhere else. And, by default, the script will scan the individual, separate, 'spam' folders for each Email address you have at your domain name.

5. Do I need to be a professional programmer to figure this out?
Nope, shouldn't need to.

Basically, the difference between the v2 script you all ran previously and this new v3 script is that the old v2 script, by default, would look for individual ham folders (originally called 'myham', and later changed to just 'ham') where you could copy non-spam messages. The new v3 script can still be configured to look at those same ham/myham mailboxes if that's what your users are used to doing. Scanning messages that users place in the ham/myham folders will actually be more accurate in the long run than scanning messages forwarded to a globalham@yourdomain.com Email account -- but forwarding to an Email account and just letting spam collect in their 'spam' mailboxes is frankly just way easier for most non-technical users.

I'm starting to think up a design for a web page I can build that will just build the script for you the way you're used to running it, ie: scanning folders called myham or ham, and scanning myspam instead of spam, for those that want to run things the old way. I'll see if I can get that running before Monday night. It will basically be a multi-page form where I ask a series of questions about how you want to scan ham/spam (or how you've done it previously) and then generate the script for you as a link that you can right-click and 'save target as' or 'save link as' and go from there.

Thanks for your help, I appreciate it.
That's what we're all here for!


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: telling on April 07, 2007, 10:02:39 PM

Which directory path do these folders live in, and which login username (don't tell us the password!) do you use to login to Horde? I'm curious if you're logging into Horde as the same user that the file manager path would suggest?

For example, if you login to CPanel as 'telling23' but login to Horde as something like 'jeffrey@mydomain.com' then no, the spam folder you probably see under /home/telling23/mail/spam or /home/telling23/mail/spam/cur/ will not be available under the jeffrey@mydomain.com individual user account.

I login as tellin2, both to cpanel and to webmail. The spam, ham, etc., folders all live in the folder called "mail".

However, when I log into webmail using tellin2, there is no spam folder. (I guess this makes sense, since no one uses that email address, so there probably hasn't been any spam sent to it - not yet, anyway...) When I log onto Horde using one of the individual mailboxes, though, I do see a spam folder.


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: w98 on April 07, 2007, 10:21:26 PM
Okay, so if you login to Horde as tellin2 , you don't see the spam folder, but you DO see a spam folder when you login to CPanel's File Manager? Does the spam folder have a size or does it say "0 k" beside it in the list? If it says 0k for size, then it could just mean that you have your 'default address' (aka 'catch-all' address) set to :fail: or :blackhole: ?


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: telling on April 08, 2007, 03:08:05 PM
Okay, so if you login to Horde as tellin2 , you don't see the spam folder, but you DO see a spam folder when you login to CPanel's File Manager? Does the spam folder have a size or does it say "0 k" beside it in the list? If it says 0k for size, then it could just mean that you have your 'default address' (aka 'catch-all' address) set to :fail: or :blackhole: ?
That's right, I see the spam folder in cpanel's file manager, but there is no size indicated that I can see, just permissions (700). Unrouted mail is set to :fail:


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: w98 on April 08, 2007, 06:15:49 PM
Okay, then no catch-all spam is being collected, which is fine.

Do you have multiple Email accounts or multiple Email users at your domain, or just a single Email address?


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: telling on April 08, 2007, 06:34:01 PM
Multiple, multiple, multiple! Let's see... I count 13.


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: w98 on April 08, 2007, 11:04:24 PM
Okay, well, if you want them to keep using the 'myham' folders, then delete the globalham@yourdomain.com account that you made, and in the code, comment out the line that starts with
Code:
$global_ham_email = "globalham" ;
by adding a # at the start of the line, like this:
Code:
#$global_ham_email = "globalham" ;
then look for the line that says
Code:
#$user_hambox = "scan-ham" ;
and remove the # character and change it from "scan-ham" to "myham" like this:
Code:
$user_hambox = "myham" ;
Finally, look for the line that says
Code:
$user_spambox = "scan-spam" ;
and change it to
Code:
$user_spambox = "myspam" ;

Save the script, upload it to your server over top of the original script, and you should be back to running like the v2 script. Then let us know here if that works for you.


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: telling on April 09, 2007, 08:39:07 AM
I don't mind using the new version, or the old, but I still have the problem that only the spam box shows up in individual mailboxes in Horde - I can't see ham, myham, or myspam, so I don't see how to move incorrectly flagged ham or spam so the sa-training will work.


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: telling on April 11, 2007, 10:02:17 AM
I would rather use the new script, with the globalham mailbox. But just to be sure I understand this:
False-Positives: these get sent to the globalham address.
False-Negatives: Users manually move these into the Spam folders in their respective email accounts.
Then I run the training script, which processes all the ham/spam in all the mail accounts.
Then everyone empties their spam mailboxes, and I empty the globalham mailbox.
Is this correct?


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: w98 on April 11, 2007, 03:21:39 PM
Yes, your last message of the two replies is the easiest way to use it. Doing this, they can delete the 'ham', 'myham' and 'myspam' folders entirely (after you move out any messages you need to keep).


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: telling on April 11, 2007, 09:54:44 PM
Great, thank you!


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: RicaDMI on May 10, 2007, 10:17:45 AM
I've setup sa-trainer and it's working great with my main domain.

I'm having problems with it checking add-on domains though.

I've uncommented the add-on domain list and have the code set up something like this:

Code:

@addon_domain_list = ( 'domain1.com' , 'domain2.com') ;


When the script outputs, it says that it's checked the main domain and domain2.com, but domain1.com was never checked.

If I reverse the order of the add-on domains, it will check the main domain and domain1.com, but not domain2.com.

I'm at a loss as to what's going on so if anybody could help me out, it would be much appreciated.

Thanks


Title: Re: How-to: Train SpamAssassin - Updated April 5 2007
Post by: w98 on May 10, 2007, 10:17:53 PM
RicaDMI,
Download version 3.03 from my web site, I've included a change in how add-on domains get added to the overall list of domains to check, which should fix the problem.


Title: Re: How-to: Train SpamAssassin - Updated May 10 2007
Post by: w98 on May 10, 2007, 10:25:10 PM
I've updated the first message in the post to reflect the changes made for v3.03. All users wanting to use the globalham@yourdomain.com for scanning ham messages, or users needing help with the add-on domain names, will need to upgrade.


Title: Re: How-to: Train SpamAssassin - Updated May 10 2007
Post by: RicaDMI on May 11, 2007, 09:08:56 AM
Thanks for the update w98!

I've tried running the script and it still won't check all the domains.  It's only getting the first add-on domain listed.

I looked at the script and decided to try to modify it a bit to see if it would work. I don't know perl at all, but I've used other scripts like php before and decided to look up some of the syntax for perl.

I changed the following line:
Code:
for ($i=0; $i < $#addon_domain_list; $i++) {

to

Code:
for ($i=0; $i < $#addon_domain_list+1; $i++) {

It seems to have worked to catch all my domains.  If perl scripters see something wrong, please let me know.

Thanks!


Title: Re: How-to: Train SpamAssassin - Updated May 10 2007
Post by: w98 on May 22, 2007, 01:25:33 PM
The primary domain is pushed onto the @domains array, then each of the other domains in the addon_domain_list using a for loop; I forget why I'm not using a foreach() loop there, but perhaps I need to correct the code from this:
Code:
@domains = () ;
push (@domains, $my_domain) ;
for ($i=0; $i < $#addon_domain_list; $i++) {
  if ($addon_domain_list[$i]) {
    push (@domains, $addon_domain_list[$i]) ;
  }
}
to this:
Code:
@domains = () ;
push (@domains, $my_domain) ;
foreach $addon_domain (@addon_domain_list) {
  push (@domains, $addon_domain) ;
}
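
For anyone wondering why the original for loop skipped a domain: in Perl, $#array is the index of the last element, not the number of elements, so a loop bounded by "$i < $#array" stops one element early. The following is a small self-contained sketch (not part of sa-trainer.cgi) that shows the difference using the two-domain list from the earlier post.

```perl
use strict;
use warnings;

my @addon_domain_list = ('domain1.com', 'domain2.com');

# $#array is the LAST INDEX (1 here); scalar(@array) is the COUNT (2 here).
my $last_index = $#addon_domain_list;
my $count      = scalar @addon_domain_list;

# Buggy bound: "$i < $#array" never reaches the last element.
my @buggy;
for (my $i = 0; $i < $#addon_domain_list; $i++) {
    push @buggy, $addon_domain_list[$i];
}

# Correct bound: "$i <= $#array" (equivalent to "$i < $#array + 1").
my @fixed;
for (my $i = 0; $i <= $#addon_domain_list; $i++) {
    push @fixed, $addon_domain_list[$i];
}

print "last index: $last_index, count: $count\n";
print "buggy loop visits: @buggy\n";
print "fixed loop visits: @fixed\n";
```

The foreach version avoids the index arithmetic altogether, which is why it is the safer of the two fixes.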


Title: Re: How-to: Train SpamAssassin - Updated May 10 2007
Post by: sunnyv3 on May 24, 2007, 03:53:10 AM
I tried to use v3.03, but I get a 500 Internal Server Error.  v3.02 works fine.  Any ideas?


Title: Re: How-to: Train SpamAssassin - Updated May 10 2007
Post by: w98 on May 30, 2007, 04:31:47 PM
Make sure you have configured it correctly and have marked it with chmod permissions of 755, and it should work fine.
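
If you have shell or cron access and would rather check the permissions from Perl than from an FTP client, a sketch along these lines should work. The helper names mode_of and ensure_executable are made up for this example, and the path in the usage note is only a placeholder.

```perl
use strict;
use warnings;

# Return a file's permission bits as an octal string such as "755",
# or nothing if the file cannot be stat()ed.
sub mode_of {
    my ($path) = @_;
    my @st = stat($path) or return;
    return sprintf '%o', $st[2] & 07777;
}

# Hypothetical helper: set rwxr-xr-x (755) on a script if it is not already set.
# Returns 1 on success, 0 if the file is missing or chmod fails.
sub ensure_executable {
    my ($path) = @_;
    my $mode = mode_of($path);
    return 0 unless defined $mode;
    return 1 if $mode eq '755';
    return chmod(0755, $path) == 1 ? 1 : 0;
}
```

Usage would be something like ensure_executable('/path/to/sa-trainer.cgi'), with the path replaced by wherever you uploaded the script.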


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on May 30, 2007, 04:41:34 PM
Just an FYI for everyone:

After helping a non-LunarPages customer, it would appear that the newest release of SpamAssassin (3.2.0 ... LP uses 3.1.8 as of the time I write this) will require a new configuration option within the user_prefs file:
Code:
use_bayes   1
Without this option, SpamAssassin will fail to run the sa-learn utility. I recommend you add this line to the top of your existing user_prefs file. You will need to make this change manually since the CPanel interface for SpamAssassin will remove some of the other settings that I've suggested using since releasing this script back in 2004.

Adding this line into my own user_prefs file on the 'janus' server ran the trainer just fine, so adding this line to your existing user_prefs file, even if your server runs 3.1.8, appears to be okay. Of course, your mileage may vary -- if you notice problems with the trainer script once you make this change, just remove the line from user_prefs for the time being.
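
If you would rather script the change than edit the file by hand, here is a minimal sketch. The function name with_use_bayes is hypothetical; it works on the file contents as a string, so you would read user_prefs in, pass the text through it, and write the result back yourself (keep a backup copy first).

```perl
use strict;
use warnings;

# Return the user_prefs text with "use_bayes   1" added as the first line,
# unless an active (uncommented) use_bayes line is already present.
# An existing "use_bayes 0" is left alone on purpose: review that by hand.
sub with_use_bayes {
    my ($prefs) = @_;
    return $prefs if $prefs =~ /^\s*use_bayes\s+\d/m;
    return "use_bayes   1\n" . $prefs;
}

my $example = "required_score 3.5\nbayes_path /home/myaccount/.spamassassin/bayes\n";
print with_use_bayes($example);
```

Running the function twice on the same text adds the line only once, so it is safe to re-run.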

v3.04 of the script will be out soon - a few users have reported problems that turned out to be bugs. I'll also be bundling a user_prefs file in the .zip file that you can download from iandouglas.com.

Enjoy!


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on June 03, 2007, 06:06:49 AM
Feeling quite the newbie here...  :hypno: I followed the quickstart directions, but am not sure if I did everything correctly.

Not every account has a spam box:
Once upon a time before trying this, I enabled spamassassin and enabled the spam box. I was doing email via POP3 and had it check my mainaccount and mainaccount/spam/ separately. Since then I've had a /spam folder for my mainaccount and a few others but not all. I'm not really sure how I accomplished this task - it's been a while.

Is there something I should do to make sure my other email accounts have a spam box?


When running the trainer script I get a whole lot of WARNINGs
All of the warnings point to the fact that I don't have /spam folders for most of my accounts.

Does this affect the effectiveness of the script?


I can't tell if the trainer script is actually scanning emails.
For the accounts that have a /spam folder, the trainer script doesn't seem to show the right number of emails.
Right now, on my main account (let's call  it myemailaccount) I have 132 hams in the inbox and 9 spams in the spam folder. The script reads:
Code:
Checking /home/myaccount/mail/mydomain.com/myemailaccount/spam to learn SPAM: Learned tokens from 0 message(s) (1 message(s) examined)
and way down at the bottom it reads:
Code:
Number of HAM messages scanned over time:
Number of SPAM messages scanned over time:

According to all this I think I've done something wrong.
If so, what more info do I need to post to help get an answer?
If everything looks right, why does it show the incorrect number of emails scanned and no HAM/SPAM checked over time?

Edit: I've just recently set up my email clients to check email via IMAP instead of POP3.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on June 03, 2007, 06:11:03 AM
I have a separate question about the auto-whitelist:

My auto-whitelist file is 660K and is filled with what look, to me, like spammers' addresses. I searched for my last name inside the file since my wife's email address (and other family) ought to show up as whitelisted, right? Nope. I tried a few generic domains of people that I email with and nothing showed up. In other words, I don't recognize any emails in this file.

I enabled spamassassin over a year ago, so I assume this file has been getting bigger over time.

Should I empty this file? delete it?

Viewing it in BBEdit I see a bunch of garbled text leading in. Does that mean I should be careful in what I delete?


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on June 05, 2007, 10:25:52 PM
Should I empty this file? delete it?
As described in the SpamAssassin WIKI (http://wiki.apache.org/spamassassin/AutoWhitelist?highlight=%28whitelist%29), auto whitelisting basically tracks an average score of all Emails from a user's Email address, so future Emails 'trend' towards that score over time.

From their WIKI:
Quote
So if someone that never sent you mail before sends you a mail that scores 20, and then sends you a second mail that would score 2.0 without the AWL, the AWL will push the score up to 11 on the second mail. This is auto blacklisting, based on their past history of spam.

If that same person sent you a mail that scored 0, and then later sent one that scored 7, the AWL would push the score down to 3.5. This is auto-whitelisting based on past history of nonspam.

A "sender" is identified using both the address they sent with, and their IP address, so spam claiming to be From you with forged headers will fail to get through.

But the "auto whitelist" isn't really a whitelist per-se. It does however have a "learning white/blacklist" type behavior as a result of it's averaging.
The math formula they use is listed on the wiki page -- it's a little misleading the way they say messages get scored, as we may think it's a total of the SpamAssassin point values from the individual rules it breaks.
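
The two examples in the quote can be reproduced with a few lines of Perl. This is only a sketch of the averaging as I understand it: the adjusted score moves from the new message's score toward the sender's historical mean by a factor, assumed here to be 0.5 (which I believe is the default of the auto_whitelist_factor setting). The function name is made up for the example.

```perl
use strict;
use warnings;

# Sketch of the AWL adjustment: pull the new score toward the sender's mean.
# $factor is an assumption (0.5); 0 would disable the effect, 1 would use the mean.
sub awl_adjusted_score {
    my ($score, $mean, $factor) = @_;
    $factor = 0.5 unless defined $factor;
    return $score + ($mean - $score) * $factor;
}

# Past mail averaged 20, new mail scores 2.0 on its own
printf "%.1f\n", awl_adjusted_score(2.0, 20);

# Past mail averaged 0, new mail scores 7 on its own
printf "%.1f\n", awl_adjusted_score(7, 0);
```

With a factor of 0.5 these give 11.0 and 3.5, matching the numbers in the wiki quote.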

So to answer your question - deleting the file sets all of the spammers back at a clean slate meaning their spams are more likely to get through.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: sunnyv3 on June 06, 2007, 11:36:13 PM
Thanks.  Forgot to do the chmod step that I did with the previous install.  Works fine now.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on June 08, 2007, 08:23:59 AM
So to answer your question - deleting the file sets all of the spammers back at a clean slate meaning their spams are more likely to get through.

I'm glad I didn't get trigger happy and delete it then! Thanks!


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on June 08, 2007, 08:27:35 AM
I'm still getting this output when I run the script:

Code:
Number of HAM messages scanned over time:
Number of SPAM messages scanned over time:

ie, spamassassin does not seem to be scanning the emails anymore. I'm not getting any more spams in the spam folder with their subject line changed either. I had them set to have a "***SPAM***" in the front, but I don't have any emails with that heading anymore.

Have I inadvertently turned off spamassassin?


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on June 09, 2007, 11:15:47 AM
Still having lots of trouble.

Last night I logged into cPanel and set the subject line to ***spam*** again. I started getting emails with ***spam*** leading the subject line. It worked again! Until today. Now my spams are just showing up in the Inboxes again. And they're really obvious spams too! They contain words like **** and ***** and even ******!  duh

A while back our server (gemini) got hit by something and LP had to go back and fix all the accounts. It took a while to get mine back to normal. They backed up my public_html folder and mail folder, so now I also have a public_html.bak and mail.bak folder. I wonder if this had anything to do with spamassassin?

Logging into cPanel -> Mail -> Add/Remove/Manage Accounts I was able to see the list of mail accounts. Everything looked fine until I took a peek in FTP.
Looking at the mail folder, I saw /mail/domain.name/ followed by a folder for each account. But the folder list did not match what was shown in cPanel!
When our server got hosed, we went back in and re-added mail accounts.
Before all this, I had enabled Spamassassin and the spam box. That's when things went smooth. Except, of course, a slow increase in spam.

Realizing that the folder list in /mail/domain.name/ didn't match, I backed up all Inboxes and used cPanel to delete those accounts that didn't have a spam box.
I disabled spamassassin and the spam folder.
I then recreated each account in cPanel.
I reenabled spamassassin and the spam folder hoping it would add spam boxes to the accounts I just recreated. Nope.
Out of desperation, I took an "empty" spam folder from another account and pasted it into each mailbox.
This time I left the subject line business alone.

Now when I run my script I get none of the "WARNING... could not find spam box" errors (paraphrased). Yay!
However, for each mailbox I get:
Checking /home/cpanelaccountname/mail/domain.com/mailaccountname/spam to learn SPAM: Learned tokens from 0 message(s) (1 message(s) examined)
and at the bottom I get:
Checking Global Email-based Hambox for HAM messages:
Checking /home/cpanelaccountname/mail/domain.com/globalham/inbox to learn HAM: Learned tokens from 0 message(s) (1 message(s) examined)

Number of HAM messages scanned over time:
Number of SPAM messages scanned over time:
It seems to only show 1 message examined and none scanned over time???
Many of the mailboxes have dozens, if not hundreds, of emails and it only shows one scanned?
Is this the way the script is supposed to look?

I'm at the point where I'm ready to delete everything in the .spamassassin folder, disable spamassassin and the spam folder, re-enable both again and hope it goes back to the way it was before I tried to train spamassassin.  :help:

In the time it took to do all this and write this post (about an hour), I quit my email client (so it'd stop bugging me about missing accounts). When I started it back up, there were some new spams in the main account. (already!) Good news, they were in the /spam folder, but it looks like the spam filter on my email client got them, not spamassassin! Once again, the subject line was not modified. It's set to
Code:
rewrite_header subject {SPAM _SCORE(0)_}

Spamassassin was working fairly well, but I was getting a steady increase in spams, which is why I wanted to figure out this training bit.  :help: :help: :help:

Edit a few minutes later: I looked at the spams I got since I started working on all this today and it looks like spamassassin did catch them! It just didn't modify the subject line. They all began with 'Spam detection software, running on the system "gemini.lunarpages.com", has identified this incoming email as possible spam.'  I went into cPanel and changed the {SPAM _SCORE(0)_} to ***SPAM***. I like the idea of the score showing up in the subject line, but I never got that working.

That's good, but I still don't think spamassassin is being trained!


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: silver45 on June 11, 2007, 02:35:22 PM
However, for each mailbox I get:
Checking /home/cpanelaccountname/mail/domain.com/mailaccountname/spam to learn SPAM: Learned tokens from 0 message(s) (1 message(s) examined)
and at the bottom I get:
Checking Global Email-based Hambox for HAM messages:
Checking /home/cpanelaccountname/mail/domain.com/globalham/inbox to learn HAM: Learned tokens from 0 message(s) (1 message(s) examined)
I'm seeing "Learned tokens from 0 message(s) (0 message(s) examined)" for HAM folders that I know for a fact have stuff in them, too. It seems to be looking at the SPAM folders just fine.

Hmm, took another look at the script, and "$user_hambox = "ham" ;" was still commented out. Uncommenting that line seems to have solved it :happy:. Now to wait and see if it's actually learning anything :-).
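
For anyone who wants to spot these "0 messages learned" lines automatically, the status text quoted above can be picked apart with a regular expression. This is just a sketch: the function name parse_learn_line is hypothetical, and the pattern is based only on the output lines pasted in this thread.

```perl
use strict;
use warnings;

# Extract the "learned" and "examined" counts from a trainer output line.
# Returns a hash reference, or nothing if the line does not match.
sub parse_learn_line {
    my ($line) = @_;
    return unless $line =~
        /Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)/;
    return { learned => $1 + 0, examined => $2 + 0 };
}

my $line = 'Checking /home/cpanelaccountname/mail/domain.com/mailaccountname/spam '
         . 'to learn SPAM: Learned tokens from 0 message(s) (1 message(s) examined)';
my $result = parse_learn_line($line);
if ($result && $result->{learned} == 0) {
    print "Nothing learned ($result->{examined} examined): check the folder settings\n";
}
```

A line that reports messages examined but none learned is worth a second look at which folders the script is pointed at.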


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on June 13, 2007, 10:49:26 PM
@silver45:
I was supposed to have set the script so it would only need $global_ham_email OR $user_hambox, but not both. I guess I forgot to write that condition into the 'sanity check' logic.
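
Roughly, the missing condition would look like the sketch below. This is not the actual code from sa-trainer.cgi; the function name and the message text are made up, and only the two variable names come from the script's configuration section.

```perl
use strict;
use warnings;

# Sketch of the intended sanity check: at least one ham source must be set.
# Returns an error message, or nothing when the configuration is usable.
sub ham_source_problem {
    my ($global_ham_email, $user_hambox) = @_;
    my $has_global = defined $global_ham_email && length $global_ham_email;
    my $has_folder = defined $user_hambox      && length $user_hambox;
    return if $has_global || $has_folder;
    return 'Set either $global_ham_email or $user_hambox before running the trainer';
}

# Either setting on its own is enough:
my $problem = ham_source_problem('globalham', undef);
print defined $problem ? "$problem\n" : "ham source OK\n";
```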

@planetlanham:
I'm not sure why your subject lines aren't rewriting as they're supposed to. Keep in mind that altering the SpamAssassin configuration through CPanel will overwrite the file, and will likely lose the portions of the file that point at your bayesian databases -- I suggest you manually edit the file per the documentation, and see if that helps, otherwise SpamAssassin may, by default, not be using the bayesian databases at all.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on June 15, 2007, 04:46:56 AM
I thought I'd give this (another) fresh attempt.

I noticed something a little odd. In the trainer script there is a line that says
Code:
bayes_path   /home/myaccount/.spamassassin/bayes
I logged into my account through ftp and noticed something a little odd. I don't have a /bayes folder in .spamassassin!

Maybe this had something to do with my account being hosed so bad (gemini crash) earlier? What is supposed to be in this folder? Do I need to contact support or is this something I can do myself?



Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on June 21, 2007, 05:30:00 AM
Spamassassin is re-writing my subjects now (I like having the score there), but...

Auto-Learn is still not working.

I don't know if this makes a difference, but I got the same spam sent to two real addresses and to an address that doesn't exist on my account. I got three different results.

Account: real email address #1
Result: Score of 12.4, sent to /spam folder
In Headers: X-Spam-Status: Yes, score=12.4 required=3.5 tests=AWL,HTML_MESSAGE, RCVD_IN_NJABL_DUL,URIBL_AB_SURBL,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SC_SURBL, URIBL_WS_SURBL autolearn=no version=3.1.8

Account: real email address #2
Result: Score of 8.6, sent to /spam folder
In Headers: X-Spam-Status: Yes, score=8.6 required=3.5 tests=AWL,HTML_MESSAGE, RCVD_IN_NJABL_DUL,URIBL_AB_SURBL,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SBL, URIBL_SC_SURBL,URIBL_WS_SURBL autolearn=no version=3.1.8

Account: does not exist, sent to default email address (which I don't use - in fact, the address is spam@domain.com)
Result: Not marked as spam!
In Headers: No, score=0.0 required=3.5 tests=HTML_MESSAGE autolearn=unavailable version=3.1.8

Same email, three different results! What bugs me most is the "autolearn=unavailable". I still think that this has something to do with what I think is a missing folder in my account. Anyone?  :?


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: silver45 on June 21, 2007, 08:48:28 AM
...version=3.1.8...
That's interesting. My spam headers have "version=3.1.7". Are there different versions of SA on different servers? I'm on fomax if that helps.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on June 29, 2007, 05:12:33 AM
Code:
bayes_path   /home/myaccount/.spamassassin/bayes
I logged into my account through ftp and noticed something a little odd. I don't have a /bayes folder in .spamassassin!

It's not a folder, it's the start of the filenames that SpamAssassin creates for the bayesian databases, such as:
/home/myaccount/.spamassassin/bayes_toks
/home/myaccount/.spamassassin/bayes_seen
etc
If the script's output doesn't show any counts for messages scanned over time, then either your user_prefs file is broken, or the permissions on the /.spamassassin/bayes_* files are not correct. You can try deleting the files and starting your scans again.
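
To see which files that prefix actually matches, and whether they are readable and writable, something like the following could be run on the server. This is only a sketch in Python (the trainer itself is Perl); the function name list_bayes_files is made up for illustration, and it assumes the bayes_path value is the one from your own user_prefs file.

```python
import glob
import os


def list_bayes_files(bayes_path):
    """Return (filename, readable_and_writable) pairs for files named <bayes_path>_*."""
    results = []
    # bayes_path is a filename prefix, not a folder, so match "<prefix>_*".
    for path in sorted(glob.glob(glob.escape(bayes_path) + "_*")):
        usable = os.access(path, os.R_OK | os.W_OK)
        results.append((os.path.basename(path), usable))
    return results
```

Calling list_bayes_files("/home/myaccount/.spamassassin/bayes") with your real account name should list entries such as bayes_seen and bayes_toks once training has run at least once. An empty list suggests the databases were never created.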


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on June 29, 2007, 05:14:23 AM
...version=3.1.8...
That's interesting. My spam headers have "version=3.1.7". Are there different versions of SA on different servers?

Yes, entirely possible, but shouldn't make that much of a difference.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on June 29, 2007, 05:21:15 AM
Auto-Learn is still not working.
It's turned off on the 'janus' server too, but in my mail headers I see it as "autolearn=no". You should ask official LP support why it's saying "unavailable".

I don't know if this makes a difference, but I got the same spam sent to two real addresses and to an address that doesn't exist on my account. I got three different results.
Wow, those results look pretty bizarre. The only thing I can recommend checking is the 'Received' headers from each message, to see if they were all sent from the same sender IP address -- remember that spammers use world-wide botnets to send out spam, so it's entirely likely that different botnet/zombie systems sent the same piece of spam to three of your addresses, and two of them hit some of the checks for blacklisted IPs (one of your messages was flagged by more blacklist checks than the other, which would account for the score difference).
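
For comparing the 'Received' headers of several saved messages, a rough sketch like this one may help. It is Python, not part of the trainer script, and it assumes the relay IP appears in square brackets inside each Received header (as in Exim-style lines such as "from host ([61.225.150.170])"). The name relay_ips is hypothetical.

```python
import re
from email import message_from_string

# IPv4 address written in square brackets, e.g. [61.225.150.170]
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(raw_message):
    """Return the bracketed IPv4 addresses found in the Received headers, top to bottom."""
    msg = message_from_string(raw_message)
    ips = []
    for received in msg.get_all("Received", []):
        ips.extend(IP_IN_BRACKETS.findall(received))
    return ips
```

Running relay_ips() on the full source of each copy of the spam and comparing the lists shows quickly whether the copies arrived from the same machine. Keep in mind that spammers can forge the lower Received lines, so only the topmost one (added by your own mail server) is really trustworthy.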


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on June 29, 2007, 05:54:34 AM
If your scanning doesn't show scans over time, then either your user_prefs file is broken, or the permissions on the /.spamassassin/bayes_* files are not correct. You can try deleting the files and starting your scans again.

I should probably wait before I post, but I'm too excited!  :D  I deleted the bayes_* files (actually renamed them as "savebeforedeleting.bayes_seen", etc) and tried running the script again.

I was very disappointed to see the same results, but this time it seemed to be hanging. So I started to look and realized that the first few mailboxes were ones that are rarely used, so there are very few (if any) messages to scan. I looked at the last line and it said "Learned tokens from 218 message(s) (218 message(s) examined) " 
:yey:

I hope I didn't jinx myself by posting before the results were done, but that last line is where it stopped. The next line said "Checking /home/username/mail/domain.com/mailbox/spam to learn SPAM:" without a list of messages scanned and just seemed to stop. I hit refresh to run the script again. It's working. I think I should just leave it alone. Maybe minimize the window and stop looking... :)

Ok... I forgot to hit "Post" after typing all that. It's probably been 30 minutes and the script stopped on the 3rd mailbox. (it stopped on about the 8th mailbox before) This can't be right. Probably jinxed myself even without hitting Post.



Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on June 29, 2007, 06:17:53 AM
aw nuts. I guess I jinxed myself. Running the script now displays

Code:
Checking /home/username/mail/domain.com/firstmailbox/spam to learn SPAM:

then stops. :(


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on June 29, 2007, 06:31:36 AM
aw nuts. I guess I jinxed myself. Running the script now displays

Code:
Checking /home/username/mail/domain.com/firstmailbox/spam to learn SPAM:

then stops. :(

I deleted the bayes_* files again and gave it another go. This time it repeated what it did the first time. It scanned 7 mailboxes with very little mail, scanned a mailbox with 219 messages, then stopped on the next mailbox.



Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on June 29, 2007, 07:29:35 AM
Does it stop immediately, or does it just time out?

Timeouts are common if you're scanning hundreds of messages in every folder. Best action you can take is to scan more frequently (say, every 3-4 days instead of once per week) and delete the messages from the folders. If you get hundreds of spam messages on a daily basis then perhaps you want to employ a few mail filters to discard some of your spam before it fills your spambox. Personally, I filter out any messages that already score as BAYES_99 because there's little point re-training on such messages, and that alone dropped my spambox's message count on a daily basis. As tokens expire every 3-4 months, old spam may leak through if LP hasn't been diligent in maintaining their SA rules.

I've also noticed that scanning on LP is actually substantially faster than other cpanel-enabled hosting providers I've worked with using this script (way to go LP!), but it's possible that if you try scanning when there's a high CPU load or disk activity, that it could slow down your script to the point of timeouts. Just scan more often and you should be fine. Heck, you could write a Perl script to detect when you've got a threshold of messages and send you an Email alert or something...
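
As a rough idea of what such a check could look like, here is a sketch in Python rather than Perl. It assumes Maildir storage (one file per message under cur/ and new/), the default threshold of 500 is an arbitrary number picked for illustration, and sending the actual Email alert is left out. Both function names are made up.

```python
import os


def count_maildir_messages(folder):
    """Count message files in the cur/ and new/ subfolders of a Maildir folder."""
    total = 0
    for sub in ("cur", "new"):
        path = os.path.join(folder, sub)
        if os.path.isdir(path):
            total += sum(1 for entry in os.scandir(path) if entry.is_file())
    return total


def folders_over_threshold(folders, threshold=500):
    """Return {folder: count} for every folder holding more than threshold messages."""
    counts = {folder: count_maildir_messages(folder) for folder in folders}
    return {folder: n for folder, n in counts.items() if n > threshold}
```

Run from cron, a non-empty result from folders_over_threshold() would be the cue to run the trainer and clear out the folder before it grows large enough to cause timeouts.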


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: silver45 on June 29, 2007, 09:04:38 AM
aw nuts. I guess I jinxed myself. Running the script now displays

Code:
Checking /home/username/mail/domain.com/firstmailbox/spam to learn SPAM:

then stops. :(
I noticed mine doing this, so I thought maybe it had something to do with the amount of spam in the box (almost 2K messages :hypno:). So, I moved about 2/3 of them to a temporary box and ran the script again, and it ran fine (if still a bit slow).

So it seems (from my admittedly anecdotal evidence) that the script hangs if there's too many messages in a box. I'm not sure what constitutes "too many," but so far the most I've had it work with is 701.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: ace22 on June 30, 2007, 04:56:59 PM
I do not understand how this works:
If I get an email that passes the SA tests but is STILL spam, how can I tell your script that it is spam?

Do I need to send it to my mailbox and move it to the spam folder?
  (I have Outlook set to download all emails and not keep them in the inbox.)
If I do so, won't SA start to see my address as a source of spam, since I am now sending the spam to myself?

P.S
I think there is a bug in the script here:
if ($cpanel_username eq 'domain') {
   $continue = 0 ;
   $error_msg = 'You need to properly configure $cpanel_username within the script, or the script will not operate' ;
}
if ($my_domain eq 'domain') {
   $continue = 0 ;
   $error_msg = 'You need to properly configure $my_domain within the script, or the script will not operate' ;
}

It should be NOT eq,
 no?


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: planetlanham on July 01, 2007, 06:53:44 PM
So it seems (from my admittedly anecdotal evidence) that the script hangs if there's too many messages in a box. I'm not sure what constitutes "too many," but so far the most I've had it work with is 701.

That did the trick! We had over 7,000 spams total. I moved all the spams I could into a /temp folder and ran it again. It didn't help that one of our email clients kept pasting the spams back into the /spam folders, resulting in almost 4,000 spams in both the /spam folder and /temp folders on our default account. Geez!

I'm going to slowly move pieces of spam out of /temp into /spam and run the script again.
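
For anyone with shell access who would rather not drag thousands of messages around by hand, moving them back in batches could be sketched like this. It is Python, it assumes Maildir storage where each message is a single file under cur/, and the batch size of 500 is only a guess at what the trainer gets through before timing out. The name move_batch is hypothetical.

```python
import os
import shutil


def move_batch(src_cur, dst_cur, batch_size=500):
    """Move up to batch_size message files from src_cur into dst_cur; return the names moved."""
    names = sorted(
        name for name in os.listdir(src_cur)
        if os.path.isfile(os.path.join(src_cur, name))
    )
    moved = []
    for name in names[:batch_size]:
        shutil.move(os.path.join(src_cur, name), os.path.join(dst_cur, name))
        moved.append(name)
    return moved
```

The idea would be to call move_batch() on the cur/ folders of the temp and spam boxes, run the trainer, empty the spam folder, and repeat until the temp folder is empty. An IMAP client that has the folder open may need a refresh to notice the change.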

Crossing my fingers hoping that it works from now on...


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: ace22 on July 03, 2007, 11:11:12 PM
I do not understand how this works:
If I get an email that passes the SA tests but is STILL spam, how can I tell your script that it is spam?

Do I need to send it to my mailbox and move it to the spam folder?
  (I have Outlook set to download all emails and not keep them in the inbox.)
If I do so, won't SA start to see my address as a source of spam, since I am now sending the spam to myself?

Moreover, what is the point of running the script on the spam folder?
In the spam folder there are messages that SA has already identified as spam, so why run it on that folder?

P.S
I think there is a bug in the script here:
if ($cpanel_username eq 'domain') {
   $continue = 0 ;
   $error_msg = 'You need to properly configure $cpanel_username within the script, or the script will not operate' ;
}
if ($my_domain eq 'domain') {
   $continue = 0 ;
   $error_msg = 'You need to properly configure $my_domain within the script, or the script will not operate' ;
}

It should be NOT eq,
no?


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: telling on July 11, 2007, 08:15:34 PM
I updated to the new version, and now I get this error message:

syntax error at tellingpix-sa-trainer.cgi line 91, near "$check_user_Inbox_for_ham "
BEGIN not safe after errors--compilation aborted at [domain]-sa-trainer.cgi line 189.

I looked at that line, and it's set to "N"


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on July 13, 2007, 12:27:28 AM
If I get an email that passes the SA tests but is STILL spam, how can I tell your script that it is spam?
If you've downloaded the message to your local system via POP3, you simply move the message back into your 'spam' folder via IMAP. If you've configured the script correctly, it will see that message in your spam folder the next time you run the script, which teaches SpamAssassin that this particular message was spam.

If I do so, won't SA start to see my address as a source of spam, since I am now sending the spam to myself?
If you set up an additional Email profile within Outlook as an IMAP connection instead of POP3, you can simply drag the message from your POP3 "Personal Folders" Inbox to the Inbox of the IMAP account you've just added. By dragging the messages in this fashion, none of the headers change, so SpamAssassin will not associate you with sending spam.

If you were to *forward* a copy of the message by clicking on the 'forward' button, then yes, your name and Email address would be seen as the sender of the message, and SpamAssassin will begin to learn that you are a spammer.

I think there is a bug in the script here:
if ($cpanel_username eq 'domain') {
You've got an old copy of the script -- this bug was fixed and does not exist in v3.04.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on July 13, 2007, 12:29:57 AM
syntax error at tellingpix-sa-trainer.cgi line 91, near "$check_user_Inbox_for_ham "
BEGIN not safe after errors--compilation aborted at [domain]-sa-trainer.cgi line 189.
I'll need to see your exact script to diagnose this. Contact me at the Email address in the comments of the script around line 18, and attach your script as a file attachment and I'll have a look.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on July 13, 2007, 10:14:22 AM
I'll need to see your exact script to diagnose this.
Telling had a syntax error earlier in the script where he removed a $ from a variable and forgot to end that same line with a semicolon, while enabling the global_ham_mailbox variable. Once the syntax error was fixed, the script worked as intended.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Mingers on July 31, 2007, 05:38:07 PM
I've just noticed that the "Enable Spam Box" button is missing in my cPanel under "Mail" then "SpamAssassin".

I re-enabled my spam box by putting https://login.servername.lunarpages.com:2083/frontend/lp/mail/addspambox.html? in my web browser =0)

Hope that might help someone.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: pheared on July 31, 2007, 05:49:58 PM
For whatever reason, the spam box feature has disappeared during the latest cPanel upgrade.  I had to complain a few times before it was fixed for my account.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Mingers on July 31, 2007, 06:05:09 PM
Well I had migrated servers since I required PHP5 and for a few weeks my spam filtering has gone from working perfectly to not at all and everything in between.

I only noticed the missing button now when I needed to use the bugger!


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: pheared on July 31, 2007, 06:17:45 PM
I had the exact same experience.  It was humming along just fine until I was migrated to a new server which had the newer version of cPanel.  It took a few weeks of slow back and forth with the support desk to (I think) finally get things straightened out.  Very frustrating indeed.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: chuckfa on August 11, 2007, 02:38:51 AM
Can't get it to work.  I thought I followed the directions, but here's what the script returns.  I paid the $20 for the individual help but didn't get a response to my emails yet, help please.  Thanks!
****************
sa-trainer.cgi version 3.04 by Ian Douglas, iandouglas.com, Copyright 2004-2007
Some Rights Reserved under a Creative Commons "Attribution Non-commercial" license
Support for this script available here

Autodetected mail storage as Mbox; you could speed up this script slightly if you configure $mail_format in the script to "Mbox"

Training SpamAssassin for ccdastro.net:

WARNING: /home/ccdast2/mail/ccdastro.net/ccd/spam did not exist; attempting to create it; scanner will say it learned from 0 messages if successful or produce another warning if unsuccessful
WARNING: Could not find spambox for ccd@ccdastro.net, cannot scan SPAM
WARNING: /home/ccdast2/mail/ccdastro.net/chuck/spam did not exist; attempting to create it; scanner will say it learned from 0 messages if successful or produce another warning if unsuccessful
WARNING: Could not find spambox for chuck@ccdastro.net, cannot scan SPAM
WARNING: /home/ccdast2/mail/ccdastro.net/cody/spam did not exist; attempting to create it; scanner will say it learned from 0 messages if successful or produce another warning if unsuccessful
WARNING: Could not find spambox for cody@ccdastro.net, cannot scan SPAM
WARNING: /home/ccdast2/mail/ccdastro.net/globalham/spam did not exist; attempting to create it; scanner will say it learned from 0 messages if successful or produce another warning if unsuccessful
WARNING: Could not find spambox for globalham@ccdastro.net, cannot scan SPAM
WARNING: /home/ccdast2/mail/ccdastro.net/mail_lists2/spam did not exist; attempting to create it; scanner will say it learned from 0 messages if successful or produce another warning if unsuccessful
WARNING: Could not find spambox for mail_lists2@ccdastro.net, cannot scan SPAM
WARNING: /home/ccdast2/mail/ccdastro.net/nospam/spam did not exist; attempting to create it; scanner will say it learned from 0 messages if successful or produce another warning if unsuccessful
WARNING: Could not find spambox for nospam@ccdastro.net, cannot scan SPAM
WARNING: /home/ccdast2/mail/ccdastro.net/settime/spam did not exist; attempting to create it; scanner will say it learned from 0 messages if successful or produce another warning if unsuccessful
WARNING: Could not find spambox for settime@ccdastro.net, cannot scan SPAM
WARNING: /home/ccdast2/mail/ccdastro.net/temma/spam did not exist; attempting to create it; scanner will say it learned from 0 messages if successful or produce another warning if unsuccessful
WARNING: Could not find spambox for temma@ccdastro.net, cannot scan SPAM
WARNING: /home/ccdast2/mail/ccdastro.net/temp/spam did not exist; attempting to create it; scanner will say it learned from 0 messages if successful or produce another warning if unsuccessful
WARNING: Could not find spambox for temp@ccdastro.net, cannot scan SPAM

Checking Global Email-based Hambox for HAM messages:
Checking /home/ccdast2/mail/ccdastro.net/globalham/inbox to learn HAM: Learned tokens from 0 message(s) (1 message(s) examined)


Number of HAM messages scanned over time:
Number of SPAM messages scanned over time:


re-scan mailboxes




Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on August 13, 2007, 10:27:56 AM
Can't get it to work.  I thought I followed the directions, but here's what the script returns.  I paid the $20 for the individual help but didn't get a response to my emails yet, help please.  Thanks!
Well, that's what I get for taking a vacation ;o)

Anyhow, Chuck's problem was resolved rather quickly -- the errors about the spam folders were due to LP not filtering his spam messages correctly. And the filtering wasn't working correctly because his user_prefs file was the default file that I bundle in my zip archive, which has a place where you MUST insert your proper cpanel username. Without that username being put in the user_prefs file, SpamAssassin will refuse to run, which means LP cannot filter messages properly.
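
A quick way to check for this kind of problem is to confirm that the bayes_path line in user_prefs points into your own home directory. The following is only a sketch in Python with made-up function names. It assumes the bayes_path line has the form shown earlier in this thread (bayes_path /home/myaccount/.spamassassin/bayes) and that anything after a '#' is a comment.

```python
def bayes_path_from_prefs(prefs_text):
    """Return the value of the first bayes_path line in user_prefs text, or None."""
    for line in prefs_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "bayes_path":
            return parts[1].strip()
    return None


def prefs_points_at_user(prefs_text, cpanel_username):
    """True if bayes_path lives under /home/<cpanel_username>/."""
    path = bayes_path_from_prefs(prefs_text)
    return path is not None and path.startswith("/home/%s/" % cpanel_username)
```

If prefs_points_at_user() comes back False for the contents of your ~/.spamassassin/user_prefs, the file either still has the placeholder in it or has lost its bayes_path line (for example after being rewritten through the CPanel interface).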


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on August 14, 2007, 06:46:14 AM
Since LP's CPanel upgrade is taking away our ability to rewrite our own subject lines, it's perfectly safe to use the CPanel interface to configure SpamAssassin again, as it will remove the rewrite_header line for the subject line.

Note that having the rewrite_header line in the user_prefs file will NOT harm it in any way, SpamAssassin will run just fine with or without it.

I'll update my documentation at iandouglas.com and so on later today.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on August 16, 2007, 07:14:52 AM
By the way, for anyone wondering where the last few messages went: Mitch, one of the lead moderators/admins of the forums, decided to merge a number of messages into a central place and created a new message thread where we could just discuss the failing SpamAssassin engine issue on various servers.

The first few pages go back a couple of months, but catch up to recent messages soon enough:
http://www.lunarforums.com/index.php/topic,42723.0.html


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Mitch on August 16, 2007, 07:24:35 AM
Please use this thread (http://www.lunarforums.com/index.php/topic,42723.msg305437.html#msg305437) for any of the conversation on or about the recent cPanel 11/Spam Assassin discussion.  Thanks!


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: BadCam on September 05, 2007, 02:06:24 AM
Hello Ian,

Thank you very much for creating spam assassin.

I've set SA up and the script seems to be working well. When I run the cgi script I get the following message:

sa-trainer.cgi version 3.04 by Ian Douglas, iandouglas.com, Copyright 2004-2007
Some Rights Reserved under a Creative Commons "Attribution Non-commercial" license
Support for this script available here

Autodetected mail storage as Maildir; you could speed up this script slightly if you configure $mail_format in the script to "Maildir"

Training SpamAssassin for mydomain.com:
WARNING: Could not find spambox for admin@mydomain.com, cannot scan SPAM
Checking /home/xt88002/mail/mydomain.com/alan/.spam/cur/ to learn SPAM: Learned tokens from 0 message(s) (0 message(s) examined)
WARNING: Could not find spambox for ana@mydomain.com, cannot scan SPAM
Checking /home/xt88002/mail/mydomain.com/cameron/.spam/cur/ to learn SPAM: Learned tokens from 0 message(s) (0 message(s) examined)
WARNING: Could not find spambox for enquiries@mydomain.com, cannot scan SPAM
WARNING: Could not find spambox for globalham@mydomain.com, cannot scan SPAM

Checking Global Email-based Hambox for HAM messages:
Checking /home/xt88002/mail/mydomain.com/globalham/cur/ to learn HAM: Learned tokens from 0 message(s) (0 message(s) examined)

Number of HAM messages scanned over time: 877
Number of SPAM messages scanned over time: 15404

re-scan mailboxes

(I have replaced my domain name with "mydomain.com")

1) Does this look correct?

2) How do I configure $mail_format in the script?

3) How do I access the Spam folders using Thunderbird? Do I have to set up a Spam IMAP account for each Email account I have? I've set up the global address for spam, but how do I see the spam folders? I don't wish to use webmail.

4) I have received a Spam that SA doesn't recognise as Spam. How do I tell SA that it's Spam? I don't see anywhere in your instructions, or this thread, how to deal with spam that's not recognised as spam.

I would just like to add that the Spam that's coming through doesn't appear to be being checked by SA. Am I incorrect? Is this sufficient (or too much?) information for you:

From - Wed Sep 05 21:49:38 2007
X-Account-Key: account4
X-UIDL: UID485-1182998340
X-Mozilla-Status: 0000
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:
Return-path: <jai@alltheunknown.com>
Envelope-to: babbittbuchwald@mydomain.com  <--I have Changed my domain name here
Delivery-date: Wed, 05 Sep 2007 02:43:49 -0700
Received: from 61-225-150-170.dynamic.hinet.net ([61.225.150.170])
   by atlas.lunarpages.com with esmtp (Exim 4.66)
   (envelope-from <jai@alltheunknown.com>)
   id 1ISrOO-0004ah-NU
   for babbittbuchwald@mydomain.com; Wed, 05 Sep 2007 02:43:48 -0700
Received: from [61.225.150.170] by mailstore1.secureserver.net; Wed, 36 Aug 2007 17:42:41 +0800
Message-ID: <0107ffa4$0107fe78$aa96e13d@jai>
From: "Alissa Sadler" <jai@alltheunknown.com>
To: <babbittbuchwald@mydomain.com> <--I have Changed my domain name here
Subject: Save $369.05 adobe acrobat 8 $79
Date: Wed, 36 Aug 2007 17:42:41 +0800
MIME-Version: 1.0
Content-Type: text/plain;
   format=flowed;
   charset="windows-1250"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1106
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
X-Spam-Status: No, score=
X-Spam-Score:
X-Spam-Bar:
X-Spam-Flag: NO


Thank you kindly. This is a lot of fun, by the way. Spam sucks!


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: MikePL on September 14, 2007, 12:17:39 PM
After reading all the posts here and the long instructions on Ian's site I still have a problem.

I've used this script a few years ago (4 or 5) and I remember having success. Then I simply forgot about the script but with the new wave of spam I had to go back to it again.

My problem is that I just can't set up the script. I read the instructions and can't figure out if it works for me. I don't remember how it worked for me a few years ago either  :?.

I have a mail program (the bat) and there I have 8 mail accounts from different parked domains (5 domains within my account). I don't use IMAP anywhere. I use POP as I work in photography and keeping all those e-mails with big attachments is pointless. I know I have space, but I have some respect for the LP server's resources, and when I don't need, I don't waste web space. I never use horde or squirrel either.

How should I set up the account, my mail program and maybe other things so that whenever I download spam I am able to put it in a spam folder (and ham in a ham folder)? Remember that I use POP.

I don't mind creating an IMAP account and create ham and spam folders there, but...
1. How to configure the script to scan the 'ham' folder in, for example, sa@mydomain.com?
2. How to configure the script to scan the 'spam' folder in sa@mydomain.com?
3. Will the script and SpamAssassin handle e-mail copied from my other domains properly? Won't it mark my addresses as spam senders?


I hope you get my idea. The script seems to be for people who are IMAP and webmail-oriented, while I am POP and absolutely no webmail.  I am tired of writing new e-mail filters. Currently I have 36 filters and I fear that I may lose good messages, as plain brute filtering is not the solution.

Any ideas?


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Mitch on September 14, 2007, 12:21:58 PM
Not sure how relevant this guide is after the cPanel 11 upgrades that changed the way Spam Assassin works a lot.  So with that said, I'm going to go ahead and lock this thread till we can get some more information on this issue to you.  Thanks!

Alright, we are open for business yet again.  :clap:  Thread unlocked!


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on September 18, 2007, 07:17:27 AM
Hello Ian, Thank you very much for creating spam assassin.

I didn't create SpamAssassin, I just created this HOWTO guide for how to use it.

Training SpamAssassin for mydomain.com:
WARNING: Could not find spambox for admin@mydomain.com, cannot scan SPAM
Checking /home/xt88002/mail/mydomain.com/alan/.spam/cur/ to learn SPAM: Learned tokens from 0 message(s) (0 message(s) examined)
WARNING: Could not find spambox for ana@mydomain.com, cannot scan SPAM
Checking /home/xt88002/mail/mydomain.com/cameron/.spam/cur/ to learn SPAM: Learned tokens from 0 message(s) (0 message(s) examined)
WARNING: Could not find spambox for enquiries@mydomain.com, cannot scan SPAM
WARNING: Could not find spambox for globalham@mydomain.com, cannot scan SPAM

Checking Global Email-based Hambox for HAM messages:
Checking /home/xt88002/mail/mydomain.com/globalham/cur/ to learn HAM: Learned tokens from 0 message(s) (0 message(s) examined)

Number of HAM messages scanned over time: 877
Number of SPAM messages scanned over time: 15404
1) Does this look correct?

Not really, no. It appears that some of your mailboxes don't have a 'spam' folder any more. It seems that your globalham Inbox didn't have any messages in it to scan, either.

It seems that the upgrade to CPanel 11 has removed two things from the older SpamAssassin setup: spam folders, and letting us rewrite our own subject lines to include things like a spam score.


2) How do I configure $mail_format in the script?

This isn't a *requirement*, it just makes the script a few milliseconds faster if you already know what kind of mail format you use. If you edit the script, look for two lines of code that look like
Code:
#$mail_format = "Mbox" ;
#$mail_format = "Maildir" ;
and remove the '#' character at the start of the line that has "Maildir".
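
If you're not sure which of the two formats your account uses, the difference is visible on disk: a Maildir folder is a directory containing cur/, new/ and tmp/ subfolders, while an Mbox folder is a single file holding all messages. The sketch below (Python, hypothetical function name) illustrates that distinction. It is not the detection code from the trainer script, which may work differently.

```python
import os


def detect_mail_format(path):
    """Guess 'Maildir' or 'Mbox' for a mail folder path; return None if it is neither."""
    if os.path.isdir(path) and all(
        os.path.isdir(os.path.join(path, sub)) for sub in ("cur", "new", "tmp")
    ):
        return "Maildir"
    if os.path.isfile(path):
        return "Mbox"
    return None
```

Whatever this reports for one of your mailbox paths under /home/yourusername/mail/ should match the line you uncomment in the script.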


3) How do I access the Spam folders using Thunderbird? Do I have to set up a Spam IMAP account for each Email account I have? I've set up the global address for spam, but how do I see the spam folders? I don't wish to use webmail.

If you already use IMAP, you can just subscribe to the existing 'spam' folder for each account, or, if a 'spam' folder doesn't already exist, you'll need to create it for each account.
If you use Thunderbird for POP3, you can add additional IMAP accounts using the same login details as your POP3 setup, and subscribe to the 'spam' folders for each account so you can drag all spam messages to your 'spam' folder.


4) I have received a Spam that SA doesn't recognise as Spam. How do I tell SA that it's Spam? I don't see anywhere in your instructions, or this thread, how to deal with spam that's not recognised as spam.

It's actually very well covered in the documentation. A spam message ending up in your Inbox is called a "false negative" -- it was falsely identified as 'not spam'. You'll need to move that message into your 'spam' folder via your IMAP account so that the next time you run the training script, the script will see the message in your spam folder and learn things about it accordingly.

I would just like to add that the Spam that's coming through doesn't appear to be being checked by SA. Am I incorrect? Is this sufficient (or too much?) information for you:

X-Spam-Status: No, score=
X-Spam-Score:
X-Spam-Bar:
X-Spam-Flag: NO

Yes, this is an indication that "spam scoring" has stopped working on your server. You should contact LunarPages support with that exact mail header, let them know that spam scoring is not working, and ask them to please restart the process that makes this happen. To my knowledge so far, LP has not yet upgraded to the version of the Perl interpreter that CPanel 11 requires, and since parts of SpamAssassin use Perl, some parts of SpamAssassin are either not working or stop working after a period of time. It's likely that this "spam scoring" process will stop again on its own at some point, so sending a *friendly and courteous* message to LP support asking them to restart the process will be necessary.
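
To spot this condition without reading headers by eye, the X-Spam-Status value can be pulled apart as in the sketch below. This is Python with made-up function names, and it assumes the header has the shape seen in this thread ("Yes, score=12.4 required=3.5 tests=... autolearn=no version=3.1.8"). A header with nothing after "score=" is treated as a sign that scoring is not running.

```python
import re


def parse_spam_status(value):
    """Split an X-Spam-Status value into its flag, numeric score and list of test names."""
    flag = value.split(",", 1)[0].strip()
    score_match = re.search(r"score=(-?\d+(?:\.\d+)?)", value)
    tests_match = re.search(r"tests=(.*?)(?:\s+autolearn=|$)", value, re.S)
    tests = []
    if tests_match:
        # Test names may be wrapped with spaces after commas; strip all whitespace.
        cleaned = re.sub(r"\s+", "", tests_match.group(1))
        tests = [name for name in cleaned.split(",") if name]
    return {
        "flag": flag,
        "score": float(score_match.group(1)) if score_match else None,
        "tests": tests,
    }


def scoring_looks_broken(value):
    """True when the header carries no numeric score at all."""
    return parse_spam_status(value)["score"] is None
```

The same parser can be used to compare two copies of a message: turning the "tests" lists into sets and subtracting one from the other shows exactly which rules fired on one copy but not the other.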

It's also worth noting that in cases where the spam scoring stops and you get *flooded* with spam -- please do keep moving those spam messages into your 'spam' folders and keep training on them -- SpamAssassin will continue to learn from these messages so when the spam scoring *does* get restarted by LP support, your SpamAssassin databases will be that much better at scoring spam.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on September 18, 2007, 07:39:12 AM
My problem is that I just can't set up the script. I read the instructions and can't figure out if it works for me.

The output from the script is a good indication of whether or not it's working -- can you post the output of the script so I can help determine if things are running okay?

How should I set up the account, my mail program and maybe other things so that whenever I download spam I am able to put it in a spam folder (and ham in a ham folder)? Remember that I use POP.

This is likely the most common setup at LP -- you as a user set up your Email client to download all incoming mail via POP3 so it's not stored on the server any longer. However, since the SpamAssassin training script runs on the server, any messages you want to train as ham/spam also need to be copied back onto the server. This is why the training script requires that you set up an IMAP account, so you can drag messages from your POP3 Inbox back to your IMAP 'spam' folder and do the same to move non-spam messages to your IMAP 'ham' folder. The difference is that after scanning, you would drag the IMAP 'ham' messages back to your Inbox (or wherever you sort them from there), and you would simply delete the spam messages from the IMAP 'spam' folder.

I don't mind creating an IMAP account and create ham and spam folders there, but...


1. How to configure the script to scan the 'ham' folder in, for example, sa@mydomain.com?

Since the default "global ham" mailbox is set for "globalham" you will need to find the line that says
Code:
$global_ham_email = "globalham" ;
and change it to "sa" like this:
Code:
$global_ham_email = "sa" ;
This way, all of your 'ham' (non-spam) messages can simply be forwarded to that mailbox by all of your domain users, and they will be trained as non-spam the next time you run your script.


2. How to configure the script to scan the 'spam' folder in sa@mydomain.com?

By default, the script will want to scan individual spam mailboxes. You will need to make the following changes to allow for a different scenario:

- change the line that says
Code:
$check_user_spamboxes_for_spam = "Y" ;
so the "Y" is set to "N", like this:
Code:
$check_user_spamboxes_for_spam = "N" ;

- uncomment the line that says
Code:
#$global_spambox = "scan-spam" ;
so it looks like this:
Code:
$global_spambox = "mydomain.com/sa/spam" ;
and replace "mydomain.com" with your actual domain. This is an undocumented 'feature' of using the script to configure the folder names -- the folder names are actually disk-path relative to /home/yourusername/mail/ so in this case you're going to tell the script to scan /home/yourusername/mail/mydomain.com/sa/spam/cur/ (as a full disk path including the 'cur' portion for Maildir) for spam messages.
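To make the path rule concrete, here is a minimal Perl sketch of how a folder setting such as "mydomain.com/sa/spam" would map onto a Maildir path. The helper name maildir_scan_path and the account name "yourusername" are illustrative assumptions and are not taken from sa-trainer.cgi; only the /home/yourusername/mail/ base and the trailing cur/ come from the explanation above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of sa-trainer.cgi): build the directory that
# would be scanned for a folder setting relative to /home/<user>/mail/.
sub maildir_scan_path {
    my ($cpanel_user, $folder_setting) = @_;
    # Maildir keeps messages that have already been seen in "cur".
    return "/home/$cpanel_user/mail/$folder_setting/cur/";
}

my $global_spambox = "mydomain.com/sa/spam";   # same value as in the post
my $path  = maildir_scan_path("yourusername", $global_spambox);
my $found = -d $path;                          # true only if the folder exists

print "$path\n";
print $found ? "folder exists\n" : "folder not found on this machine\n";
```

Running something like this from a shell on the server is a quick way to check that the folder you configured really exists before you run the trainer.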


3. Will the script and SpamAssassin handle e-mail copied from my other domains properly? Won't it mark my addresses as spam senders?

When the global spam folder is in use, I add an extra command-line setting to the actual SpamAssassin training program that tells it to ignore the message headers, so it won't mark your users as spammers. So any messages copied to the ham/spam folders of your sa@mydomain.com account won't necessarily be as *accurate* when training ham/spam, but will definitely be better than nothing. Typically, you would *want* to scan the headers of all incoming mail so it can learn tokens about where the message came from, the date, subject line, etc., but using these 'global' settings makes it more convenient for you and your users.

The script seems to be for people IMAP and webmail-oriented
My script has the same requirement as the SpamAssassin software itself: that you have messages back on the server in order to train them. The SpamAssassin software on the LunarPages servers has no way to read the mail sitting on your hard drive that you downloaded via POP3. The most convenient way to move messages back to the server is IMAP. There are other ways to move your messages back to the server, but they are WAY more complicated, which is why I tell people to set up IMAP accounts to move the messages back to LunarPages for scanning. (As for webmail, it's simply just a browser-based interface for IMAP.)


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: silver45 on September 18, 2007, 11:08:30 AM
Quote from: w98
This is the most common setup that LP likely has -- you as a user set up your Email client to download all incoming mail via POP3 so it's not stored on the server any longer. However, since the SpamAssassin training script runs on the server, any messages you want to train as ham/spam also need to be copied back onto the server. This is why the training script requires that you set up an IMAP account so you can drag messages from your POP3 Inbox back to your IMAP 'spam' folder, and to do the same with moving non-spam messages to your IMAP 'ham' folder. The difference is that after scanning, you would drag the IMAP 'ham' messages back to your Inbox (or wherever you sort them from there), and you would simply delete the spam messages from the IMAP 'spam' folder.
Just another POV, I tried going to IMAP and very much didn't like it, so what I do is I use the Horde interface to copy good mail from the inbox to the HAM folder, or move any SPAM in the inbox to the SPAM folder, then use POP to download the stuff left in the inbox. Since everything in HAM is a copy of mail I've already downloaded, I can just empty the box after running the trainer.

It may be extra work, but for me it works better than trying to get my Eudora setup to work the way I want it to with IMAP.

Like I said, just another way of accomplishing the same end result, without needing to deal with IMAP :-).


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on September 18, 2007, 12:04:45 PM
Quote from: silver45
just another way of accomplishing the same end result, without needing to deal with IMAP :-).

Aside from the fact that Horde is just a web interface for IMAP, yes :-)

But yes, I think that *copying* non-spam to your 'ham' folder is the best way to handle training ham. The other way would be to *move* it to your 'ham' folder, run the training script, then move it all back. But if you *copy* to your 'ham' folder, you can just delete the messages in the 'ham' folder when you're done. The other scenario (move, train, move back) I feel is riskier because there's a chance you forget a needed message and delete it by accident.

Personally, I *move* non-spam messages that I don't need to keep (newsletters, notices, mailing list traffic that doesn't interest me, Email notices that someone replied to this message thread, for example) into my 'ham' folder, and only *copy* over non-spam messages that I *need* to keep copies of (messages for work/contracts, notes from my wife, etc)

Spam *always* gets moved to the 'spam' folder -- no need to keep it around ever.

But personally, I only use IMAP for my Email 'cause I figure that (a) LP has the storage, (b) keeps backups, and (c) since I triple-boot into Gentoo/Ubuntu/Windows frequently throughout the day, I can still access all of my Emails no matter which OS I'm running, or even have access to all of those messages while on the road.

Though, given the CPanel 11 fiasco, I'm still tempted to just set up a Linux box here at home to be my mail server for all of my domains and just download everything here at home, run my own SpamAssassin filters, etc., and not have to rely on any third-party for my Email ... some days I really miss running my own hosting business. :-)


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: gadgetfan on September 22, 2007, 12:01:20 PM
Thanks, w98, I've just started using SA and your script, and so far, so good.  One question, though.  I'm set up to use the global ham mailbox, and it's not clear to me if you intend for users to forward the entire SA-edited message (complete with scoring information and such) to the global ham box, or just the original message.  I presumed the latter, but I figured I should check.

Thanks.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on September 22, 2007, 12:10:01 PM
Hi gadgetfan,

Your users can just forward their incoming non-spam messages as-is, either as an attachment or as an inline message; SpamAssassin will learn enough tokens to remember what kind of mail you like to get. It's also configured, via my user_prefs suggestions, to ignore other SpamAssassin headers, so pre-scored messages are fine.

Ian


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: gadgetfan on September 22, 2007, 12:18:12 PM
Excellent.  Thanks for the quick reply.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: gadgetfan on October 16, 2007, 05:29:37 AM
Ian--

I've got a new issue, and I figured I'd mine your SA expertise before submitting a trouble ticket to LP, if that's what's required.  As of this weekend, SA has gone back to rewriting subject lines and for some reason is putting all of the SA scoring information in the X-Spam-Report header instead of at the beginning of the message body as it used to.  I've also seen a marked decrease in the amount of spam I'm seeing in my spam box.  That last part is not a problem, as long as it's only spam that I'm missing, and not legitimate mail.  I have no evidence that I've missed any legitimate mail, but then again, you often don't know that you didn't receive a message :)

I've gone into cpanel, disabled SpamAssassin, disabled the Spam box in SA, re-enabled both, and replaced my SpamAssassin user prefs file with the one recommended in the first message of this thread.  Some spam has started to trickle in (though not at the rate that it had been up until the weekend), but it's still rewriting the subject line.  I've even gone and changed the "rewrite_subject" line in user_prefs to 0 and it's still doing the same thing: rewriting subjects and putting scoring data in the X-Spam-Report header.  It's much easier to refile ham messages back into the inbox when they're attached as an .eml attachment, and that's no longer happening.

In my other thread on this subject (sorry for the repeat post...should have posted in this thread to begin with), Mitch said that LP hasn't made any changes that should affect how SA is running or the amount of spam I'm receiving.

Any thoughts?  Your help would be most appreciated.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on October 16, 2007, 07:07:57 AM
I don't think we can blame LP for this one, I believe it's because of the CPanel upgrade.

Most of the servers that I personally deal with (I use one, and have several clients I do freelance programming for spread across other servers) show me the exact same behavior you describe. Our subject rewrites no longer work; they all have "***SPAM***" as a prefix now, regardless of our user_prefs, but this was already explained as something we could not control with the upgrade.

The other change, of course, is that the SpamAssassin explanation of why something was flagged as spam now goes into the X-Spam-Status and X-Spam-Report message headers, and the back-end mail filtering seems to ignore the "spam box". Again, I believe this is a change instituted by the whole CPanel/SpamAssassin setup in CPanel v11. Personally, I'm getting tired of rewriting my code and documentation to keep up with all of these changes ;o) It's getting much, much harder to automate the process of taking care of spam.

Personally, I'm just tired of this issue dragging on for so long -- it's been, what, two months now (mid-August!!) -- SpamAssassin *still* stops running every couple of days on my server, and I'll get dozens of spam in my Inbox per day. One of my clients ends up with hundreds of spam messages per day when this happens. I've been tempted to watch incoming Emails and send an automated message to support whenever I see that spam scoring is turned off, requesting that it get restarted... but that'd probably get really annoying and LP wouldn't be very happy with me, heh.

I've looked at other third-party filtering but that's just a pain. Unfortunately I have no good answer for you other than to manually move spam messages back into your spam folder or set up a mail filter in your own Email client (Outlook, Thunderbird, etc).

I don't think Mitch's response yesterday was accurate -- CPanel 11 has most definitely changed the behavior of SpamAssassin.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: gadgetfan on October 16, 2007, 08:55:45 AM
Quote from: w98
I've looked at other third-party filtering but that's just a pain. Unfortunately I have no good answer for you other than to manually move spam messages back into your spam folder or set up a mail filter in your own Email client (Outlook, Thunderbird, etc).

I was afraid you'd say that :D  No offense intended, your work is much appreciated, but I can't blame you for wanting to give up on this particular project.  I'll disable SA and hope that I start getting spam again (I know, ridiculous :) ) and do client-side filtering.  My bigger concern is that the spam still may not return.  In the other thread (http://www.lunarforums.com/email_with_your_lunarpages_hosting_plan/recent_change_to_spamassassin-t44092.0.html), someone just noted that they've never had SA enabled and they've seen a marked decline in spam over the last few days too...that brings back my concern about legitimate email being lost.

Quote from: w98
I don't think Mitch's response yesterday was accurate -- CPanel 11 has most definitely changed the behavior of SpamAssassin.

Well, to be fair, I asked if anything had changed in the past few days.  From what I recall, Mitch is well aware that the cpanel changes in August included notable SA implementation changes.

That said, I'm still not sure that he's correct, since the behavior changes for me and this other poster began only a few days ago.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Mitch on October 16, 2007, 09:07:57 AM
Quote
I don't think Mitch's response yesterday was accurate -- CPanel 11 has most definitely changed the behavior of SpamAssassin.

Yeah, I usually consider "recent" to mean the last few days and not the last few months. ;)  If you want to double check, support should be able to look in on your specific account and give you a little more information than I can over the forum here though.  :D


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on October 16, 2007, 12:27:59 PM
Oh I know for a fact that Email I have routed through LP has gone missing -- my iandouglas.com MX record points to LunarPages, and I've had a number of clients call me asking why I haven't replied to Emails I never received.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on November 15, 2007, 07:54:13 PM
Hey all, sorry for my silence lately. To anyone running the newest version with the flag turned on to check for new versions: I apologize for a script error on my side; I recently moved my iandouglas.com site to a new hosting provider.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: gadgetfan on December 20, 2007, 07:25:58 AM
Quote from: w98
Hey all, sorry for my silence lately. To anyone running the newest version with the flag turned on to check for new versions: I apologize for a script error on my side; I recently moved my iandouglas.com site to a new hosting provider.

Ian - I hadn't run the script in a bit, but this script error seems to be back, just FYI.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: harvestpointe on December 21, 2007, 04:16:14 PM
I get the following error:

Software error:
Could not connect to iandouglas.com to check for a new version of this software.

I hadn't run the script in a while.  It used to work but now I get this error.  How do I fix this?

Thanks


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: devilkin on December 22, 2007, 02:16:55 AM
Seems there's a small error in the script: the "re-scan mailboxes" line is missing an " to make the link work. Other than that - thanks :)


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: harvestpointe on December 22, 2007, 06:21:59 AM
Thanks.  Just so everyone is clear line 505 should be

print '<p><a href="/cgi-bin/'.$0.'">re-scan mailboxes</a><br />' ;

Notice the added " prior to the >

Thanks


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on December 23, 2007, 07:36:34 AM
Had an error at iandouglas.com too that was keeping the 'new version check' script from running, sorry 'bout that.

Merry Christmas everyone,
Ian


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on March 02, 2008, 05:40:27 PM
I've added a poll message here in the Email forum (http://www.lunarforums.com/email_with_your_lunarpages_hosting_plan/30_day_poll_new_spamassassin_trainer_option_for_notification-t46716.0.html) about a feature for the next release of the SpamAssassin trainer script. Please check it out and let me know how you'd like to handle Email notification when scanning is completed. I've left the poll open for 30 days, so that should coincide pretty well with an April release of the new script, which will be the same Perl/CGI web-friendly output as it is right now, plus a PHP version for those who hate Perl, plus a Perl version more suitable to automating as a cron job.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: fantasma on March 08, 2008, 05:45:20 AM
Hi Ian,

Went to your website, and read about your tech background. Thanks for taking the time out of your busy schedule to help with SpamAssassin.  I'm a newb at this, but I managed to get the .cgi updated according to your post.  Since my company has a LP business account, I am overseeing all their email accounts as webmaster.  We have yet to accrue 200 ham messages in the globalham account, which probably accounts for the 0 tokens result when I run the program.

My question really has to do with the SPAM portion of this.  I think I may have missed the information on setting up the myspam folder. How do we train SA to know what SPAM is?  Do I have to set up a new folder somewhere on the server, or configure it in our control panel?  :-? Do my colleagues have to set up a separate SPAM folder in their email clients i.e. Outlook?

Thanks again for your help and great efforts to help the LP community!   :happy:


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on March 08, 2008, 10:07:16 AM
Hi Fantasma,

Yes, you need to configure the script both for a 'ham' folder as well as a 'spam' folder. Ideally, if you call them something like 'scan-ham' and 'scan-spam', the folders will be right next to each other.

When you say the trainer is reporting 0 tokens for ham, do you mean that it learned from 0 messages, or that the reported number at the bottom of the output says 0 ham messages?

It sounds like there might be a configuration problem.

Ian


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Christo on March 27, 2008, 01:39:49 AM
Hey, what a heck of a nice script!

I am looking at installing it to solve some very annoying spam issues on my domain!!

1. Does anyone know whether it is possible to install the script and have it work without too much hassle, on grafias server ?!

2. Am I the only one that noticed that the command inside the SpamAssassin area of cpanel to create a spambox seems NOT to make any new mailbox on the imap folder tree (on grafias) ?!!?!!!?!

3. Why the restriction that you HAVE to run it on the primary domain of the account...??!!  I ask because as it so happens, I use almost EXCLUSIVELY an add-on domain on my account.... the primary domain gets little or no traffic...  Do recent versions of the script let you focus on training SA on an addon domain, rather than on the primary ?!

Great script, looking forward to your replies!!

Christo


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on March 27, 2008, 07:10:14 AM
1. My script is in use on many LunarPages servers as well as other hosting providers who use CPanel, so yes it should work just fine.

2. Flagging CPanel to create the 'spam' folder only allows it to create the folder once a piece of spam has been received. It will not create it for a given account until spam is received at that account.

3. There's an option in the script for add-on domains, but since I don't use add-on domains myself here at LP I haven't thoroughly tested it. If you're willing to help me test it, I'll be happy to help you get up and running. My contact information is at www.iandouglas.com or you can send me a private message here.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Christo on March 28, 2008, 01:32:15 PM
Follow up questions:

RE: 2. When you say "It will not create it for a given account until spam is received at that account." does this apply to the primary account domain only , or to all of the add-on domains also ?!  Because I have received plenty of spam on the add-on domain, and it has not created the spam folder in there. Neither has it made one in the primary domain account..

3. Yes, let's get it working on my add-on domain, I will help test. I'll send you a pm here.

EDIT: pm not possible?!... I will email you.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on March 28, 2008, 01:49:04 PM
Hi Christo,

If it's not labeling or flagging spam for your add-on domain(s) then you should contact LP support. Usually looking at the raw source of the Email will show headers like this:

Code:
X-Spam-Status: Yes, score=14.9
X-Spam-Score: 149
X-Spam-Bar: ++++++++++++++
X-Spam-Report: Spam detection software, running on the system "janus.lunarpages.com", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
the administrator of that system for details.
Content preview:  No text version was provided Email template Click here to
get enrolled for your Medical Billing Degree! We hope you enjoyed receiving
this email, but if you no longer wish to receive our emails please press
here. or please write to us at: 770 E Main Street #259 Lehi, UT 84043 [...]
Content analysis details:   (14.9 points, 3.5 required)
pts rule name              description
---- ---------------------- --------------------------------------------------
3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
[score: 1.0000]
1.0 FH_XMAIL_RND_833       Special X-Mailer Version
-0.0 SPF_PASS               SPF: sender matches SPF record
2.7 URI_UNSUBSCRIBE        URI: URI contains suspicious unsubscribe link
2.9 URI_L_PHP              URI: URI_L_PHP
1.5 HTML_IMAGE_ONLY_20     BODY: HTML: images with 1600-2000 bytes of words
0.1 HTML_COMMENT_SAVED_URL BODY: HTML message is a saved web page
0.4 HTML_IMAGE_RATIO_02    BODY: HTML has a low ratio of text to image area
0.0 HTML_MESSAGE           BODY: HTML included in message
0.7 MPART_ALT_DIFF         BODY: HTML and text parts are different
1.4 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
1.5 URIBL_OB_SURBL         Contains an URL listed in the OB SURBL blocklist
[URIs: wetherwarnings.com]
-0.9 AWL                    AWL: From: address is in the auto white-list
X-Spam-Flag: YES

Ian


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Christo on March 28, 2008, 02:05:08 PM
What a difference, you are getting a ton of info in your headers.

In comparison, all I am getting is these three headers:

X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on   grafias.lunarpages.com
X-Spam-Level: **
X-Spam-Status: No, score=3.0 required=5.0 tests=BAYES_60,URIBL_BLACK   autolearn=no version=3.2.3

Another strange thing is, I NEVER see any emails greater than my 5.0 threshold. I am assuming that they get diverted or deleted.... But I never set it up that way !!
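For anyone comparing headers like the two examples above, the following is a minimal Perl sketch that pulls the verdict, the score and (when present) the threshold out of an X-Spam-Status line. The helper name parse_spam_status is an assumption made for illustration; it is not part of SpamAssassin or of sa-trainer.cgi, and it only understands the two header shapes quoted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: extract verdict, score and optional threshold from an
# X-Spam-Status header line. Returns undef if the line does not match.
sub parse_spam_status {
    my ($header) = @_;
    return undef
        unless $header =~ /^X-Spam-Status:\s*(Yes|No),\s*score=(-?\d+(?:\.\d+)?)/i;
    my %status = (
        is_spam => (lc($1) eq 'yes' ? 1 : 0),
        score   => $2,
    );
    # Some servers also report the threshold as "required=N".
    if ($header =~ /\brequired=(-?\d+(?:\.\d+)?)/) {
        $status{required} = $1;
    }
    return \%status;
}

# The two header styles quoted in the posts above.
my @examples = (
    'X-Spam-Status: Yes, score=14.9',
    'X-Spam-Status: No, score=3.0 required=5.0 tests=BAYES_60,URIBL_BLACK   autolearn=no version=3.2.3',
);

for my $header (@examples) {
    my $s        = parse_spam_status($header);
    my $required = defined $s->{required} ? $s->{required} : 'not reported';
    printf "spam=%s score=%s required=%s\n",
        ($s->{is_spam} ? 'yes' : 'no'), $s->{score}, $required;
}
```

If a message has no X-Spam-Status header at all, the helper returns undef, which is itself a useful sign that the message was never scored.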


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on March 28, 2008, 02:07:49 PM
Once you run my script you'll see how many spam/ham messages that SpamAssassin has seen since you modified your user_prefs file to use your own bayesian database. Once it's seen 200 spam and ham come in (aside from training) it'll kick into overdrive. In the meantime, training SA on what you consider spam/ham will be that much more beneficial once the magic 200 numbers have been reached.
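The point about the "magic 200" is that both counters have to reach it, not just one. A minimal sketch of that rule follows, assuming the threshold of 200 mentioned in the post; the function name bayes_is_active and the sample counts are made up for illustration.

```perl
use strict;
use warnings;

# Threshold taken from the post above; treat it as an assumption about
# how the server is configured.
my $MIN_MESSAGES = 200;

# Hypothetical helper: Bayesian scoring only kicks in once BOTH the ham
# and the spam counters have reached the threshold.
sub bayes_is_active {
    my ($ham_seen, $spam_seen) = @_;
    return ($ham_seen >= $MIN_MESSAGES && $spam_seen >= $MIN_MESSAGES) ? 1 : 0;
}

# Illustrative counts only.
for my $counts ([0, 350], [199, 5000], [200, 200]) {
    my ($ham, $spam) = @$counts;
    my $state = bayes_is_active($ham, $spam) ? "active" : "still learning";
    print "ham=$ham spam=$spam -> $state\n";
}
```

This is why an account with thousands of trained spam messages but only a handful of ham messages may still show no Bayesian scoring.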


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: paris2 on April 26, 2008, 11:24:08 PM
Can someone help me with this error?

ERROR MESSAGE:

syntax error at tpix-sa-trainer.cgi line 91, near "$check_user_Inbox_for_ham "
BEGIN not safe after errors--compilation aborted at tpix-sa-trainer.cgi line 189.


RELEVANT LINES (I think) FROM SA-TRAINER:

#####
# if you want to scan your users' Inbox folders instead of a separate 'ham'
# folder, set the following line to "Y".
# If you are using the global Email address or $global_hambox variables
# listed above, then THIS variable MUST remain set to "N" -- you cannot scan
# both your user's Inboxes *and* a global Email account/folder for ham.
# Enabling this variable and setting it to 'N' will search for a folder called
# 'ham' within each user account. MOST USERS WILL SET THIS TO "N"
$check_user_Inbox_for_ham = "N" ;
# if the above variable is set to "N", you can enter a mailbox name here to
# scan for non-spam messages; we recomment users create a folder called "ham"
# but you can set that here to some other name
#$user_hambox = "ham" ;


Thanks.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on April 27, 2008, 09:03:02 AM
Perl isn't always that smart when defining errors. I'm guessing the error is actually on line 81 with the $global_hambox variable.

If you like, you can Email a copy of your script to me and I'll check it out. Go to the 'contact me' page at iandouglas.com for my Email addresses.
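As an illustration of why Perl can point at the wrong line, here is a small sketch that compiles a two-line config fragment with a missing semicolon on line 1. This is only one common cause of such errors and may not be what happened in the script above; the variable values are placeholders.

```perl
use strict;
use warnings;

# A two-line config fragment: line 1 is missing its trailing semicolon,
# line 2 is perfectly valid on its own.
my $config = join "\n",
    'my $global_hambox = "ham"',
    'my $check_user_Inbox_for_ham = "N" ;',
    '1;';

# Compile the fragment and capture the error instead of dying.
my $ok    = eval $config;
my $error = $@;

print "Perl reported: $error";
```

Perl only notices the problem when it reaches the next statement, so the message names line 2 even though the fix belongs on line 1. When a syntax error points at a line that looks fine, check the nearest uncommented statement above it.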


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Monte on June 23, 2008, 06:43:10 AM
LP support directed me to this thread after a long, continuing battle with excessive spam. I've set up SA trainer as indicated in the top post of this thread and in the text files within the trainer script archive. But... SA seems perfectly content to learn and do nothing with its new knowledge. According to the script statistics at the bottom of the page after it's run, SA has tokens learned from ....

Number of HAM messages scanned over time: 1973
Number of SPAM messages scanned over time: 30617

... yet several of my users still receive 10-30 spam a day. My own account has received a dozen since midnight, none of which have any of the SA header info I expected to see.

Since the majority of my users don't bother to use IMAP or webmail so they can move spam to a scannable folder, I've had to rely on a few stalwarts who have been diligently forwarding spam to a 'spambox' account I set up. I log into that, move the spam to a scannable folder and run the SA trainer. The trainer indicates it has learned new 'tokens' from the scan, and I delete the messages to get ready for the next batch on which to run the trainer. The trainer does work for the few users who have set up a spam folder and move spam messages there manually.

The 'globalham' account is correctly scanned each time as well, with tokens learned at each run. Though traffic there is (as expected) much lower than on the 'spambox' account.

Anyway, let me go back to page 11 and start reading through it all and see if I messed something up. The frustration level with all the spam is exceedingly high...


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Monte on June 23, 2008, 07:36:23 AM
Quote from: W98
... It's also worth noting that in cases where the spam scoring stops and you get *flooded* with spam -- please do keep moving those spam messages into your 'spam' folders and keep training on them -- SpamAssassin will continue to learn from these messages so when the spam scoring *does* get restarted by LP support, your SpamAssassin databases will be that much better at scoring spam.

That is precisely what appears to be happening for my domain (Felicitas server)... Users report a major influx of spam; I re-activate my long and ugly trouble ticket with support; they re-start SA and all is well. For a couple days. Then the flood starts again. The most recent 'round' began around March 25, 2008 and hasn't improved since then.

Considering the constant flow of spam recently, I'll need to get LP support to jump-start SA on my domain again. At least the trainer script will have provided SA with a ton of new info.

But the underlying problem is still there: Why the heck is the SA service going inactive such a short time after each re-start? And is there something that 'we' -the account holders at LP- can do about it?


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on June 23, 2008, 11:28:17 AM
My $0.02:

SpamAssassin will usually skip scanning large messages ... if I recall, the default is 250kb -- anything larger is not scanned as spam.

Suppose LP has their mail subsystem set up so that it skips the scanning process if SpamAssassin doesn't respond within a preset amount of time, *AND* a spammer sends a flood of 249kb spam messages. That could cause a temporary denial of service, letting smaller spam messages through.

It's possible, in theory.

It's also theoretically possible that spammers have been rotating their spam tokens according to SpamAssassin's expiration feature -- that is, the tokens from spam you trained on 4 months ago are now expiring from your bayesian database, and the spammers are sending similar messages again, which will slip through until you retrain...

Remember, spammers know just as much about SpamAssassin as we do (if not more) -- it's their "business".

As LP customers, I'd encourage everyone to be supportive of the LP admins, be nice about it, but if you do see an increase in spam, simply send in a single support ticket asking them to monitor the systems, that you're getting a lot of 'extra' spam lately.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Monte on June 23, 2008, 02:46:40 PM
Just looked through the latest batch of spam received today, and not a single one is over 26 Kb. Hmm...

... going back through the last 300+ spam messages received on my account alone (that's going back only as far as May 2nd), not a single one is over 26 kb. That being the case, I don't think it's a size issue with grossly large spam messages bogging down the servers. At least not where my own account is concerned.

I'll admit to having become grumpy with the LP support staff, for which I do apologize. It's not the LP staff sending all this crud through... they just get to deal with both the resource drain and the irate customers. Some days the job probably doesn't pay well enough, I'm sure.

I'm still interested in finding out why the SA service needs to be re-started on the server fairly often (about every three weeks if not more often). SA actually works pretty well when it's running, and even better with the training script in place.

Which reminds me, I need to have them re-start SA on my domain/Felicitas server.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on June 23, 2008, 02:55:17 PM
I can't speak to why they'd need to restart the service unless there's a resource leak of some sort. If it's uniformly happening every 3 weeks, they should recognize a pattern.

Good luck LP admins!


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: the_guv on July 01, 2008, 05:25:50 PM
Hi everyone, and thanks for this great script Ian.

One thing, I've got a few addon domains with LP, and have set up email accounts all to forward to my regular email address...and am training SA from that catch-all.  Is this a good or bad way to deal with addon domains?  I have noticed I can specify addons in the cgi script, but is that really necessary when I have a catch-all?

Many thanks.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: w98 on July 01, 2008, 08:07:45 PM
Since all of your accounts will share the same bayesian database, if the forwarding alias places the message in your catch-all mailbox still showing the original To: recipient, it will be just fine.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: the_guv on July 03, 2008, 01:32:11 AM
many thanks again Ian.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: alican2 on October 28, 2008, 10:15:53 AM
Hi Ian
I am getting emails from 5 addresses coming continuously to my Outlook Express 6 account from my web site response form.
I have none going to my Lunarpages Horde account. How can I stop the emails?


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: quilthug on April 15, 2009, 10:40:49 AM
Ok so I think I followed the directions step by step. I'm not an expert, but I've been around the block a few times.

I put in my address:
http://www.mydomain.com/cgi-bin/sa-trainer.cgi

and I get the following:
sa-trainer.cgi version 3.04 by Ian Douglas, iandouglas.com, Copyright 2004-2007
Some Rights Reserved under a Creative Commons "Attribution Non-commercial" license
Support for this script available here

ERROR: Your base mail folder could not be found. Please configure the $base_mail_folder variable within the script.. Execution cannot continue until this is fixed


any suggestions??
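One way to narrow this error down is to check from the server which candidate mail directory actually exists before editing $base_mail_folder. The sketch below is an assumption-laden illustration: the helper first_existing_dir is hypothetical, "yourusername" is a placeholder, and the /home/yourusername/mail layout is the one described earlier in this thread, which may differ on your server.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first candidate that is an existing
# directory, or undef if none of them exist.
sub first_existing_dir {
    my @candidates = @_;
    for my $dir (@candidates) {
        return $dir if -d $dir;
    }
    return undef;
}

# Placeholder paths; replace "yourusername" with your own account name.
my @candidates = ("/home/yourusername/mail");
push @candidates, "$ENV{HOME}/mail" if defined $ENV{HOME};

my $found = first_existing_dir(@candidates);
if (defined $found) {
    print "Mail folder found: $found\n";
} else {
    print "None of these exist: @candidates\n";
}
```

If none of the candidates exist, the account may not have received any mail yet, or the host may store mail somewhere else; in that case LP support can tell you the right path.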


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: jtaylor379 on April 20, 2009, 05:37:36 PM
Hmm, I don't have a cgi-bin folder. Under public_html I do have a _vti_bin folder. Little help here?

Cheers,
Jessica


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: jimlongo on April 30, 2009, 11:45:45 AM
Hi, 2 questions.

1. I've enabled SpamAssassin and enabled the Spam Box. I thought that this action created the Spam box, but I don't see it anywhere in my server's mail folder.

2.  The User Prefs file that gets downloaded with v3.04 is different than the user prefs included in your instructions.  Which one should I use?

Thanks,
jim


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: angelad on June 22, 2009, 10:20:54 AM
Ok so I think I followed the directions step by step. I'm not an expert, but I've been around the block a few times.

I put in my address:
http://www.mydomain.com/cgi-bin/sa-trainer.cgi

and I get the following:
sa-trainer.cgi version 3.04 by Ian Douglas, iandouglas.com, Copyright 2004-2007
Some Rights Reserved under a Creative Commons "Attribution Non-commercial" license
Support for this script available here

ERROR: Your base mail folder could not be found. Please configure the $base_mail_folder variable within the script.. Execution cannot continue until this is fixed


any suggestions??

Mail folder is completely missing in your case?
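
A quick way to narrow this error down is to check each level of the path the script has to walk. Below is a minimal sketch in Python (it is not part of sa-trainer.cgi, which is Perl). The function name `mail_folder_report` and the `home_root` parameter are made up for illustration, and it assumes the /home/username/mail/mydomain.com layout that the script's own comments and output describe.

```python
import os


def mail_folder_report(cpanel_username, my_domain,
                       base_mail_folder="/mail", home_root="/home"):
    """Return (path, exists) pairs for each directory level the trainer needs.

    Assumption: mail lives under <home_root>/<cpanel_username><base_mail_folder>,
    with one subfolder per domain, as in the paths quoted in this thread.
    """
    home = os.path.join(home_root, cpanel_username)
    base = home + base_mail_folder          # base_mail_folder starts with '/'
    domain = os.path.join(base, my_domain)
    return [(path, os.path.isdir(path)) for path in (home, base, domain)]
```

Calling `mail_folder_report("username", "mydomain.com")` over SSH (or from a small test script) shows the first level that does not exist. If only the base mail folder is missing, that may point at the `$base_mail_folder` setting; if the home directory itself is missing, the cPanel username is the more likely culprit.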


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: rana9903 on June 22, 2009, 07:43:37 PM
Hello,
I am getting the following message and I am not sure if the script is working:
-----------------------------------------------------------------------------
sa-trainer.cgi version 3.04 by Ian Douglas, iandouglas.com, Copyright 2004-2007
Some Rights Reserved under a Creative Commons "Attribution Non-commercial" license
Support for this script available here

Training SpamAssassin for dunhillbd.com:
Checking /home/username/mail/mydomain.com/user1/.spam/cur/ to learn SPAM:

-----------------------------------------------------------------------------
It does not show anything else.

The status bar at the bottom of Firefox shows: Done.
Please :help: me get this working.


My user_prefs:
-----------------------------------------
user_prefs
File Type: ASCII text
------------------------------------------
use_bayes 1
bayes_file_mode 0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-Information
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_path /home/masterUser/.spamassassin/bayes
required_score 4.0
rewrite_header subject {SPAM _SCORE(0)_}
-------------------------------------------------------

I also deleted these files (as per an earlier post in this thread):
 
   bayes_seen   10528 k   0600
   bayes_toks   4704 k   0600

Thanks a lot


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: eluke on July 24, 2009, 08:12:44 AM
I am having problems with my script.  Here are my two problems.  I created the script using the builder.  I followed all of the directions, including changing the permissions to 755 on the script.

Issues

1.       The script runs, but the bayes_toks and bayes_seen files are not created.
2.       When I run the script, I get "Learned tokens from 0 messages."

bayes_file_mode 0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-Information
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_path /home/tsadmin/.spamassassin/bayes
required_score 3.5
rewrite_subject 1
subject_tag {SPAM _SCORE(0)-}


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: BadCam on January 16, 2010, 02:04:18 PM
I get the following error:

Software error:
Could not connect to iandouglas.com to check for a new version of this software.<br />

I hadn't run the script in a while.  It used to work but now I get this error.  How do I fix this?

Thanks

Now I have this same error. I tried the fix here:

Thanks.  Just so everyone is clear line 505 should be

print '<p><a href="/cgi-bin/'.$0.'">re-scan mailboxes</a><br />' ;

Notice the added " prior to the >

Thanks

and now I'm getting the following errors:

Quote
syntax error at train.cgi line 511, near "if ($mailformat eq 'Maildir' && ( -e "$spambox"
"use" not allowed in expression at train.cgi line 526, at end of line
"use" not allowed in expression at train.cgi line 529, at end of line
Can't find string terminator '"' anywhere before EOF at train.cgi line 531.

Just for some further info, this is the top part of my train.cgi file:

Code:
#!/usr/bin/perl
use CGI::Carp qw(fatalsToBrowser) ;
print "Content-type: text/html\n\n" ;

#####[[[
# sa-trainer.cgi
$version = "3.04" ;
#
# sa-trainer.cgi by Ian Douglas, iandouglas.com, Copyright 2004-2007
# Some Rights Reserved under a Creative Commons "Attribution Non-commercial"
# license, http://creativecommons.org/licenses/by-nc/3.0/
# (you are free to use, copy and modify this code and redistribute it, but
# please do give credit where it's due (to me), and your redistribution must
# NOT be for commercial purposes -- you got it from me for free, do the same
# for others please)
#
# To reach me for support, please contact me via Email at either of the
# following addresses: ian.douglas@iandouglas.com or wild98@gmail.com
#
# This script has always been, and will continue to be, free of charge to
# obtain. If you'd like to show appreciation for the work that's gone into it,
# you're more than welcome to send in a PayPal donation of any amount, however
# you are under NO OBLIGATION whatsoever to donate for my time. ;o)
#]]]
#####[[[ CONFIGURATION
#
# Everything under here should be pretty self-explanatory; you're always
# welcome to visit iandouglas.com if you have any questions.
#
# NOTE TO ADVANCED USERS: if you are specifying mailbox names in the
# configuration and you KNOW your mail storage is Maildir, do NOT include
# the '.' prefix on the mailbox name, the script will insert it for you
# where necessary

#####
# setting this to 'Y' will trigger a callback check to iandouglas.com to make
# sure you are running the latest copy of the script; this is totally optional
# and no personal data is sent from your system -- it merely retrieves the
# latest version number and compares it to this version of the script, and
# notifies you if my copy at iandouglas.com is newer.
# Commenting out this line, deleting it completely, or setting it to anything
# other than a capital 'Y' value will turn off the callback feature.
# THIS IS SAFE TO LEAVE SET TO "Y" UNLESS iandouglas.com IS OFFLINE
$callback_to_iandouglasdotcom = "Y" ;

#####
# your domain name; this is used to find the correct path for your Email
# folders; do not include "www." ALL USERS **MUST** SET THIS VARIABLE.
$my_domain = "XXXX.co.nz" ;

#####
# your CPanel login name; this is also used to find the correct path for
# finding your Email folders, and must be entered exactly the same way as you
# would use it in your FTP program. ALL USERS **MUST** SET THIS VARIABLE.
$cpanel_username = "XXXXXX" ;

#####
# if you know absolutely which mail format your Email is stored in (Mbox or
# Maildir), please uncomment the appropriate line below. If you don't know,
# or your server is subject to change at some point, leave both lines commented
# and the script will attempt to autodetect it for you. MOST USERS WILL NOT
# NEED TO ENABLE EITHER OF THESE LINES.
#$mail_format = "Mbox" ;
#$mail_format = "Maildir" ;

#####
# if your users will forward ham (non-spam) messages to a new Email address to
# scan, please uncomment the following line and enter the username portion of
# the Email address they will forward to; for example, if the mailbox is
# globalham@mydomain.com, just enter "globalham" and nothing more.
# note: this MUST be a mailbox within the domain name you configured above
# as $my_domain. MOST USERS WILL NOT NEED TO ENABLE THIS VARIABLE
$global_ham_email = "globalham" ; # @ mydomain.com

#####
# if you manually collect all ham messages into a global ham folder yourself,
# you should comment out the $global_ham_email variable AND the Inbox_for_ham
# variable listed below, and set this variable to the name of the folder under
# your $cpanel_username mail account where all global ham messages will be kept
# for scanning. MOST USERS WILL NOT NEED TO ENABLE THIS VARIABLE
#$global_hambox = "scan-ham" ;

#####
# if you want to scan your users' Inbox folders instead of a separate 'ham'
# folder, set the following line to "Y".
# If you are using the global Email address or $global_hambox variables
# listed above, then THIS variable MUST remain set to "N" -- you cannot scan
# both your user's Inboxes *and* a global Email account/folder for ham.
# Enabling this variable and setting it to 'N' will search for a folder called
# 'ham' within each user account. MOST USERS WILL SET THIS TO "N"
$check_user_Inbox_for_ham = "N" ;
# if the above variable is set to "N", you can enter a mailbox name here to
# scan for non-spam messages; we recommend users create a folder called "ham"
# but you can set that here to some other name
#$user_hambox = "ham" ;

#####
# scan your individual users' spam boxes; MOST USERS WILL SET THIS TO 'Y'.
# If you collect all of the spam messages into a global spam folder for all
# users as part of your $cpanel_username mail account INSTEAD, then comment
# out this line and set the global spambox setting below.
$check_user_spamboxes_for_spam = "Y" ;
# if you want your users to move their own spam to a new mailbox for scanning
# (useful so users who neglect to move spam don't bog down your script with
# thousands of old spam messages accumulating over time), you can enter a new
# mailbox name here for spam; all users will need to create this folder name,
# and only this folder name will be used for scanning ALL individual spam boxes
$user_spambox = "spam" ;

#####
# global spambox setting; generally this will not be used if your users have
# their own spam folders. If the $check_user_spamboxes_for_spam variable above
# is set to 'Y', this line should be commented out; if you do NOT want to use
# your users' individual spam folders, then set the name of the folder under
# your $cpanel_username mail account where spam will be stored instead. Most
# users will not need to configure this, and will just use the variable above
# ($check_user_spamboxes_for_spam = "Y") instead. MOST USERS WILL NOT NEED TO
# ENABLE THIS VARIABLE
#$global_spambox = "scan-spam" ;

#####
# if you have multiple add-on domain names that you would like to check with
# one execution of this script, uncomment the following line, and enter each
# domain name within quotes inside the parentheses. Note that ALL other
# rules listed above will apply identically to all domains, and that the
# $global_hambox and $global_spambox will apply to your $my_domain
# domain name only; to explicitly *exclude* an add-on domain, you will need to
# add all other add-on domains. MOST USERS WILL NOT NEED TO ENABLE THIS
# VARIABLE.
#@addon_domain_list = ( 'addon-domain-1.com' , 'addon-domain2.com' ) ;

#####
# THE FOLLOWING FEATURE IS NOT YET ENABLED
# Thank you, Paul D., for your idea to add this as a feature!
# if you have an exclusive list of users you want to scan, or an exclusion list
# of usernames who you do NOT want to scan SPAM for, you can enter their full
# Email addresses here. The script will explicitly watch for these users only
# if they are listed without the exclusion marker -- to exclude a user, prefix
# their entry with an exclamation point such as "!excludeme@mydomain.com".
# MOST USERS WILL NOT NEED TO ENABLE THIS.
#####
# NOTE: enabling this list will scan ONLY these accounts and no others, so is
# best used only to enable a few select accounts for spam/ham or to only
# exclude certain users (and scan ALL others).
#####
#@users_to_scan = ( 'john.doe@mydomain.com' , '!excludeme@mydomain.com' ) ;

#####
# if you see an error message about not being able to detect the SpamAssassin
# training application on the server, uncomment the following line and set it
# to the path to where the "sa-learn" application is on your server (you will
# need to ask your hosting provider for this information). This is not usually
# needed. MOST USERS WILL NOT NEED TO ENABLE THIS VARIABLE
#$path_to_salearn = "/usr/bin/sa-learn" ;

#####
# if your CPanel hosting environment stores your Email in a folder other than
# 'mail' within /home/$cpanel_username/ then you should enable this variable
# and specify only the name/path of your root mail folder relative to your
# /home/$cpanel_username/ directory. Be sure to prefix it with a '/', but do
# not add a trailing slash. MOST USERS WILL NOT NEED TO ENABLE THIS VARIABLE.
#$base_mail_folder = "/mail" ; # no trailing '/' please

#####]]] CONFIGURATION IS COMPLETE!

Should I upgrade to V3.5 using the "Build Your Own" SpamAssassin Trainer here?

http://iandouglas.com/spamassassin-trainer/

Or, should I just fix the errors in the current 3.04 version?

Also, my user_prefs file is as follows:

Code:
use_bayes 1
bayes_file_mode 0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-Information
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_path /home/XXXXXX/.spamassassin/bayes
required_score 4.0
rewrite_header subject {SPAM _SCORE(0)_}
score FH_DATE_PAST_20XX 0

But I see that on the first page of this thread it should perhaps now be:

Code:
use_bayes   1
required_hits   3.5
rewrite_subject   1
subject_tag   {SPAM _SCORE(0)_}
bayes_path   /home/xt88002/.spamassassin/bayes
bayes_file_mode   0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information

Please note: I have added to my existing user_prefs this line because I understand I need to do this because of the current SA issue:

Code:
score FH_DATE_PAST_20XX 0

Correct?
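
On the two user_prefs variants above: `required_hits`, `rewrite_subject` and `subject_tag` are the older SpamAssassin 2.x option names, while `required_score` and `rewrite_header Subject ...` are the 3.x spellings. As a rough sketch (Python, purely illustrative; `find_legacy_options` and the mapping table are hypothetical names, not something SpamAssassin or sa-trainer.cgi provides), a prefs file can be scanned for the old names like this:

```python
# Old (2.x-era) option names mapped to the 3.x option that replaced them.
# The table only covers the options that appear in this thread.
LEGACY_OPTIONS = {
    "required_hits": "required_score",
    "rewrite_subject": "rewrite_header Subject <tag>",
    "subject_tag": "rewrite_header Subject <tag>",
}


def find_legacy_options(user_prefs_text):
    """Return (line_number, old_option, suggested_replacement) tuples."""
    findings = []
    for number, line in enumerate(user_prefs_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        option = stripped.split()[0]
        if option in LEGACY_OPTIONS:
            findings.append((number, option, LEGACY_OPTIONS[option]))
    return findings
```

An empty result suggests the file already uses the newer names; which SpamAssassin version the server actually runs is still worth confirming with the host.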

Anyway, I'm just guessing at all of this, so any help would be greatly appreciated. Thanks very much in advance.  :yey:


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: kdorsey on January 17, 2010, 03:00:00 PM
I'm going to have to go through this thread and read up.  Been ignoring this for a while, but my spam filters need adjusting, that's for sure.


Title: Re: How-to: Train SpamAssassin - Updated May 30 2007
Post by: Nawtyflier on February 18, 2010, 06:13:29 AM
Hi,
I don't have a hosting plan, just email with Lunarpages.  I'm guessing there's no way to properly configure SpamAssassin since I have nowhere to upload the script.  Am I correct?


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: w98 on April 27, 2010, 10:12:35 AM
Hey all, quick update.

I've fixed the callback to iandouglas.com to check for a new version. I've also made a change to the first article in this thread to point users to http://iandouglas.com/sa-trainer/ for a do-it-yourself SpamAssassin Trainer Builder app that I wrote. You can essentially just answer a few questions there, and it'll build everything you need.

I've fixed some bugs in the script, and have released v3.06.



Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: lydian on June 17, 2010, 04:04:49 PM
Hmmm - giving up at this point. Everyone says just to give up on SpamAssassin, but I thought I'd give it a last try after being sent a link to this page.
- The promised link to full documentation is nowhere to be found. Does it exist?
- Even in version 3.06 of the script there are many typos and inconsistencies (global-ham or globalham?)
- Enabling the spam box apparently does not create the spam folders, which are referred to in at least 3 different spellings (SPAM, Spam and spam).

So - after hours of trying and correcting all we get is "cannot scan SPAM". Maybe SpamAssassin should just be laid to rest. Rumors persist that it can be useful, but then it is really tough to find anyone that can actually verify the latter. This software is a complete joke when compared with solutions like postini or anything a client side filter does without reams of (mostly incomplete) documentation. It is time that services like LP address the spam issue which gets worse every year in a serious manner and don't just point to an unintuitive, obsolete tool like SA and tell their customers to take a couple of weeks time to learn how to configure it. 


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: jojooboo on June 21, 2010, 11:32:18 AM
I'm loving this script!!  I do think there is one error, though.  On the sa-trainer.cgi results page it seems like the "Number of HAM messages scanned over time:" and "Number of SPAM messages scanned over time:" statistics are reversed.  It seems to work as intended, however, with ever-more spam getting filtered into the Spambox.


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: jojooboo on June 21, 2010, 06:47:27 PM
This software is a complete joke when compared with solutions like postini or anything a client side filter does without reams of (mostly incomplete) documentation. It is time that services like LP address the spam issue which gets worse every year in a serious manner and don't just point to an unintuitive, obsolete tool like SA and tell their customers to take a couple of weeks time to learn how to configure it. 

I could not disagree more.  SpamAssassin has been and continues to be a great tool, and with the ability to train it using the script on which this thread is based, it gets better and better.  Without SpamAssassin I would get 200+ spam emails a day.  As it stands I get fewer than 5.  I would not want to deal with that much stuff client-side, nor would I want LP to inject their own ideas about how to handle spam emails into the process.


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: spatters1000 on July 17, 2010, 08:19:38 AM
Hi -- Got SA running. Seems to be doing its thing okay.

Question on maintenance: I seem to remember that I have to empty some folder of messages that SA captures as spam. When SA determines that a message is spam, what does it do with it? I can't remember if it puts it in some folder (hence the above question about emptying it) or if it just deletes it.

On a related question: Is there a way in this forum to search for a particular phrase or word within a specific thread? For example, I tried to find a way to search for "maintenance" within this thread to see if this question had already been addressed, but can't find a way. I tried visually scanning the posts, but gave up on that idea with so many pages of posts.

Thanks!


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: benjazz on August 07, 2010, 03:24:10 PM
In the cPanel user's IMAP profile, create two new folders called "scan-ham" and "scan-spam"

how and where exactly do I do this step? In cPanel, or on my email client software (Mac OS X Mail)?


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: Peter Florance on August 11, 2010, 02:34:23 PM
Thanks to some help from Ian, I have the SA Trainer script running and have processed over 200 ham and spam emails.

Q: How do I know SA is using the results? Seems on one of my accounts on my domain,  the spam traffic has dropped drastically. But not my personal account.

Thanks for any insights.



Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: Peter Florance on August 17, 2010, 01:01:50 PM
Do I understand correctly that if SpamAssassin is using Bayes, it will create the spam folder (for that user) when it first assigns a spam score to an email? I've trained 475 of each and still don't have a spam box.

Help!

Thanks!


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: crazybass2 on November 01, 2010, 03:33:09 PM
I've been using SpamAssassin for years, but recently found this script!  Everything I've seen looks great!  :happy: I do have a couple questions though.

I used the script builder, selecting the option for "low intervention" so that my multiple email accounts (multi-users) can forward spam/ham to a general account.  I've noticed that when I do this, the forwarded emails get dumped into the "mailbox-name/new" folder.  When I run the script, it reports only that "mailbox-name/cur" has been checked, with 0 results for Spam/Ham.  I took a look at the code (I know a few different languages, but Perl is not my forte) and it appears to check only the "cur" folder and ignore the "new" folder.  How can I modify the code or my cPanel settings to correct this?

Additionally, since I'm having my users forward emails to the spam/ham boxes, is it acceptable to have multiple spam/ham emails forwarded together as opposed to forwarding each individual spam/ham? 

Thanks in advance for any response.
Mike
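
For background on the cur/new question: in the Maildir format, freshly delivered messages land in `new/`, and a mail client moves them to `cur/` once it has seen them, so a mailbox that nobody ever opens can keep everything in `new/`. The sketch below (Python, for illustration only; it is not how sa-trainer.cgi is written, and `maildir_messages` is a hypothetical helper name) shows the general idea of collecting messages from both subfolders before handing them to a trainer:

```python
import os


def maildir_messages(folder, subdirs=("cur", "new")):
    """List message files found in the given Maildir subfolders.

    Subfolders that do not exist are skipped rather than treated as errors.
    """
    found = []
    for sub in subdirs:
        path = os.path.join(folder, sub)
        if not os.path.isdir(path):
            continue
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if os.path.isfile(full):
                found.append(full)
    return found
```

A lower-effort alternative that needs no code change may be to open the shared mailbox once in webmail or an IMAP client, which normally moves the messages from `new/` into `cur/`.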



Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: webdiva on August 24, 2011, 04:02:40 PM
I sent Ian an email and haven't heard back ... is he still active?

I need some help with SpamAssassin ... I run the script and get this message (where "mydomain" is a substitute for my actual domain)

Autodetected mail storage as Maildir; you could speed up this script slightly if you configure $mail_format in the script to "Maildir"

Training SpamAssassin for mydomain.com:
Checking /home/user/mail/mydomain/user/.scan-spam/cur/ to learn SPAM: Learned tokens from 0 message(s) (1 message(s) examined)
Checking /home/user/mail/mydomain/user/.scan-ham/cur/ to learn HAM: Learned tokens from 0 message(s) (0 message(s) examined)

Am I supposed to be sending everything to the account I set up for all this rather than expecting the script to "learn" from the scan-spam folder in my user folder?

Thanks for help, anyone.
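
A note on reading that output: "Learned tokens from 0 message(s) (1 message(s) examined)" means one message was found but nothing new was learned from it. That often happens when the message was already learned on an earlier run, though other causes are possible. Here is a small sketch (Python, illustrative only; `parse_learn_summary` is a made-up name) that pulls the two counts out of such a line so the difference is easy to see:

```python
import re

# Matches summary lines like the ones quoted in the post above.
SUMMARY = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)


def parse_learn_summary(line):
    """Return learned/examined/skipped counts, or None if the line does not match."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    learned = int(match.group(1))
    examined = int(match.group(2))
    return {"learned": learned, "examined": examined, "skipped": examined - learned}
```

If "examined" stays at 0 for a folder, the script is not finding any messages there at all, which is a different problem from messages being found but skipped.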


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: Peter Florance on December 06, 2012, 08:34:43 AM
For some reason, I'm starting to get a lot of spam from the same source. Even though I've had the script examine dozens of them, they still score pretty low (under 3 and sometimes as little as 1)

I'm assuming nothing has changed with SpamAssassin as far as Bayes goes?



Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: w98 on February 06, 2014, 08:16:25 PM
Hey all, I'm still actively supporting this script nearly a decade later! :thumb:

The script is up to v4.02 and has a nicer layout and some links to documentation and a live support option through Google Helpouts if you're really stuck. There's also a link on the new script's HTML output to contact me directly via Email.

Build your own script: http://iandouglas.com/sa-trainer/
Learn the history and how to configure it yourself: http://iandouglas.com/spamassassin-trainer/

Cheers,
Ian Douglas


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: dwaynnyt on February 15, 2014, 11:36:22 AM
On top of this, I was also looking to see if there's a video tutorial. Is there one available?


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: w98 on February 15, 2014, 03:24:14 PM
No, I've never made a video tutorial on setting it up. I've written up pretty clear instructions, and if the "build your own" script doesn't do the job, you can contact me for personal help.


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: toyboy on May 30, 2014, 11:33:27 AM
what if we're a small company just trying to deal with business and don't have an IT person, in fact don't have any employees at all, and don't have the time or the inclination to learn scripts and other sorts of computer administrative things, and have tried for months to deal with a growing spam problem? the other day tech support "rebooted" spam assassin and then when that had no effect did it again, and also my email, and the result was getting some 15+ spam in my inbox (where before it had been zero or one or two) in less than a 12 hour period. they claim it's a coincidence, and throw a bunch of data at me. i'm at my wits' end.
their main response is to purchase MX, without telling me anything more about it than "it is better" or "it is stronger". the cost isn't that big a deal, but considering the way spam assassin has gradually become inadequate, and considering that tech support simply writes it off as being due to "changes" in technology or people (whatever that means), i'm not inclined to believe that switching to this other software is the solution, or that it won't also become ineffective.

so, for someone that couldn't even get his mind around Fortran back in 1972, and doesn't really want to become a programmer at age 60, i'd appreciate hearing other options to deal with spam. i'm wondering whether my reliance on webmail is part of the problem (it's convenient) and i'd be better off going back to using email software on my own computers.

sorry if i sound like i'm ranting. it's been a ridiculous ride and battle and i'm tired of dealing with both spam and lunarpages support.


Title: Re: How-to: Train SpamAssassin - Updated April 27, 2010
Post by: favcat on October 16, 2014, 12:30:44 PM
I set up this script a couple of months ago, and the script itself is working fine and processing the spam messages.

However, SpamAssassin is not getting much smarter.  I am getting the same spam emails over and over, and no matter how many times the tokens are looked at, these emails are still not getting marked as spam.

Has anyone else had a problem with SpamAssassin not learning properly?