Web Hosting Forum | Lunarpages

Author Topic: How-to: Train SpamAssassin - Updated April 27, 2010  (Read 189798 times)

Offline krick

  • Intergalactic Cowboy
  • *****
  • Posts: 50
Re: How-to: Train SpamAssassin
« Reply #60 on: July 07, 2005, 12:19:27 PM »
It would be convenient if it could create those folders if they didn't exist.

Yes it would ... although that's already covered in the instructions on page 1 and somewhat outside the scope of this script -- this script is to learn from spam/ham folders that *already* exist.

Plus I've never looked to see if LP has the necessary libraries installed to create the folders via IMAP instructions for a given user account.


I just removed the folder check...

if ( -e "$basepath/mail/$domain/$emaillogin/myspam")

...and it creates the folder if it doesn't exist.


Thanks for all your help.  It all works like a charm now.  You da man!    :thumb:

Offline w98

  • Galactic Royalty
  • *****
  • Posts: 443
    • http://iandouglas.com
Re: How-to: Train SpamAssassin
« Reply #61 on: July 14, 2005, 11:56:02 AM »
Update July 14, 2005:

I've modified the script back on page 1 quite a bit and updated the notes a little.

The new script will scan your /mail/ directory and recursively scan each mailbox in there to detect ham/spam mailboxes and delete any messages in your ham/spam folder pair after scanning.

Therefore, running:
http://www.yourdomain.com/cgi-bin/sa-learn.cgi
will now scan multiple mailboxes on the fly, and erase the messages in ham/spam after scanning.

The URL has three possible parameters:
account
xham
xspam

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?account=joesmith
When 'account' is added to the URL line, the script will only scan joesmith's mailbox for ham/spam folders; without the 'account' parameter, the script scans ALL mailboxes on the domain by default.

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?xham=0
By default, xham gets internally set to '1' if not set on the URL, which flags the script to empty the ham folders of any messages after scanning. By setting this flag to a 0, you can ensure that the ham messages will remain intact after scanning.

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?xspam=0
Likewise, xspam gets internally set to '1' if not set on the URL, which flags the script to empty the spam folders of any messages after scanning. I can't imagine too many scenarios where you'd want to set this to a 0, but I provide for it just in case.

Here's an example of scanning joe smith's mailbox and keeping his ham:
http://www.yourdomain.com/cgi-bin/sa-learn.cgi?account=joesmith&xham=0
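
For readers who want to see how such defaults might be handled, here is a minimal sketch. It is not the script from page 1: the helper name parse_params is made up, the query string is parsed by hand instead of with a CGI library, and URL-decoding is left out. It only illustrates the rules described above (account defaults to "all", xham and xspam default to 1, and only an explicit 0 switches emptying off).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a query string such as "account=joesmith&xham=0"
# into the three settings described above, applying the documented defaults.
sub parse_params {
    my ($query) = @_;
    my %opt = (account => 'all', xham => 1, xspam => 1);   # defaults

    for my $pair (split /[&;]/, defined $query ? $query : '') {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value && exists $opt{$key};

        if ($key eq 'account') {
            # Accept only characters plausible in a mailbox name, so the
            # value cannot smuggle "../" into a filesystem path.
            $opt{account} = $value
                if $value =~ /\A[A-Za-z0-9._-]+\z/ && $value !~ /\A\.+\z/;
        }
        else {
            # Only an explicit 0 turns emptying off; anything else keeps it on.
            $opt{$key} = $value eq '0' ? 0 : 1;
        }
    }
    return %opt;
}

my %opt = parse_params($ENV{QUERY_STRING});
print "account=$opt{account} xham=$opt{xham} xspam=$opt{xspam}\n";
```

Restricting the account value to a small character set is a design choice of this sketch, since the value ends up inside a directory path.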

Feel free to comment, make improvements, or report bugs by replying here in the forums.


Offline agesixracer

  • Newbie
  • *
  • Posts: 1
Re: How-to: Train SpamAssassin
« Reply #62 on: August 16, 2005, 10:32:35 AM »

When I run the sa-learn.cgi file, it doesn't seem to be checking each mailbox. Instead of getting

Code: [Select]
SpamAssassin version 3.0.4
using /usr/bin/sa-learn in /home/lpaccount/mail/mydomain.com (login: all) to learn about spam/ham
Checking /home/lpaccount/mail/mydomain.com/jim.smith for spam/ham:
Learning SPAM:
Learned from 14 message(s) (18 message(s) examined).
Learning HAM:
Learned from 9 message(s) (5 message(s) examined).


Checking /home/lpaccount/mail/mydomain.com/john.doe for spam/ham:
Learning SPAM:
Learned from 23 message(s) (29 message(s) examined).
Learning HAM:
Learned from 35 message(s) (48 message(s) examined).



I get:



Code: [Select]
SpamAssassin version 3.0.3
using /usr/bin/sa-learn in /home/lpaccount/mail/mydomain.com (login: all) to learn about spam/ham
Checking /home/lpaccount/mail/mydomain.com/all for spam/myham:
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myham
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myspam


Checking /home/inside16/mail/mydomain.com/all for spam/myham:
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myham
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myspam
« Last Edit: August 16, 2005, 10:51:53 AM by agesixracer »

Offline w98

  • Galactic Royalty
  • *****
  • Posts: 443
    • http://iandouglas.com
Re: How-to: Train SpamAssassin
« Reply #63 on: August 16, 2005, 10:58:10 AM »
When I run the sa-learn.cgi file, it doesn't seem to be checking each mailbox.
I get:

Code: [Select]
SpamAssassin version 3.0.3
using /usr/bin/sa-learn in /home/lpaccount/mail/mydomain.com (login: all) to learn about spam/ham
Checking /home/inside16/mail/mydomain.com/all for spam/myham:
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myham
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myspam


Checking /home/inside16/mail/mydomain.com/all for spam/myham:
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myham
Emptying/Creating /home/lpaccount/mail/mydomain.com/all/myspam

How exactly are you running the script? Copy the full URL into a message here, masking out your domain name, like this:

   http://www.********.com/cgi-bin/sa-learn.cgi ...

and be sure to list all of the parameters you're passing. Obviously without seeing the exact code of the script you've uploaded to your server, I cannot guarantee that this will operate as you expect it to.

The script defaults to scanning all mailboxes by looking at the $emaillogin variable, which is initialized with the word "all". If you've specified an "account" parameter when running the script (e.g. http://blah.com/cgi-bin/sa-learn.cgi?account=joeuser), then $emaillogin contains "joeuser" instead of "all" (it gets overwritten).

Later in the code, there's a logic check to see if $emaillogin equals "all" or not. If not, it calls the "dospam" routine on the single mailbox. Otherwise, it goes through other code to detect every mailbox on the domain, and scan each of them for ham/spam.
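
The dispatch described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from page 1: scan_mailboxes is a made-up name, the $handler callback stands in for the "dospam" routine (which takes more arguments in the real script), and mailboxes are assumed to be the non-hidden subdirectories of the base path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch of the dispatch. $handler stands in for the script's
# "dospam" routine; the real one takes more arguments.
sub scan_mailboxes {
    my ($basepath, $emaillogin, $handler) = @_;

    # A specific account was requested: handle that one mailbox only.
    if ($emaillogin ne 'all') {
        $handler->("$basepath/$emaillogin");
        return 1;
    }

    # Otherwise treat every non-hidden subdirectory of $basepath as a mailbox.
    opendir(my $dh, $basepath) or die "Cannot open $basepath: $!";
    my @logins = grep { !/\A\./ && -d "$basepath/$_" } readdir $dh;
    closedir $dh;

    # Build each path from the loop variable, not from $emaillogin.
    $handler->("$basepath/$_") for sort @logins;
    return scalar @logins;
}

# Example call with a handler that only prints the path it is given.
# The base path is a placeholder; adjust it before running.
my $base = '/home/lpaccount/mail/mydomain.com';
scan_mailboxes($base, 'all', sub { print "Checking $_[0]\n" }) if -d $base;
```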

This script works just fine for me on 4 different sites; the only modification is the name of the domain and LP username. If you want me to look at it for you, contact me in a private message and I'll be happy to take a peek at it.

Offline w98

  • Galactic Royalty
  • *****
  • Posts: 443
    • http://iandouglas.com
Re: How-to: Train SpamAssassin
« Reply #64 on: August 16, 2005, 12:11:09 PM »
Okay, we found a bug.

I made a modification to the script back on page one to alter the code from this:
Code: [Select]
foreach my $login (sort @logins) {
  my $newpath = "$basepath/$emaillogin" ;
  &dospam($newpath,$config,$clearham,$clearspam) ;
}
to this:
Code: [Select]
foreach my $login (sort @logins) {
  my $newpath = "$basepath/$login" ;
  &dospam($newpath,$config,$clearham,$clearspam) ;
}

The $newpath variable was getting loaded up with $emaillogin, which in the case of dealing with "all" mailboxes, was putting the word "all" in the $newpath, instead of the $login variable parsed from the foreach() call. Sorry about that.

Offline purefusion

  • Discovered
  • Trekkie
  • **
  • Posts: 11
Re: How-to: Train SpamAssassin
« Reply #65 on: September 06, 2005, 06:07:02 AM »
By default, I am missing my bayes_toks and bayes_seen files. If I just create them, will everything work as normal?

As of now, when scanning myham/myspam, it claims no files are being learned from, and no messages examined, though there are clearly spam messages in the myspam folder that got past the spam filter.

:(

Offline krick

  • Intergalactic Cowboy
  • *****
  • Posts: 50
Re: How-to: Train SpamAssassin
« Reply #66 on: September 29, 2005, 10:02:37 AM »
Update July 14, 2005:

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?account=joesmith
When 'account' is added to the URL line, the script will only scan joesmith's mailbox for ham/spam folders; without the 'account' parameter, the script scans ALL mailboxes on the domain by default.

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?xham=0
By default, xham gets internally set to '1' if not set on the URL, which flags the script to empty the ham folders of any messages after scanning. By setting this flag to a 0, you can ensure that the ham messages will remain intact after scanning.

http://www.yourdomain.com/cgi-bin/sa-learn.cgi?xspam=0
Likewise, xspam gets internally set to '1' if not set on the URL, which flags the script to empty the spam folders of any messages after scanning. I can't imagine too many scenarios where you'd want to set this to a 0, but I provide for it just in case.

Feel free to comment, make improvements, or report bugs by replying here in the forums.

For safety reasons, I think it should always default to *not* deleting anything unless explicitly instructed to.

"account" should have to be set to "all" in the URL in order to check all folders; by default, the script should check no folders.

then have two parameters... "deletespam" and "deleteham" that have to be set to 1 or "true" in the URL to delete files (defaults to 0 or "false" if not present)

The reason is that if while setting things up, someone screws up while configuring their paths somewhere, a lot of stuff could get deleted and this would be very bad.
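
A sketch of what this opt-in behaviour could look like, for illustration only. The parameter names "deletespam" and "deleteham" come from the proposal above; the helper name parse_safe_params, the hand-rolled query parsing, and the character check on the account name are assumptions of this example and are not part of any posted script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the opt-in behaviour proposed above: nothing is scanned without
# an account, and nothing is deleted without an explicit flag.
sub parse_safe_params {
    my ($query) = @_;
    my %opt = (account => undef, deletespam => 0, deleteham => 0);

    for my $pair (split /[&;]/, defined $query ? $query : '') {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value && exists $opt{$key};

        if ($key eq 'account') {
            $opt{account} = $value
                if $value =~ /\A[A-Za-z0-9._-]+\z/ && $value !~ /\A\.+\z/;
        }
        else {
            # Deleting requires an explicit 1 or "true".
            $opt{$key} = ($value eq '1' || lc($value) eq 'true') ? 1 : 0;
        }
    }
    return %opt;
}

my %opt = parse_safe_params($ENV{QUERY_STRING});
if (!defined $opt{account}) {
    print "No account given; nothing scanned.\n";
}
else {
    print "account=$opt{account} deletespam=$opt{deletespam} deleteham=$opt{deleteham}\n";
}
```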

Just my $.02



Incidentally, I'm still using the earlier version of the sa-learn-user.cgi where it only takes a "user" parameter.  I modified Ilohamail to include a "learn-spam" link at the top.  That way, when I'm logged into my webmail, I can select spam messages, move them to the spam folder, then hit "learn spam".  This learns the inbox as the "ham" folder, learns the spam folder, and deletes the contents of the spam folder.

Using this method, each user can learn their own spam and ham and train SpamAssassin.

If anybody is interested, here's the code I added to Ilohamail 0.8.14 RC3 in
/source/tool.php  .....

Code: [Select]
$links[] = array("prefs.php?user=$user", "list2", $toolStrings["prefs"], $div);

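// BEGIN ADDED CODE
// Take the part of the login before the "@" as the mailbox name.
$LPtemp = explode("@", $loginID);
$LPuser = $LPtemp[0];
// Link to the training script; "_blank" opens it in a new window.
$links[] = array("http://domain.com/cgi-bin/sa-learn-user.cgi?user=" . urlencode($LPuser),
"_blank", "Learn Spam", $div);
// END ADDED CODE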

echo "\n<form method=POST action=\"main.php\" target=\"list2\">\n";


It works well, but I have two little problems:

1) I have to hard code my domain into the URL.  I'm not sure how to get my domain or my IP address at that spot via PHP.

2) the way I get the user name is kinda hacky.  There must be a better way.
« Last Edit: September 29, 2005, 12:08:13 PM by krick »

Offline w98

  • Galactic Royalty
  • *****
  • Posts: 443
    • http://iandouglas.com
Re: How-to: Train SpamAssassin
« Reply #67 on: October 13, 2005, 03:14:03 PM »
You could use $_SERVER['HTTP_HOST'] to get your domain name in PHP. (The array key is case-sensitive, so it has to be upper case.)

Offline bxb13

  • Newbie
  • *
  • Posts: 1
Re: How-to: Train SpamAssassin
« Reply #68 on: October 23, 2005, 02:22:39 PM »
Hi. I'm a newbie at LP, but I just set up IMAP email w/ Spam Assassin. I also used this guide to set up the CGI script to train SA. I'm running into a problem, though, where the SA training script says it only looked at 1 message and learned from 0 messages. This happens for both the SPAM and HAM folders, and it makes no difference how many messages got copied to these folders. I tried both the original script on the 1st page, and the w98 modified version, but I get the same results. Are there any logs that I can look at to see if the script is running into problems? Or is there any other way to debug the script?

Any help would be greatly appreciated.
Thanks
« Last Edit: October 24, 2005, 01:51:58 PM by bxb13 »

Offline krick

  • Intergalactic Cowboy
  • *****
  • Posts: 50
Re: How-to: Train SpamAssassin
« Reply #69 on: October 24, 2005, 10:29:32 AM »
You could use $_SERVER['HTTP_HOST'] to get your domain name in PHP.

Actually, I ended up using $SERVER_ADDR to get the ip address and it worked like a charm.


Offline spatters1000

  • Space Explorer
  • ***
  • Posts: 6
Re: How-to: Train SpamAssassin
« Reply #70 on: October 31, 2005, 01:15:50 PM »
Hi. I'm a newbie at LP, but I just set up IMAP email w/ Spam Assassin. I also used this guide to set up the CGI script to train SA. I'm running into a problem, though, where the SA training script says it only looked at 1 message and learned from 0 messages. This happens for both the SPAM and HAM folders, and it makes no difference how many messages got copied to these folders. I tried both the original script on the 1st page, and the w98 modified version, but I get the same results. Are there any logs that I can look at to see if the script is running into problems? Or is there any other way to debug the script?

Any help would be greatly appreciated.
Thanks

I'm a newbie too and I just set up SA and the training script. I had a similar problem until I changed the permissions on the new sa-learn.cgi file. The instructions on page one of this thread state to change the permission to 755. I tried typing in 755 but it wouldn't take it. I finally realized that you need to check the three "execute" boxes and it changes the values for you. To do this, highlight the sa-learn.cgi file and click Change Permissions. Initially, the permission value is 644. When you check the three "execute" boxes, the value changes to 755. Click "Change" to save the new value. After I did this it worked. I realized the executable permission was the problem when I checked the error log, which you can access via cPanel: there's an icon labeled "Error Log".
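
For anyone who can run Perl on the account (shell access is an assumption here; the File Manager steps above do the same job), the permission change can be sketched like this. The helper name ensure_executable and the file path are placeholders for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: look at a file's permission bits and, when they are
# not 0755, change them. Returns the mode the file had before the call.
sub ensure_executable {
    my ($file) = @_;
    my @st = stat($file) or die "Cannot stat $file: $!";
    my $mode = $st[2] & 07777;          # keep only the permission bits
    if ($mode != 0755) {
        chmod(0755, $file) or die "Cannot chmod $file: $!";
    }
    return $mode;
}

# The path is a placeholder for wherever the script was uploaded.
my $cgi = 'public_html/cgi-bin/sa-learn.cgi';
if (-e $cgi) {
    printf "%s was %04o, now 0755\n", $cgi, ensure_executable($cgi);
}
```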

Another quirk I see is when I run the cgi script (http://www.domain.com/cgi-bin/sa-learn.cgi). If I open a browser window, enter the URL for the script, run it, make a change to the number of email messages in the spam/ham folder (for example), and then try to run the script again, it only repeats the values shown in the first run. This was also the case when I was getting no results before I changed the permission value. I found that if I close the browser window and then open a new one and run the script, it will reflect the updated values. I couldn't get it to update if I merely refreshed the page, or if I clicked "Go" on the address bar.
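
If the repeated output comes from the browser caching the page (an assumption; nothing in this thread confirms it), one common remedy is for a CGI script to send headers that forbid caching. A minimal sketch, where the helper name no_cache_header is made up and the header names are standard HTTP:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a CGI response header block with caching disabled.
sub no_cache_header {
    return join("\r\n",
        'Content-Type: text/plain',
        'Cache-Control: no-cache, no-store, must-revalidate',
        'Pragma: no-cache',
        'Expires: 0',
    ) . "\r\n\r\n";
}

# Headers must be printed before any other output.
print no_cache_header();
print "Scan output would follow here.\n";
```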

One other puzzling thing -- I set up an IMAP account in Outlook. I can see Inbox, Inbox.Drafts, Inbox.Sent, Inbox.Trash, and Junk E-mail. I don't see the "myham" and "myspam" folders, even though they exist. I can see them when I access the account via Webmail. Also, I'm assuming Outlook created the Junk Email folder since that one doesn't exist in the Webmail view.

Anyone have thoughts/suggestions on the Outlook issue? Thanks!

Offline grof

  • Newbie
  • *
  • Posts: 3
Re: How-to: Train SpamAssassin
« Reply #71 on: November 05, 2005, 04:21:28 PM »
This is a great resource for SA !  Thanks W98.

One question though - When moving email to the myspam folder, do you:
a) feed it all emails that are spam (including those already correctly marked as spam by SA), or
b) only feed it emails that SA did not mark as spam ?

If the preferred option is a), is there any reason why you couldn't use a mail filter to automatically move the marked spam to the myspam folder (and then only have to manually move the un-marked spam) ?

Offline bomalley

  • Space Explorer
  • ***
  • Posts: 6
    • http://www.vutec.com
Re: How-to: Train SpamAssassin
« Reply #72 on: December 05, 2005, 06:31:00 AM »
I've been using this script for a few days now and no matter how many emails are in myspam or myham I get the following response.

Checking /home/XXX/mail/XXX.com/USER for spam/myham:
Learning SPAM:
Learned tokens from 0 message(s) (1 message(s) examined)
Learning HAM:
Learned tokens from 0 message(s) (1 message(s) examined)
Emptying/Creating /home/XXX/mail/XXX.com/USER/myham
Emptying/Creating /home/XXX/mail/XXX.com/USER/myspam

Offline labyrinthian

  • Newbie
  • *
  • Posts: 4
Re: How-to: Train SpamAssassin
« Reply #73 on: December 13, 2005, 07:49:53 PM »
Ditto here: only one message gets examined for me as well.  Same scenario.

Offline purefusion

  • Discovered
  • Trekkie
  • **
  • Posts: 11
Re: How-to: Train SpamAssassin
« Reply #74 on: January 23, 2006, 07:46:54 AM »
For those of you who only get 1 message picked up in the folder, make sure the user_prefs file contains:

required_hits   5
rewrite_subject 1
subject_tag {SPAM}
bayes_path /home/lpaccount/.spamassassin/bayes
bayes_file_mode 0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information

Most importantly:

bayes_path /home/lpaccount/.spamassassin/bayes
(where lpaccount is your username!)

Chances are, you had a typo in this line somewhere if it is already in the user_prefs file. Search the line high and low for spelling errors, typos, and transposed characters.

Also, make sure you change the permissions, so that the .spamassassin folder, and all files within are chmodded to 777. Voila! Problems solved... I hope!
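
To hunt for a typo in that line without eyeballing the file, a small check along these lines may help. It is only a sketch: the helper name find_bayes_path is made up, and the user_prefs location is assumed from the paths quoted above, so replace lpaccount with your own username.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical check: return the bayes_path value from a user_prefs file,
# or undef when no such line is found.
sub find_bayes_path {
    my ($prefs_file) = @_;
    open(my $fh, '<', $prefs_file) or die "Cannot read $prefs_file: $!";
    my $path;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*#/;                        # skip comments
        $path = $1 if $line =~ /^\s*bayes_path\s+(\S+)/;
    }
    close $fh;
    return $path;
}

# Location assumed from the paths quoted above; replace lpaccount.
my $prefs = '/home/lpaccount/.spamassassin/user_prefs';
if (-r $prefs) {
    my $path = find_bayes_path($prefs);
    if (!defined $path) {
        print "No bayes_path line in $prefs\n";
    }
    else {
        # bayes_path is a filename prefix, so check its parent directory.
        (my $dir = $path) =~ s{/[^/]*\z}{};
        print "bayes_path = $path\n";
        print "Directory $dir ", (-d $dir ? "exists" : "is missing"), "\n";
    }
}
```

A misspelled keyword (for example "bayes_paht") shows up here as "No bayes_path line", and a misspelled directory shows up as "is missing".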

 
