Web Hosting Forum | Lunarpages


Author Topic: How-to: Train SpamAssassin - Updated April 27, 2010  (Read 140946 times)
parish2
« Reply #45 on: December 02, 2004, 01:25:05 PM »

There's a "test filter" window on the filter screen with instructions.
Logged
krick
« Reply #46 on: December 03, 2004, 01:59:46 PM »

Quote from: parish2
There's a "test filter" window on the filter screen with instructions.


You know, I never even paid attention to that.  DUH!

Anyway, I'm not sure what the error message means.

What does your filter look like?
Logged
parish2
« Reply #47 on: December 04, 2004, 10:52:38 AM »

header "contains" "jf@tellingpix.com" gets delivered to "jf@tellingpictures.com"
Logged
krick
« Reply #48 on: December 05, 2004, 12:26:20 PM »

Quote from: w98


1. Set up IMAP folders to hold spam and ham messages

...

- click on the 'webmail' icon
- click on the 'squirrelmail' link
- click on the 'folders' link



I've gotten up to this point.  However, when I click on the "squirrelmail" link, I get an error page: "ERROR  Unknown user or password incorrect".

I'm stuck at this page and cannot get any further.
Logged
krick
« Reply #49 on: December 05, 2004, 12:33:03 PM »

Quote from: parish2
header "contains" "jf@tellingpix.com" gets delivered to "jf@tellingpictures.com"


Seems like it should work to me.  I don't understand why it doesn't.

However, you should be aware that I just found a problem with using "contains" instead of "equals".

It seems that when someone sends an email that includes multiple people on the TO line, and all the people are on the same domain, all copies will be caught by the filter.

For example, given this filter:

TO contains "user1@domain.com" destination "user8@domain.com"

and an email sent with user1@domain.com, user2@domain.com, and user3@domain.com all on the TO line:

Three copies are sent to the domain.com mail server, one for each person on the domain.  But all three copies match the filter and end up going to user8@domain.com
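
To make the difference concrete, here is a minimal Perl sketch of the two match types. It is only an illustration, not how the cPanel filter engine is actually implemented: `to_contains` and `to_equals` are made-up helper names, and the sketch assumes "equals" compares against the whole TO header.

```perl
use strict;
use warnings;

# Hypothetical helpers that mimic the two filter match types.
# "contains" does a substring match on the whole To: header.
sub to_contains {
    my ($header, $address) = @_;
    return index(lc($header), lc($address)) >= 0 ? 1 : 0;
}

# "equals" only matches when the header is exactly the one address
# (an assumption about how the filter compares the header).
sub to_equals {
    my ($header, $address) = @_;
    return lc($header) eq lc($address) ? 1 : 0;
}

# Every delivered copy carries the same To: header.
my $to = 'user1@domain.com, user2@domain.com, user3@domain.com';

for my $recipient (qw(user1 user2 user3)) {
    printf "copy for %s: contains=%d equals=%d\n",
        $recipient,
        to_contains($to, 'user1@domain.com'),
        to_equals($to, 'user1@domain.com');
}
```

All three copies match with "contains" and none match with "equals". If the filter really does compare the whole header, "equals" would also miss user1's own copy of a multi-recipient message, so test both variants in the "test filter" window before relying on either.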
Logged
parish2
« Reply #50 on: December 06, 2004, 03:52:30 AM »

I've subscribed to the IMAP accounts for each of the mailboxes I've set up, as well as the spam mailboxes.  But my account has a generic "inbox" which seems to be full of spam.  Is this mail that was sent to my domain without a specific address?  If so, how do I avoid receiving these?
Logged
TranzNDance
« Reply #51 on: December 06, 2004, 12:34:40 PM »

In your default address settings in cPanel under Mail, you can change it to go to :blackhole: so that mail sent to addresses that don't exist is discarded.
Logged

parish2
« Reply #52 on: December 06, 2004, 12:56:11 PM »

That's great, thank you.
Logged
TranzNDance
« Reply #53 on: December 06, 2004, 12:57:54 PM »

You're welcome.
Logged

krick
« Reply #54 on: July 07, 2005, 10:47:03 AM »

Can someone help me modify the sa-learn.cgi so that it takes the account name as a parameter?

I installed Ilohamail in my account and I'm adding a link in the Ilohamail toolbar that will call the sa-learn.cgi with the user's account name.  That way, any user can move something into their spam folder and then make SpamAssassin learn from it.

I'm not that familiar with Perl and I'm not sure how to go about modifying the cgi.

Any help will be appreciated.  Thanks
Logged
w98
« Reply #55 on: July 07, 2005, 11:09:20 AM »

Quote from: krick
Can someone help me modify the sa-learn.cgi so that it takes the account name as a parameter?

Code:
#!/usr/bin/perl
# modified code based on ian's sa-learn.cgi
# freely distributed to anyone, by ian douglas, id@w98.us
# no warranty whatsoever on this, don't blame me for using this

use strict ;
use warnings ;
use CGI::Carp qw(fatalsToBrowser) ;
use CGI ;
my $q = CGI->new ;

my $LPaccount = "blah3" ; # replace this with your actual account name

my $basepath = "/home/".$LPaccount ;
my $domain = "mydomain.com" ; # replace this with your actual domain name
my $emaillogin = $q->param('loginname') ;

my $salearn = `which sa-learn` ;
chomp($salearn) ; # strip the trailing newline from the 'which' output
my $configfile = "$basepath/.spamassassin/user_prefs" ;
$| = 1 ; # unbuffered output, so progress shows up as it happens

print "Content-type: text/plain\n\n" ;

if (!$emaillogin) {
  print "You must supply a login id in the 'loginname' parameter!\n" ;
  exit ;
}

# the login name ends up in a shell command and in file paths, so only
# accept plain mailbox characters (no slashes, spaces, '..' or shell metacharacters)
if ($emaillogin !~ /\A[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*\z/) {
  print "Invalid login id in the 'loginname' parameter!\n" ;
  exit ;
}

my $maildir = "$basepath/mail/$domain/$emaillogin" ;

print "Learning SPAM:\n" ;
print `$salearn -p $configfile --mbox --spam $maildir/myspam` ;
print "\n\n" ;

print "Learning HAM:\n" ;
print `$salearn -p $configfile --mbox --ham $maildir/myham` ;
print "\n\n" ;

# empty each mailbox once it has been learned from
foreach my $folder ("myspam", "myham") {
  my $mbox = "$maildir/$folder" ;
  next unless -e $mbox ;
  print "Emptying $mbox\n" ;
  open (my $fh, ">", $mbox) or die "Cannot empty $mbox: $!" ;
  close $fh ;
}

I haven't tested this, but it *should* work okay. My perl installation doesn't report any syntax problems.

Use:
http://www.yourdomain.com/cgi-bin/sa-learn.cgi?loginname=joesmith

Note that this will require you to have a 'myspam' and 'myham' folder for the 'joesmith' user. Remember that SpamAssassin needs a balance of non-spam (aka 'ham') messages to learn from as well.
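
If you want to check for those folders before training, something like the following might do. It is only a sketch: `missing_folders` is a made-up helper, the path is the placeholder layout from the script above, and it only reports missing folders rather than creating them.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the training mailboxes
# that do not exist yet under a user's mail directory.
sub missing_folders {
    my ($maildir) = @_;
    return grep { !-e "$maildir/$_" } qw(myspam myham);
}

# Placeholder path in the same layout the script uses; adjust to your account.
my $maildir = "/home/blah3/mail/mydomain.com/joesmith";
my @missing = missing_folders($maildir);
if (@missing) {
    print "Missing folder(s) for joesmith: @missing\n";
} else {
    print "Both training folders exist.\n";
}
```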

Contact me if you need any more help with this.
--
Ian Douglas
Logged

krick
« Reply #56 on: July 07, 2005, 11:15:20 AM »


Quote from: w98
Note that this will require you to have a 'myspam' and 'myham' folder for the 'joesmith' user.


It would be convenient if it could create those folders if they didn't exist.



Quote from: w98
Remember that SpamAssassin needs a balance of non-spam (aka 'ham') messages to learn from as well.


Really?  It can't learn from just the spam messages?  I've been doing this for a while with just spam (no ham) and it seemed like it was working.  What happens if you don't feed it any ham?

Logged
w98
« Reply #57 on: July 07, 2005, 11:33:17 AM »

Quote from: krick
It would be convenient if it could create those folders if they didn't exist.

Yes it would ... although that's already covered in the instructions on page 1 and somewhat outside the scope of this script -- this script is to learn from spam/ham folders that *already* exist.

Plus I've never looked to see if LP has the necessary libraries installed to create the folders via IMAP instructions for a given user account.

Quote from: krick
It can't learn from just the spam messages?  What happens if you don't feed it any ham?

You'll end up with fewer "false positives" if you teach SpamAssassin what you consider HAM to be. If you go to work at a bank, they'll train you on what a *real* dollar bill looks like so you can better detect a counterfeit... SpamAssassin needs a balance so it learns more intelligently.

Also, try not to stockpile spam/ham as the learning algorithm in SpamAssassin can miscount the number of messages in the mailbox it's learning from. It's not uncommon for me to load up a mailbox with 200 messages and have my script report that it scanned 30 messages. I've found that keeping the myham/myspam mailboxes under 60 messages works best (and faster obviously) which means you'll also have more accurate scanning.
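
One way to keep an eye on this is to count the messages in the mailbox yourself and compare that with what sa-learn reports. The sketch below is only a rough check: `count_mbox_messages` is a made-up helper, and it assumes a classic mbox file where each message starts with a "From " line that is either the first line or follows a blank line.

```perl
use strict;
use warnings;

# Hypothetical helper: count messages in an mbox file by counting the
# "From " separator lines that start each message.
sub count_mbox_messages {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "Cannot read $path: $!";
    my $count = 0;
    my $prev_blank = 1;   # the start of the file counts as "after a blank line"
    while (my $line = <$fh>) {
        $count++ if $prev_blank && $line =~ /^From /;
        $prev_blank = ($line =~ /^\r?\n?$/) ? 1 : 0;
    }
    close $fh;
    return $count;
}

my $limit = 60;            # the batch size suggested above
my $mbox  = shift @ARGV;   # path to a myspam or myham file
if (defined $mbox) {
    my $n = count_mbox_messages($mbox);
    print "$mbox holds $n message(s)\n";
    print "That is over $limit, consider training in smaller batches\n" if $n > $limit;
}
```

Run it with the path to a mailbox file as the only argument, for example the myspam file of one user.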

The only hitch with training ham/spam from individual Email mailboxes, though, is that if one user trains a message as spam, and the next user who also got the same message trains it as ham, it will confuse SpamAssassin, since each LP account only gets a single SpamAssassin database. If I recall from old documentation, if SA learns a message as spam and then relearns the message as ham, it will classify subsequent messages like it as whatever it learned *last*. So if all 50 of your users get a message and the first 49 train it as spam and the last user trains it as ham, everyone else's "spam" designation is overridden.

ian
Logged

krick
« Reply #58 on: July 07, 2005, 11:43:50 AM »


Quote from: w98
Quote from: krick
It can't learn from just the spam messages?  What happens if you don't feed it any ham?

You'll end up with fewer "false positives" if you teach SpamAssassin what you consider HAM to be. If you go to work at a bank, they'll train you on what a *real* dollar bill looks like so you can better detect a counterfeit... SpamAssassin needs a balance so it learns more intelligently.

Also, try not to stockpile spam/ham as the learning algorithm in SpamAssassin can miscount the number of messages in the mailbox it's learning from. It's not uncommon for me to load up a mailbox with 200 messages and have my script report that it scanned 30 messages. I've found that keeping the myham/myspam mailboxes under 60 messages works best (and faster obviously) which means you'll also have more accurate scanning.


Hmm...  Would it hurt to have the script scan the user's inbox as "ham"?  I'd have to modify the script so it doesn't empty their inbox, of course.

The way I use it now, I only have one special folder, "spam".  When I get spam, I move it into that folder, then make SpamAssassin learn from it.  As you say, I need to balance that out with "ham" but the only "ham" I have is the contents of my inbox.  I'd rather not have to move messages into another folder to have it learn, then move them back.

Note that I'm NOT using IMAP.  I use webmail during the day at work and download my email at night using my POP3 client.  I usually have less than 100 messages in my inbox at any given time.  I only train SpamAssassin during the day at work from my webmail client.

Also note that when I say "user" I'm only talking about 5 people, my family members who are using webmail accounts on my domain.  I doubt my user base will ever grow past 10 people.
Logged
w98
« Reply #59 on: July 07, 2005, 11:48:34 AM »

Quote
Hmm...  Would it hurt to have the script scan the user's inbox as "ham"?

... unless they have spam in their inbox, no, it won't hurt anything at all.
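
A rough sketch of how the script could treat the inbox as ham without emptying it is below. The folder table, the `empty_after` flag and the `salearn_args` helper are all made up for illustration, and the 'inbox' file name is an assumption about the mail directory layout, so check your own account first. Nothing is executed here; it only prints what would be run.

```perl
use strict;
use warnings;

# Hypothetical folder table: which mailboxes to learn from, as what,
# and whether to empty them afterwards. 'inbox' is an assumed file name.
my @folders = (
    { name => "myspam", type => "spam", empty_after => 1 },
    { name => "inbox",  type => "ham",  empty_after => 0 },
);

# Build the sa-learn argument list for one folder (nothing is run here).
sub salearn_args {
    my ($configfile, $maildir, $folder) = @_;
    return ("-p", $configfile, "--mbox", "--$folder->{type}", "$maildir/$folder->{name}");
}

# Placeholder paths in the same layout the script uses.
my $configfile = "/home/blah3/.spamassassin/user_prefs";
my $maildir    = "/home/blah3/mail/mydomain.com/joesmith";

for my $folder (@folders) {
    my @args = salearn_args($configfile, $maildir, $folder);
    print "sa-learn @args\n";
    print "  (would empty $folder->{name} afterwards)\n" if $folder->{empty_after};
}
```

If you do wire this into the CGI, passing the list to system("sa-learn", @args) instead of using backticks keeps the shell out of it.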

In the past, I've made a spam/ham folder setup for each client email address I hosted, told them to *copy* any messages that they considered 'ham' into the 'ham' folder, and assured them that only my script would ever see what was in there.

Quote
I'd rather not have to move messages into another folder to have it learn, then move them back.

... which is why you just *copy* the messages there.

ian
Logged
