Web Hosting Forum | Lunarpages
Author Topic: How-to: Train SpamAssassin - Updated May 30 2007  (Read 48496 times)
w98
« Reply #210 on: August 13, 2007, 11:27:56 AM »

Quote:
Can't get it to work. I thought I followed the directions, but here's what the script returns. I paid the $20 for the individual help but haven't gotten a response to my emails yet. Help please. Thanks!

Well, that's what I get for taking a vacation ;o)

Anyhow, Chuck's problem was resolved rather quickly. The errors about the spam folders were due to LP not filtering his spam messages correctly. The filtering wasn't working because his user_prefs file was still the default file that I bundle in my zip archive, which has a place where you MUST insert your proper cPanel username. Without that username in the user_prefs file, SpamAssassin will refuse to run, which means LP cannot filter messages properly.

w98
« Reply #211 on: August 14, 2007, 07:46:14 AM »

Since LP's cPanel upgrade is taking away our ability to rewrite our own subject lines, it's perfectly safe to use the cPanel interface to configure SpamAssassin again, as it will remove the rewrite_header line for the subject line.

Note that having the rewrite_header line in the user_prefs file will NOT harm it in any way; SpamAssassin will run just fine with or without it.

I'll update my documentation at iandouglas.com and so on later today.

w98
« Reply #212 on: August 16, 2007, 08:14:52 AM »

By the way, for anyone wondering where the last few messages went: Mitch, one of the lead moderators/admins of the forums, decided to merge a number of messages into a central place and created a new message thread where we can discuss the failing SpamAssassin engine issue on various servers.

The first few pages go back a couple of months, but catch up to recent messages soon enough:
http://www.lunarforums.com/index.php/topic,42723.0.html

Mitch
« Reply #213 on: August 16, 2007, 08:24:35 AM »

Please use this thread for any of the conversation on or about the recent cPanel 11/Spam Assassin discussion.  Thanks!

BadCam
« Reply #214 on: September 05, 2007, 03:06:24 AM »

Hello Ian,

Thank you very much for creating spam assassin.

I've set SA up and the script seems to be working well. When I run the cgi script I get the following message:

sa-trainer.cgi version 3.04 by Ian Douglas, iandouglas.com, Copyright 2004-2007
Some Rights Reserved under a Creative Commons "Attribution Non-commercial" license
Support for this script available here

Autodetected mail storage as Maildir; you could speed up this script slightly if you configure $mail_format in the script to "Maildir"

Training SpamAssassin for mydomain.com:
WARNING: Could not find spambox for admin@mydomain.com, cannot scan SPAM
Checking /home/xt88002/mail/mydomain.com/alan/.spam/cur/ to learn SPAM: Learned tokens from 0 message(s) (0 message(s) examined)
WARNING: Could not find spambox for ana@mydomain.com, cannot scan SPAM
Checking /home/xt88002/mail/mydomain.com/cameron/.spam/cur/ to learn SPAM: Learned tokens from 0 message(s) (0 message(s) examined)
WARNING: Could not find spambox for enquiries@mydomain.com, cannot scan SPAM
WARNING: Could not find spambox for globalham@mydomain.com, cannot scan SPAM

Checking Global Email-based Hambox for HAM messages:
Checking /home/xt88002/mail/mydomain.com/globalham/cur/ to learn HAM: Learned tokens from 0 message(s) (0 message(s) examined)

Number of HAM messages scanned over time: 877
Number of SPAM messages scanned over time: 15404

re-scan mailboxes

(I have replaced my domain name with "mydomain.com")

1) Does this look correct?

2) How do I configure $mail_format in the script?

3) How do I access the Spam folders using Thunderbird? Do I have to set up a Spam IMAP account for each Email account I have? I've set up the global address for spam, but how do I see the spam folders? I don't wish to use webmail.

4) I have received a Spam that SA doesn't recognise as Spam. How do I tell SA that it's Spam? I don't see anywhere in your instructions, or this thread, how to deal with spam that is not recognised as spam.

I would just like to add that the Spam that's coming through doesn't appear to be being checked by SA. Am I incorrect? Is this sufficient (or too much?) information for you:

From - Wed Sep 05 21:49:38 2007
X-Account-Key: account4
X-UIDL: UID485-1182998340
X-Mozilla-Status: 0000
X-Mozilla-Status2: 00000000
X-Mozilla-Keys:                                                                                 
Return-path: <jai@alltheunknown.com>
Envelope-to: babbittbuchwald@mydomain.com  <--I have Changed my domain name here
Delivery-date: Wed, 05 Sep 2007 02:43:49 -0700
Received: from 61-225-150-170.dynamic.hinet.net ([61.225.150.170])
   by atlas.lunarpages.com with esmtp (Exim 4.66)
   (envelope-from <jai@alltheunknown.com>)
   id 1ISrOO-0004ah-NU
   for babbittbuchwald@mydomain.com; Wed, 05 Sep 2007 02:43:48 -0700
Received: from [61.225.150.170] by mailstore1.secureserver.net; Wed, 36 Aug 2007 17:42:41 +0800
Message-ID: <0107ffa4$0107fe78$aa96e13d@jai>
From: "Alissa Sadler" <jai@alltheunknown.com>
To: <babbittbuchwald@mydomain.com> <--I have Changed my domain name here
Subject: Save $369.05 adobe acrobat 8 $79
Date: Wed, 36 Aug 2007 17:42:41 +0800
MIME-Version: 1.0
Content-Type: text/plain;
   format=flowed;
   charset="windows-1250"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1106
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
X-Spam-Status: No, score=
X-Spam-Score:
X-Spam-Bar:
X-Spam-Flag: NO


Thank you kindly. This is a lot of fun, by the way. Spam sucks!
« Last Edit: September 05, 2007, 12:31:41 PM by BadCam »

MikePL
« Reply #215 on: September 14, 2007, 01:17:39 PM »

After reading all the posts here and the long instructions on Ian's site, I still have a problem.

I used this script a few years ago (4 or 5) and I remember having success. Then I simply forgot about the script, but with the new wave of spam I had to go back to it again.

My problem is that I just can't set up the script. I read the instructions and can't figure out if it works for me. I don't remember how it worked for me a few years ago either.

I have a mail program (The Bat!) and there I have 8 mail accounts from different parked domains (5 domains within my account). I don't use IMAP anywhere. I use POP as I work in photography, and keeping all those e-mails with big attachments is pointless. I know I have space, but I have some respect for the LP server's resources, and when I don't need it, I don't waste web space. I never use Horde or SquirrelMail either.

How should I set up the account, my mail program and maybe other things so that whenever I download spam I am able to put it in a spam folder (and ham in a ham folder)? Remember that I use POP.

I don't mind creating an IMAP account and create ham and spam folders there, but...
1. How to configure the script to scan the 'ham' folder in, for example, sa@mydomain.com?
2. How to configure the script to scan the 'spam' folder in sa@mydomain.com?
3. Will the script and SpamAssassin handle e-mail copied from my other domains properly? Won't it mark my addresses as spam senders?


I hope you get my idea. The script seems to be for people who are IMAP and webmail-oriented, while I am POP and absolutely no webmail. I am tired of writing new e-mail filters. Currently I have 36 filters and I fear that I may lose good messages, as plain brute filtering is not the solution.

Any ideas?

Mitch
« Reply #216 on: September 14, 2007, 01:21:58 PM »

Not sure how relevant this guide is after the cPanel 11 upgrades, which changed the way Spam Assassin works a lot. With that said, I'm going to go ahead and lock this thread till we can get some more information on this issue to you. Thanks!

Alright, we are open for business yet again. Thread unlocked!
« Last Edit: September 17, 2007, 01:13:43 PM by Mitch »

w98
« Reply #217 on: September 18, 2007, 08:17:27 AM »

Quote:
Hello Ian, Thank you very much for creating spam assassin.

I didn't create SpamAssassin, I just created this HOWTO guide for how to use it.

Quote:
Training SpamAssassin for mydomain.com:
WARNING: Could not find spambox for admin@mydomain.com, cannot scan SPAM
Checking /home/xt88002/mail/mydomain.com/alan/.spam/cur/ to learn SPAM: Learned tokens from 0 message(s) (0 message(s) examined)
WARNING: Could not find spambox for ana@mydomain.com, cannot scan SPAM
Checking /home/xt88002/mail/mydomain.com/cameron/.spam/cur/ to learn SPAM: Learned tokens from 0 message(s) (0 message(s) examined)
WARNING: Could not find spambox for enquiries@mydomain.com, cannot scan SPAM
WARNING: Could not find spambox for globalham@mydomain.com, cannot scan SPAM

Checking Global Email-based Hambox for HAM messages:
Checking /home/xt88002/mail/mydomain.com/globalham/cur/ to learn HAM: Learned tokens from 0 message(s) (0 message(s) examined)

Number of HAM messages scanned over time: 877
Number of SPAM messages scanned over time: 15404
1) Does this look correct?

Not really, no. It appears that some of your mailboxes don't have a 'spam' folder any more. It seems that your globalham Inbox didn't have any messages in it to scan, either.

It seems that the upgrade to cPanel 11 has removed two things from the older SpamAssassin setup: spam folders, and letting us rewrite our own subject lines to include things like a spam score.


Quote:
2) How do I configure $mail_format in the script?

This isn't a *requirement*; it just makes the script a few milliseconds faster if you already know what kind of mail format you use. If you edit the script, look for two lines of code that look like
Code:
#$mail_format = "Mbox" ;
#$mail_format = "Maildir" ;
and remove the '#' character at the start of the line that has "Maildir".
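
For what it's worth, the difference between the two formats is easy to see on disk. Here's a rough Python sketch of how one might tell them apart. It is NOT the code sa-trainer.cgi uses for its autodetection, and the function name detect_mail_format is made up for illustration. It assumes a Maildir is a directory with cur/, new/ and tmp/ inside it, and that an mbox mailbox is a single flat file.

```python
import os


def detect_mail_format(path):
    """Guess the storage format of a mailbox path (hypothetical helper).

    Returns "Maildir", "Mbox", or None when the path is neither.
    """
    # A Maildir is a directory that holds cur/, new/ and tmp/ subdirectories.
    if os.path.isdir(path) and all(
        os.path.isdir(os.path.join(path, sub)) for sub in ("cur", "new", "tmp")
    ):
        return "Maildir"
    # An mbox mailbox is one flat file with all messages appended to it.
    if os.path.isfile(path):
        return "Mbox"
    return None
```

If a check like this reports "Maildir" for your mailboxes, that matches the "Autodetected mail storage as Maildir" message, and uncommenting the "Maildir" line is the right choice.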


Quote:
3) How do I access the Spam folders using Thunderbird? Do I have to set up a Spam IMAP account for each Email account I have? I've set up the global address for spam, but how do I see the spam folders? I don't wish to use webmail.

If you already use IMAP, you can just subscribe to the existing 'spam' folder for each account, or, if a 'spam' folder doesn't already exist, you'll need to create it for each account.
If you use Thunderbird for POP3, you can add additional IMAP accounts using the same login details as your POP3 setup, and subscribe to the 'spam' folders for each account so you can drag all spam messages to your 'spam' folder.


Quote:
4) I have received a Spam that SA doesn't recognise as Spam. How do I tell SA that it's Spam? I don't see anywhere in your instructions, or this thread, how to deal with spam that is not recognised as spam.

It's actually very well covered in the documentation. A spam message ending up in your Inbox is called a "false negative" -- it was falsely identified as 'not spam'. You'll need to move that message into your 'spam' folder via your IMAP account so that the next time you run the training script, the script will see the message in your spam folder and learn things about it accordingly.

Quote:
I would just like to add that the Spam that's coming through doesn't appear to be being checked by SA. Am I incorrect? Is this sufficient (or too much?) information for you:

X-Spam-Status: No, score=
X-Spam-Score:
X-Spam-Bar:
X-Spam-Flag: NO

Yes, this is an indication that "spam scoring" has stopped working on your server. You should contact LunarPages support with that exact mail header, let them know that spam scoring is not working, and ask them to restart the process that makes this happen. To my knowledge so far, LP has not yet upgraded Perl to the version that cPanel 11 requires, and since parts of SpamAssassin use Perl, some parts of SpamAssassin are either not working or stop working after a period of time. It's likely that this "spam scoring" process will stop again on its own at some point, so sending a *friendly and courteous* message to LP support asking them to restart the process will be necessary.

It's also worth noting: in cases where the spam scoring stops and you get *flooded* with spam, please do keep moving those spam messages into your 'spam' folders and keep training on them. SpamAssassin will continue to learn from these messages, so when the spam scoring *does* get restarted by LP support, your SpamAssassin databases will be that much better at scoring spam.
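
If you'd rather not eyeball headers by hand, the check described above (an X-Spam-Status header whose "score=" has nothing after it) can be sketched in a few lines of Python. This is only an illustration, not part of sa-trainer.cgi. The function name spam_scoring_status and the three labels it returns are made up, and it assumes a working scorer writes a number right after "score=".

```python
import re
from email import message_from_string


def spam_scoring_status(raw_message):
    """Classify one raw message by its X-Spam-Status header (hypothetical helper).

    "unchecked": no X-Spam-Status header at all
    "unscored":  header present but no number after "score="
    "scored":    header present with a numeric score
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return "unchecked"
    # Assumption: a healthy header carries a number, e.g. "No, score=1.3".
    if re.search(r"score=-?\d+(\.\d+)?", status):
        return "scored"
    return "unscored"
```

A message carrying the headers quoted above would come back as "unscored", which is the case worth reporting to LP support.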

w98
« Reply #218 on: September 18, 2007, 08:39:12 AM »

Quote:
My problem is that I just can't set up the script. I read the instructions and can't figure out if it works for me.

The output from the script is a good indication of whether or not it's working -- can you post the output of the script so I can help determine if things are running okay?

Quote:
How should I set up the account, my mail program and maybe other things so that whenever I download spam I am able to put it in a spam folder (and ham in a ham folder)? Remember that I use POP.

This is the most common setup that LP likely has -- you as a user set up your Email client to download all incoming mail via POP3 so it's not stored on the server any longer. However, since the SpamAssassin training script runs on the server, any messages you want to train as ham/spam also need to be copied back onto the server. This is why the training script requires that you set up an IMAP account so you can drag messages from your POP3 Inbox back to your IMAP 'spam' folder, and to do the same with moving non-spam messages to your IMAP 'ham' folder. The difference is that after scanning, you would drag the IMAP 'ham' messages back to your Inbox (or wherever you sort them from there), and you would simply delete the spam messages from the IMAP 'spam' folder.

Quote:
I don't mind creating an IMAP account and creating ham and spam folders there, but...
1. How to configure the script to scan the 'ham' folder in, for example, sa@mydomain.com?

Since the default "global ham" mailbox is set to "globalham", you will need to find the line that says
Code:
$global_ham_email = "globalham" ;
and change it to "sa" like this:
Code:
$global_ham_email = "sa" ;
This way, all of your 'ham' (non-spam) messages can simply be forwarded to that mailbox by all of your domain users, and they will be trained as non-spam the next time you run your script.


Quote:
2. How to configure the script to scan the 'spam' folder in sa@mydomain.com?

By default, the script will want to scan individual spam mailboxes. You will need to make the following changes to allow for a different scenario:

- change the line that says
Code:
$check_user_spamboxes_for_spam = "Y" ;
so the "Y" is set to "N", like this:
Code:
$check_user_spamboxes_for_spam = "N" ;

- uncomment the line that says
Code:
#$global_spambox = "scan-spam" ;
and change its value so it looks like this:
Code:
$global_spambox = "mydomain.com/sa/spam" ;
replacing "mydomain.com" with your actual domain. This is an undocumented 'feature' of how the script handles folder names: they are actually disk paths relative to /home/yourusername/mail/, so in this case you're telling the script to scan /home/yourusername/mail/mydomain.com/sa/spam/cur/ (the full disk path, including the 'cur' portion for Maildir) for spam messages.
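
To make the path rule concrete, here is a small Python sketch of how a $global_spambox value maps to the directory that gets scanned. It only mirrors the rule described above (relative to /home/yourusername/mail/, plus 'cur' for Maildir). It is not code from sa-trainer.cgi, and the function name spambox_scan_path is invented for the example.

```python
import posixpath


def spambox_scan_path(cpanel_user, global_spambox):
    """Build the Maildir directory scanned for a $global_spambox value.

    Hypothetical helper: assumes the folder name is a path relative to
    /home/<cpanel_user>/mail/ and that messages live in its 'cur' directory.
    """
    # Drop stray slashes so the value is always treated as relative.
    relative = global_spambox.strip("/")
    return posixpath.join("/home", cpanel_user, "mail", relative, "cur") + "/"
```

Calling spambox_scan_path("yourusername", "mydomain.com/sa/spam") gives the same /home/yourusername/mail/mydomain.com/sa/spam/cur/ path mentioned above, so you can check that the folder really exists on disk before running the trainer.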


Quote:
3. Will the script and SpamAssassin handle e-mail copied from my other domains properly? Won't it mark my addresses as spam senders?

When the global spam folder is in use, I add an extra command-line setting to the actual SpamAssassin training program that tells it to ignore the message headers, so it won't mark your users as spammers. This means messages copied to the ham/spam folders of your sa@mydomain.com account won't necessarily be trained as *accurately*, but it will definitely be better than nothing. Typically, you would *want* to scan the headers of all incoming mail so it can learn tokens about where the message came from, the date, subject line, etc., but using these 'global' settings makes it more convenient for you and your users.

Quote:
The script seems to be for people who are IMAP and webmail-oriented

My script has the same requirement as the SpamAssassin software itself: that you have messages back on the server in order to train them. The SpamAssassin software on the LunarPages servers has no way to read the mail sitting on your hard drive that you downloaded via POP3. The most convenient way to move messages back to the server is IMAP. There are other ways to move your messages back to the server, but they are WAY more complicated, which is why I tell people to set up IMAP accounts to move the messages back to LunarPages for scanning. (As for webmail, it's simply a browser-based interface for IMAP.)

silver45
« Reply #219 on: September 18, 2007, 12:08:30 PM »

Quote:
This is the most common setup that LP likely has -- you as a user set up your Email client to download all incoming mail via POP3 so it's not stored on the server any longer. However, since the SpamAssassin training script runs on the server, any messages you want to train as ham/spam also need to be copied back onto the server. This is why the training script requires that you set up an IMAP account so you can drag messages from your POP3 Inbox back to your IMAP 'spam' folder, and to do the same with moving non-spam messages to your IMAP 'ham' folder. The difference is that after scanning, you would drag the IMAP 'ham' messages back to your Inbox (or wherever you sort them from there), and you would simply delete the spam messages from the IMAP 'spam' folder.

Just another POV: I tried going to IMAP and very much didn't like it, so what I do is use the Horde interface to copy good mail from the inbox to the HAM folder, or move any SPAM in the inbox to the SPAM folder, and then use POP to download the stuff left in the inbox. Since everything in HAM is a copy of mail I've already downloaded, I can just empty the box after running the trainer.

It may be extra work, but for me it works better than trying to get my Eudora setup to work the way I want it to with IMAP.

Like I said, just another way of accomplishing the same end result, without needing to deal with IMAP :).

w98
« Reply #220 on: September 18, 2007, 01:04:45 PM »

Quote:
just another way of accomplishing the same end result, without needing to deal with IMAP :).

Aside from the fact that Horde is just a web interface for IMAP, yes :)

But yes, I think that *copying* non-spam to your 'ham' folder is the best way to handle training ham. The other way would be to *move* it to your 'ham' folder, run the training script, then move it all back. But if you *copy* to your 'ham' folder, you can just delete the messages in the 'ham' folder when you're done. The other scenario (move, train, move back) I feel is riskier because there's a chance you forget a needed message and delete it by accident.

Personally, I *move* non-spam messages that I don't need to keep (newsletters, notices, mailing list traffic that doesn't interest me, Email notices that someone replied to this message thread, for example) into my 'ham' folder, and only *copy* over non-spam messages that I *need* to keep copies of (messages for work/contracts, notes from my wife, etc)

Spam *always* gets moved to the 'spam' folder -- no need to keep it around ever.

But personally, I only use IMAP for my Email 'cause I figure that (a) LP has the storage, (b) LP keeps backups, and (c) since I triple-boot into Gentoo/Ubuntu/Windows frequently throughout the day, I can still access all of my Emails no matter which OS I'm running, or even have access to all of those messages while on the road.

Though, given the cPanel 11 fiasco, I'm still tempted to just set up a Linux box here at home to be my mail server for all of my domains and just download everything here at home, run my own SpamAssassin filters, etc., and not have to rely on any third party for my Email ... some days I really miss running my own hosting business. :)

gadgetfan
« Reply #221 on: September 22, 2007, 01:01:20 PM »

Thanks, w98, I've just started using SA and your script, and so far, so good.  One question, though.  I'm set up to use the global ham mailbox, and it's not clear to me if you intend for users to forward the entire SA-edited message (complete with scoring information and such) to the global ham box, or just the original message.  I presumed the latter, but I figured I should check.

Thanks.

w98
« Reply #222 on: September 22, 2007, 01:10:01 PM »

Hi gadgetfan,

Your users can just forward their incoming non-spam messages as-is, either as an attachment or as an inline message; SpamAssassin will learn enough tokens to remember what kind of mail you like to get. It's also configured, via my user_prefs suggestions, to ignore other SpamAssassin headers, so pre-scored messages are fine.

Ian

gadgetfan
« Reply #223 on: September 22, 2007, 01:18:12 PM »

Excellent.  Thanks for the quick reply.

gadgetfan
« Reply #224 on: October 16, 2007, 06:29:37 AM »

Ian--

I've got a new issue, and I figured I'd mine your SA expertise before submitting a trouble ticket to LP, if that's what's required. As of this weekend, SA has gone back to rewriting subject lines and for some reason is putting all of the SA scoring information in the X-Spam-Report header instead of at the beginning of the message body as it used to. I've also seen a marked decrease in the amount of spam I'm seeing in my spam box. That last part is not a problem, as long as it's only spam that I'm missing, and not legitimate mail. I have no evidence that I've missed any legitimate mail, but then again, you often don't know that you didn't receive a message :)

I've gone into cPanel, disabled SpamAssassin, disabled the Spam box in SA, re-enabled both, and replaced my SpamAssassin user prefs file with the one recommended in the first message of this thread. Some spam has started to trickle in (though not at the rate that it had been up until the weekend), but it's still rewriting the subject line. I've even gone and changed the "rewrite_subject" line in user_prefs to 0 and it's still doing the same thing: rewriting subjects and putting scoring data in the X-Spam-Report header. It's much easier to refile ham messages back into the inbox when they're attached as an .eml attachment, and that's no longer happening.

In my other thread on this subject (sorry for the repeat post...should have posted in this thread to begin with), Mitch said that LP hasn't made any changes that should affect how SA is running or the amount of spam I'm receiving.

Any thoughts?  Your help would be most appreciated.