Thanks for this topic, very helpful.
I've been having two problems with Webalizer. Maybe LP staff or customers can help!
1) I am running Webalizer on my PC. When I run it on the raw, uncompressed log file, I get many error messages like this:
[new_snode] Warning: String exceeds storage size.
For July, out of 3,630,643 records, 107 were ignored and 3 were bad, so only about 0.003% of records are affected. Still, I wonder about this. Is it a PC limitation?
2) I can't seem to get webalizer.conf configured to screen out referrers from my own site, which makes it a hassle to analyze the results. Here's my configuration for hiding. I've played with it, but evidently I don't grasp the matching rules.
216.12.92.18 is our local network address; I don't want to include stats from me hitting my own website. mail.coreknowledge.org is our internal mail server.
Around line 376 of webalizer.conf:
# The value can have either a leading or trailing '*' wildcard
# character. If no wildcard is found, a match can occur anywhere
# in the string. Given a string "www.yourmama.com", the values "your",
# "*mama.com" and "www.your*" will all match.
# Your own site should be hidden
HideSite coreknowledge.org*
HideSite 216.12.92.18
#HideSite localhost
# Your own site gives most referrals
HideReferrer *coreknowledge.org
HideReferrer Direct Request
And later, around line 476:
# The Ignore* keywords allow you to completely ignore log records based
# on hostname, URL, user agent, referrer or username. I hessitated in
# adding these, since the Webalizer was designed to generate _accurate_
# statistics about a web servers performance. By choosing to ignore
# records, the accuracy of reports become skewed, negating why I wrote
# this program in the first place. However, due to popular demand, here
# they are. Use the same as the Hide* keywords, where the value can have
# a leading or trailing wildcard '*'. Use at your own risk ;)
#IgnoreSite bad.site.net
#IgnoreURL /test*
IgnoreReferrer *coreknowledge.org
IgnoreReferrer 216.12.92.18
#IgnoreAgent RealPlayer
#IgnoreUser root
# The Include* keywords allow you to force the inclusion of log records
# based on hostname, URL, user agent, referrer or username. They take
# precidence over the Ignore* keywords. Note: Using Ignore/Include
# combinations to selectivly process parts of a web site is _extremely
# inefficent_!!! Avoid doing so if possible (ie: grep the records to a
# seperate file if you really want that kind of report).
# Example: Only show stats on Joe User's pages...
IgnoreURL 216.12.92.18
IgnoreURL mail.coreknowledge.org
#IncludeURL ~joeuser*
# Or based on an authenticated username
IgnoreUser 216.12.92.18
#IncludeUser someuser
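To check my own reading of the matching rules, here is a minimal Python sketch of the rule exactly as the comment near line 376 describes it. This is only my interpretation of that comment, not Webalizer's real matching code; the function name conf_match and the referrer URL are made up for illustration, and I have not checked whether Webalizer ignores case.

```python
def conf_match(pattern: str, value: str) -> bool:
    """Match a Hide*/Ignore* value the way the webalizer.conf comment describes.

    Assumption: this mirrors the comment text only, not Webalizer's source.
    """
    if pattern.startswith("*"):
        # Leading wildcard: the rest of the pattern must match the end of the string.
        return value.endswith(pattern[1:])
    if pattern.endswith("*"):
        # Trailing wildcard: the rest of the pattern must match the start of the string.
        return value.startswith(pattern[:-1])
    # No wildcard: a match can occur anywhere in the string.
    return pattern in value


# The three examples given in the config comment.
host = "www.yourmama.com"
for pat in ("your", "*mama.com", "www.your*"):
    print(pat, conf_match(pat, host))

# My own patterns, tested against a hypothetical hostname and referrer URL.
referrer = "http://www.coreknowledge.org/index.htm"  # made-up example
print(conf_match("coreknowledge.org*", "www.coreknowledge.org"))
print(conf_match("*coreknowledge.org", referrer))
print(conf_match("coreknowledge.org", referrer))
```

If this reading is right, my wildcard patterns would fail whenever the hostname begins with www. or the referrer is a full URL, while a plain coreknowledge.org with no wildcard would match anywhere in the string. I'd appreciate it if someone could confirm whether that is how Webalizer really behaves.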
Thanks for any guidance to help me get this sorted out!
---Diana