Suggestions for off-site storage.

January 12, 2011, 08:38:59 AM
First a little history of my experience:

After Lunarpages had been hosting my site since early 2006, I was unceremoniously suspended for excessive memory usage on alnitac.  To keep this short, we never really found the problem, since it seems to have magically disappeared.  However, one action I was advised to take was to move about 50MB of publicly downloadable files, which were using 12GB/month of bandwidth, off-site to something like RapidShare.  Not particularly liking the sleaze factor of such services, I decided to sign up for Amazon S3 (Simple Storage Service) and move my 50MB of .EXE files there.  This was actually for the best, since Amazon keeps these files at multiple data centers worldwide anyway.  The monthly cost is nothing, or next to nothing, at this usage level.  You pay only for what you use over the extraordinarily high Free Tier limit.  See http://aws.amazon.com/s3/#pricing .

Amazon S3 has one major drawback.  All those really nice Webalizer statistics provided by Lunarpages are obviously no longer available for the files I moved off-site.  Worse yet, Amazon doesn't produce nice Apache Combined Format log files that can be downloaded and analyzed.  Even worse, Amazon doesn't produce a single log file, but hundreds of smaller files per day.  CloudBerry Explorer http://cloudberrylab.com/?page=cloudberry-explorer-amazon-s3 (highly recommended) can parse these logs and present them either as a spreadsheet or as one of two rudimentary bar graphs.  You could, as I did, download hundreds or thousands of the Amazon log files, concatenate them with the DOS "copy" command, and use something like the PFE text editor to turn the result into a pure Microsoft (CR/LF) text file, since some lines end in a <CR> only.  Then write an Excel macro to swap columns around and put back the quotes that Excel strips, and analyze the resulting file.  It didn't work too badly, but what a pain, and I've seen mention in some blogs that some lines may be repeated in more than one Amazon log file.  Yuk.
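
For anyone who would rather script the manual route than fight with Excel, here is a rough Python sketch of the same steps: normalize the line endings, drop repeated lines, and rewrite each entry in Apache Combined Format.  It assumes the space-delimited field order Amazon documents for S3 server access logs (owner, bucket, time, remote IP, requester, request ID, operation, key, request, status, error code, bytes sent, object size, total time, turn-around time, referrer, user agent).  The function name and the idea of de-duplicating on the request ID are my own assumptions, not anything Amazon or S3Stat prescribes, and lines that don't fit the pattern are simply skipped.

```python
import re

# Field order assumed from Amazon's documented S3 server access log format.
S3_LINE = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request>[^"]*)" (?P<status>\S+) (?P<error>\S+) '
    r'(?P<bytes_sent>\S+) (?P<object_size>\S+) '
    r'(?P<total_time>\S+) (?P<turnaround>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

COMBINED = '{ip} - - [{time}] "{request}" {status} {bytes_sent} "{referrer}" "{agent}"'


def s3_to_combined(lines):
    """Convert S3 access log lines to Apache Combined Format lines.

    Entries whose request ID was already seen are dropped, which is one
    (assumed) way to cope with lines repeated across log files.
    """
    seen = set()
    converted = []
    for raw in lines:
        match = S3_LINE.match(raw.strip())
        if match is None:
            continue  # not a line this sketch understands
        request_id = match.group("request_id")
        if request_id in seen:
            continue  # repeated entry
        seen.add(request_id)
        converted.append(COMBINED.format(**match.groupdict()))
    return converted
```

To use it, read all the downloaded log files into one string and pass `text.splitlines()` to `s3_to_combined`; `splitlines()` treats a lone <CR>, a lone <LF>, and <CR><LF> all as line breaks, so the text-editor step isn't needed.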

For me, the answer was an online service called S3Stat http://www.s3stat.com/ .  Just sign up and give them a way to access your AWS account, along with your "Bucket" name (the directory you want logged).  That's it.  Every day, you'll find the Webalizer logs accessible from your web browser.  You'll also find .zip files containing access logs correctly converted to Apache Combined Format, in the same folder as Amazon's gaggle of proprietary log files.  I also like Weblog Expert Lite, so on occasion I download the converted logs and run that program on them, in addition to Webalizer.  Just because I can, I guess. :-)

There are 2 points you may be concerned about.  1) I didn't want search engines spidering my Webalizer logs, so I put a robots.txt file at the top (Bucket) level with a "Disallow:" for the "/stats" folder, where S3Stat stores your browser-readable analysis.  2) You may not like giving S3Stat their own set of credentials to access your account.  This didn't bother me, since all my files are publicly downloadable anyway, but if this is a concern, you can set up your Amazon S3 folders yourself, in the fashion S3Stat needs them, and only allow them access to the log files they need to read and the folder they put the results in.  See http://www.s3stat.com/web-stats/self-managed.ashx .
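
As an illustration of point 1, the snippet below holds a minimal robots.txt of the kind described and uses Python's standard urllib.robotparser to check what it blocks.  The exact "Disallow: /stats/" line, the "User-agent: *" rule, and the file name setup.exe are assumptions for the example, not a copy of my actual file.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of a robots.txt placed at the top (Bucket) level.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a well-behaved crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The stats folder is blocked; the downloadable files are not.
print(is_allowed("/stats/index.html"))
print(is_allowed("/setup.exe"))
```

Keep in mind that robots.txt only asks crawlers to stay out.  It does nothing to stop a person (or a rude spider) from reading the folder directly.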

Only one statistic still bothers me.  I never could find out, from Webalizer, whether a "hit" on a downloadable .exe file is the same as a completed download, or whether it counts all attempts whether they fail or not.  If anyone knows the answer, I'm all ears.
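
Whatever Webalizer does, the raw Amazon logs record both the bytes sent and the object size for each request, so a rough estimate can be made from those.  The Python sketch below counts every GET of an object as a hit and, separately, counts it as complete only when the status is 200 and the bytes sent equal the object size.  That rule is my own assumption about what "completed" should mean; in particular, downloads fetched in pieces by a download manager (status 206) are not counted as complete here.  The field order is assumed from Amazon's documented log format.

```python
import re
from collections import Counter

# Only the fields needed here are named; the rest are skipped positionally.
FIELDS = re.compile(
    r'\S+ \S+ \[[^\]]+\] \S+ \S+ \S+ (?P<operation>\S+) (?P<key>\S+) '
    r'"[^"]*" (?P<status>\S+) \S+ (?P<bytes_sent>\S+) (?P<object_size>\S+)'
)


def count_downloads(lines):
    """Return (hits, complete) Counters keyed by object name."""
    hits = Counter()
    complete = Counter()
    for line in lines:
        match = FIELDS.match(line.strip())
        if match is None or match.group("operation") != "REST.GET.OBJECT":
            continue
        key = match.group("key")
        hits[key] += 1
        sent = match.group("bytes_sent")
        size = match.group("object_size")
        # Assumed rule: a full download is a 200 that sent the whole object.
        if match.group("status") == "200" and sent.isdigit() and sent == size:
            complete[key] += 1
    return hits, complete
```

Comparing the two counters per file gives at least an idea of how many hits were abandoned or failed part-way.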

I hope people on shared hosting who are having problems (real or imagined) with Lunarpages resource usage can benefit from some of this information.



