Web Hosting Forum | Lunarpages


Topic: Resource Usage (Read 1932 times)
wkomorniczak
Space Explorer
Posts: 7
« on: April 02, 2013, 12:04:42 PM »

Hi Community!

For about two months now I have been struggling with excessive server resource usage on my account. The site responsible is www.turbokolor.com, specifically the shop on that site (osCommerce). I have been going back and forth with support, with limited success in reducing the resource usage.

I have so far implemented the following changes:
- upgraded all the scripts
- optimized osCommerce to limit the number of queries
- denied access to robots in both robots.txt and the .htaccess file
- denied access in the .htaccess file to known malicious IPs
- serving cached content
- serving thumbnailed images
- optimized the MySQL tables
- disabled old / unused sites

But with about 800-900 visitors a day on average, my CPU usage is usually above 2%. The weird thing is that it can sit at 0.7% for most of the day and then jump to more than 2% with only one or two customers on the site. I can look up live visitors in Google Analytics, and right now, for example, I have 0 active visitors on the site and my resource usage is still at 2.2% CPU. It would be great to hear from someone who has gone through this before me...

Thanks,
Wiktor

MrPhil
Senior Moderator
Berserker Poster
Posts: 5882
« Reply #1 on: April 02, 2013, 02:39:55 PM »

Any chance that those few customers spiking your usage are actually robots? A bot searching every nook and cranny of your site could well drive your CPU usage way up. Not all robots obey robots.txt (Baidu is notorious for being ill behaved). Anyway, if you can get an IP address for a heavy user, run a whois on them and see if they belong to a search engine company. If they're someone normally considered "well behaved", perhaps you can try various robot directives to ask them to visit less often and for longer times between queries. For a shopping site, you also have scrapers and consolidators grabbing price listings off your site for presentation to comparison shoppers. Normally that would be a good thing, but in your case it may be pushing your usage over the limit.
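
For a robot that ignores robots.txt, the refusal has to happen on the server. A minimal sketch, assuming Apache 2.2-style access control in .htaccess and assuming the crawler announces itself with "Baiduspider" somewhere in its User-Agent header; the variable name block_bot is made up for illustration:

```apache
# Flag any request whose User-Agent contains "Baiduspider" (case-insensitive).
# The pattern is deliberately not anchored with ^, because many crawlers
# send a User-Agent that begins with "Mozilla/5.0 (compatible; ...".
SetEnvIfNoCase User-Agent "Baiduspider" block_bot

# Apache 2.2 syntax: let everyone in except flagged requests.
Order Allow,Deny
Allow from all
Deny from env=block_bot
```

A User-Agent header is trivially faked, so this only stops bots that identify themselves honestly; anything else has to be blocked by IP.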

A store, unless it's restricted (invitation only), normally wants to be listed on search engines and consolidators. If that ends up driving your usage too high, you will have to consider whether the (hopefully) increased business makes it worth allowing them, and just bite the bullet and move to a more expensive VPS (and eventually even to a dedicated server).

-= From the ashes shall rise a sooty tern =-

wkomorniczak
« Reply #2 on: April 02, 2013, 10:15:53 PM »

Hi,

Thanks for your answer. I also suspect robots are the likely problem, but by now I have blocked so many of them in .htaccess, using IPs (based on Project Honey Pot) and UA strings, that I am surprised any still get through.

The specifics of the shop are such that it's not that important to me to be listed in every search engine, etc. I'm not by any means an expert on .htaccess, so if I paste it here, would you be so kind as to take a look at it?

Thank you,

Wiktor

MrPhil
« Reply #3 on: April 03, 2013, 05:16:48 AM »

***-out anything sensitive, such as your account name, and I'll try to take a look at it today.

We were working on this over on the osCommerce forum. It looks like you reduced the size of the images you were sending over. Is there any image processing still happening on the server side?

wkomorniczak
« Reply #4 on: April 03, 2013, 07:10:51 AM »

Hi,

**** Arg! I just deleted a reply I had been working on for half an hour :( Let's try it again :) ****

Thanks again for helping. I have a similar thread running on the osCommerce forums (I was out on vacation but have started posting there again) and was thinking your nick looked familiar. At least now I know :)

About the images. I have only one server-side image manipulation, and that's only when a new product is uploaded. The script thumbnails it and then uses the thumb version. Now that I think about it, my main page slideshow is populated from an XML file which is read every time, but that can't really be resource intensive, can it?

Another issue I was exploring is the time factor of the resource usage. I am in Poland (Central European Time), and until about 3-4 pm my time most US users are asleep, waking up, or on their way to work/school. Assuming LP hosts mostly US sites, that would explain why my resource usage is well below 1% until 3-4 pm and then spikes (now it's 1.9%). Since I get no spike in traffic during that time, there must be a spike in general server usage, which leaves fewer resources to go around, which means (I'm just guessing here) that the same activity on my side shows up as higher resource usage stats. Not sure if this makes sense.

Just in case I am completely wrong, here is my .htaccess file. It's a bit of a mess (some duplicate things) since, like I said, I'm not an expert in this area and have been adding to it like crazy over the last 2 months. Just FYI, all the banned IPs have been checked against Project Honey Pot.

Thanks again for looking!

W

RewriteEngine on
<Files .htaccess>
   order allow,deny
   deny from all
</Files>

<IfModule mod_expires.c>

# Enable expirations
ExpiresActive On

# Default directive
ExpiresDefault "access plus 1 month"

# My favicon
ExpiresByType image/x-icon "access plus 1 year"

# Images
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"

# CSS
ExpiresByType text/css "access plus 1 month"

# Javascript
ExpiresByType application/javascript "access plus 1 year"

</IfModule>

<Files 403.shtml>
order allow,deny
allow from all
</Files>

deny from 72.44.32.0/19
deny from 67.202.0.0/18
deny from 75.101.128.0/17
deny from 174.129.0.0/16
deny from 204.236.192.0/18
deny from 184.73.0.0/16
deny from 184.72.128.0/17
deny from 184.72.64.0/18
deny from 50.16.0.0/15
deny from 50.19.0.0/16
deny from 107.20.0.0/14
deny from 23.20.0.0/14
deny from 54.242.0.0/15
deny from 54.234.0.0/15
deny from 54.236.0.0/15
deny from 54.224.0.0/15
deny from 54.226.0.0/15
deny from 50.112.0.0/16
deny from 54.245.0.0/16
deny from 54.244.0.0/16
deny from 204.236.128.0/18
deny from 184.72.0.0/18
deny from 50.18.0.0/16
deny from 184.169.128.0/17
deny from 54.241.0.0/16
deny from 182.118.20.0/24
deny from 101.226.167.0/24
deny from 54.228.63.0/24
deny from 177.71.231.0/24

deny from 79.125.0.0/17
deny from 46.51.128.0/18
deny from 46.51.192.0/20
deny from 46.137.0.0/17
deny from 46.137.128.0/18
deny from 176.34.128.0/17
deny from 176.34.64.0/18
deny from 54.247.0.0/16
deny from 54.246.0.0/16
deny from 54.228.0.0/16
deny from 175.41.128.0/18
deny from 122.248.192.0/18
deny from 46.137.192.0/18
deny from 46.51.216.0/21
deny from 54.251.0.0/16
deny from 54.252.0.0/16
deny from 175.41.192.0/18
deny from 46.51.224.0/19
deny from 176.32.64.0/19
deny from 103.4.8.0/21
deny from 176.34.0.0/18
deny from 54.248.0.0/15
deny from 177.71.128.0/17
deny from 54.232.0.0/16
deny from 85.152.20.10
deny from 208.115.111.0/24
deny from 208.115.113.0/24
deny from 182.118.25.0/24
deny from 182.118.22.0/24
deny from 178.255.215.65
deny from 199.21.99.87
deny from 218.30.103.0/24

deny from 143.28.232.29
deny from 218.30.103.132
deny from 150.70.74.81
deny from 150.70.74.99
deny from 150.70.74.171
deny from 150.70.75.0/24
deny from 208.115.110.102
deny from 208.115.110.240
deny from 208.115.111.68
deny from 150.70.172.103
deny from 150.70.172.109
deny from 157.55.35.90
deny from 150.70.75.29
deny from 208.115.113.84
deny from 69.171.229.116
deny from 66.220.152.2
deny from 69.171.234.0
deny from 150.70.172.208
deny from 150.70.172.203
deny from 69.171.234.4
deny from 66.220.152.6
deny from 157.56.229.88
deny from 157.55.36.37
deny from 157.55.33.40
deny from 66.220.152.3
deny from 150.70.97.43
deny from 218.93.127.118
deny from 62.24.222.132
deny from 62.24.222.131
deny from 119.63.196.123
deny from 60.36.84.49
deny from 1.202.218.71
deny from 119.63.196.94
deny from 85.115.58.180
deny from 85.96.29.159
deny from 83.23.114.12
deny from 83.21.254.164
deny from 83.8.12.217
deny from 83.7.39.244
deny from 157.55.33.181
deny from 150.70.75.36
deny from 83.9.19.139
deny from 178.36.124.59
deny from 83.24.42.179
deny from 69.171.230.248
deny from 89.229.21.209
deny from 69.171.230.246
deny from 173.252.101.4
deny from 180.76.5.173
deny from 180.76.5.15
deny from 123.125.71.25
deny from 123.125.71.0/24
deny from 69.171.224.0/24
deny from 217.67.201.162

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://demo.turbokolor.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://demo.turbokolor.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://dev.maggnes.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://dev.maggnes.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://i.turbokolor.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://i.turbokolor.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://kofifi.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://kofifi.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://kosmodrom3000.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://kosmodrom3000.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://lindas.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://lindas.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://maggnes.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://maggnes.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://mik.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://mik.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://photos.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://photos.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://poligon.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://poligon.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://preethi.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://preethi.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://store.turbokolor.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://store.turbokolor.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://studiovisla.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://studiovisla.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://swan1.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://swan1.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://szmidtdesign.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://szmidtdesign.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://turbokolor.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://turbokolor.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://turbokolor.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://turbokolor.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://unic-it.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://unic-it.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.demo.turbokolor.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.demo.turbokolor.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.dev.maggnes.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.dev.maggnes.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.i.turbokolor.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.i.turbokolor.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.kofifi.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.kofifi.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.kosmodrom3000.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.kosmodrom3000.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.lindas.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.lindas.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.maggnes.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.maggnes.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mik.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mik.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.photos.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.photos.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.poligon.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.poligon.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.preethi.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.preethi.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.store.turbokolor.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.store.turbokolor.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.studiovisla.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.studiovisla.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.swan1.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.swan1.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.szmidtdesign.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.szmidtdesign.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.szmidtdesign.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.szmidtdesign.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.turbokolor.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.turbokolor.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.turbokolor.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.turbokolor.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.unic-it.myquizco.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://www.unic-it.myquizco.com$      [NC]
RewriteCond %{HTTP_REFERER} !^http://szmidtdesign.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://szmidtdesign.com$      [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]

deny from 119.63.196.124
deny from 173.252.101.6
deny from 23.22.39.206
deny from 199.30.16.56
deny from 199.30.16.52
deny from 199.59.149.169
deny from 173.199.114.187

deny from 131.253.24.3
deny from 199.30.20.99
deny from 157.56.93.94
deny from 157.55.33.88
deny from 157.56.93.223
deny from 199.21.99.87
deny from 208.115.113.84
deny from 61.135.248.221
deny from 61.135.249.160
deny from 218.30.103.132
deny from 85.25.176.105
deny from 91.197.15.34
deny from 87.205.252.243

deny from 61.135.249.175
deny from 178.255.215.81
deny from 46.4.100.231
deny from 69.58.178.56
deny from 63.147.126.186
deny from 74.86.12.187
deny from 176.9.139.112
deny from 23.20.102.111

deny from 173.192.170.114



ErrorDocument 403 /403.html
 
RewriteEngine On
RewriteBase /
 
# IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(TweetmemeBot|widow) [NC,OR]

# ANYWHERE IN UA (no ^ anchor, so the match can occur at any position)
RewriteCond %{HTTP_USER_AGENT} (craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) [NC,OR]

# STARTS WITH WEB (last condition in the OR chain, so no OR flag)
RewriteCond %{HTTP_USER_AGENT} ^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) [NC]

# ISSUE 403 / SERVE ERRORDOCUMENT (.* also matches the empty path of the home page)
RewriteRule .* - [F,L]

SetEnvIfNoCase User-Agent "^Yandex" search_bot
SetEnvIfNoCase User-Agent "^BingBot" search_bot
SetEnvIfNoCase User-Agent "^Yahoo" search_bot
SetEnvIfNoCase User-Agent "^igdeSpyder" search_bot
SetEnvIfNoCase User-Agent "^Robot" search_bot
SetEnvIfNoCase User-Agent "^msnbot" search_bot
SetEnvIfNoCase User-Agent "^Aport" search_bot
SetEnvIfNoCase User-Agent "^Mail" search_bot
SetEnvIfNoCase User-Agent "^bot" search_bot
SetEnvIfNoCase User-Agent "^spider" search_bot
SetEnvIfNoCase User-Agent "^php" search_bot
SetEnvIfNoCase User-Agent "^Parser" search_bot
SetEnvIfNoCase User-Agent "^Baidu" search_bot
SetEnvIfNoCase User-Agent "^msnbot" search_bot
SetEnvIfNoCase User-Agent "^bingbot" search_bot
SetEnvIfNoCase User-Agent "^YandexBot" search_bot
SetEnvIfNoCase User-Agent "^Ezooms" search_bot
SetEnvIfNoCase User-Agent "^YodaoBot" search_bot
SetEnvIfNoCase User-Agent "^Sogou" search_bot
SetEnvIfNoCase User-Agent "^Linguee" search_bot
SetEnvIfNoCase User-Agent "^GG PeekBot" search_bot
SetEnvIfNoCase User-Agent "^Genieo" search_bot
SetEnvIfNoCase User-Agent "^Exabot" search_bot
SetEnvIfNoCase User-Agent "^BacklinkCrawler" search_bot
SetEnvIfNoCase User-Agent "^MJ12bot" search_bot
SetEnvIfNoCase User-Agent "^PiplBot" search_bot
SetEnvIfNoCase User-Agent "^SISTRIX" search_bot
SetEnvIfNoCase User-Agent "^Snipebot" search_bot

<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=search_bot
</Limit>

# IF THE UA STARTS WITH THESE
SetEnvIfNoCase ^User-Agent$ .*(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(libwww-perl|aesop_com_spiderman) HTTP_SAFE_BADBOT
Deny from env=HTTP_SAFE_BADBOT
 

MrPhil
« Reply #5 on: April 03, 2013, 10:34:39 AM »

I've only taken a very quick look at it, but you've got a $hI+load of sites whitelisted to use your graphics. Added up all together, I would think they would have quite a heavy load there. Is it necessary for all these sites to access your graphics? Are they all under your account (in which case you'd be paying for it even if they had their own copies of the files) or are they under other accounts or even other hosts?

Regarding your calculations on load, as more people load up more sites on the server, for a given load you're imposing on the server I would think that the percentage would go down (fixed number * 100 / increasing total load). So, if few people are hitting your server before 3 or 4pm, I would expect you to have your highest percentages before then (assuming your load is constant). Actually, I think it's a percentage of server capacity, so a fixed load on your part would be a fixed percentage load. Otherwise, in the middle of the night you're the only site being used -- you'd be at 100%!
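
The two readings can be put side by side. A rough sketch, where L (the load your account generates), O (the load from every other account on the box) and C (the server's total capacity) are symbols introduced here for illustration, not anything LP documents:

```latex
% Reading 1: share of whatever is currently running on the server
p_{\text{share}} = \frac{100\,L}{L + O}
% L fixed, O rising (other sites get busy)  =>  p_share falls
% O = 0 (only your site is active)          =>  p_share = 100\%

% Reading 2: share of total server capacity
p_{\text{capacity}} = \frac{100\,L}{C}
% L fixed, C fixed  =>  p_capacity stays constant whatever O does
```

Under either reading as written here, a constant load on your side would not show up as a higher percentage just because the rest of the server got busier.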

If you're being seriously hit by search bots and shopping comparison sites (scraping your site), that could be a problem.

wkomorniczak
« Reply #6 on: April 03, 2013, 11:47:19 AM »

About the whitelisted sites: I can see the entries somehow got doubled. But all those sites are in my account, and since there are many different sites and only one .htaccess, I whitelisted them all so they can use the images. What do you mean that I would be paying for it even if they are all under my account?

I see what you mean about the calculations, but I'm not sure it makes sense. If the server is completely free and I'm using 2% of the load, that amounts to 2% of the available resources. If the server is 50% busy and I am still using 2%, would that amount to 4% of the free resources? Not sure if it can be thought of this way, though.

Now about robots. As you can see in my .htaccess, I have blacklisted a lot of robots and IPs. I checked recent raw access logs and found no bots in the UA field other than Googlebot (which is OK). I do still get IPs that Project Honey Pot lists as bad, which I keep adding to the deny list.

Having said that, I have 7 active users on the site now (Google Analytics) and am using 3.5% of the resources. That's ridiculous! :(

Thanks again!
W

MrPhil
« Reply #7 on: April 03, 2013, 12:22:18 PM »

I think you can do the hotlink protection section like this, provided that turbokolor.com, maggnes.com, myquizco.com, and szmidtdesign.com all belong to you (whether they're under one account or not):
Code:
# hotlink protection against other sites grabbing my images
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(.+\.)?turbokolor\.com(/?).*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://(.+\.)?maggnes\.com(/?).*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://(.+\.)?myquizco\.com(/?).*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://(.+\.)?szmidtdesign\.com(/?).*$      [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
It probably won't have much impact on CPU usage, but at least it would be a lot neater. "(.+\.)?" is supposed to match any subdomain/www or none at all.
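
A slightly tighter variant, offered as a sketch rather than a tested drop-in: it folds the four domains into one condition, also accepts https referers, and requires the host name to end right after ".com", so a referer such as http://turbokolor.com.example.net/ is not whitelisted. It assumes all four domains use the .com TLD, as in the list above.

```apache
RewriteEngine On

# Let requests with an empty Referer through (direct hits, some proxies)
RewriteCond %{HTTP_REFERER} !^$

# Whitelist: http or https, optional subdomain(s), one of the four domains,
# then either a slash or the end of the string
RewriteCond %{HTTP_REFERER} !^https?://([^/]+\.)?(turbokolor|maggnes|myquizco|szmidtdesign)\.com(/|$) [NC]

# Everything else asking for an image gets a 403
RewriteRule \.(jpe?g|gif|png|bmp)$ - [F,NC]
```

A referer that carries an explicit port (for example :8080) would not match this pattern; add `(:[0-9]+)?` before `(/|$)` if that matters.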

If these other domains are putting a serious load on your images, they'll all count together for your server load. If they were 3 or 4 separate accounts, each with their own copy of these image files, the maximum load per account would be much lower. However, you'd be paying 3 or 4 times as much for hosting. Pick your poison. It still might be cheaper than VPS if you can't get your total usage down.

LP doesn't explain very well what they're counting towards total CPU load. I'm just guessing what would make sense in a rational world (that and a dollar will get you a small cup of so-so coffee). I would think it would count as 2% either way in your example (not X% of "free resources").

wkomorniczak
« Reply #8 on: April 03, 2013, 12:55:36 PM »

Thanks. I'll be sure to make those changes in .htaccess. Regarding the timing, it's just a side note anyway.

About the other sites. Only the turbokolor ones are responsible for the usage. The others are either dead or have very few hits.

Back to robots. I did find some new entries from jikespider and blocked it. All the other ones are still showing up in the logs, but each with only one hit resulting in a 303 error. So I'm guessing that's fine.

I talked to some other hosts about switching, and they all tell me I won't have such an issue with their service with my setup, but how can they guarantee it, right?


About VPS. I don't mind paying, but I need to be sure that's the right way to go. So far no one I have talked to has suggested I need a bigger box based on my statistics...

wkomorniczak
« Reply #9 on: April 04, 2013, 04:41:23 AM »

I have now installed the Page Cache v1.5 contribution (DUH!) and am waiting to see if this brings the resource usage down. Theoretically it should generate all the PHP pages once and serve the static copies to users, lowering resource usage significantly. Until now I was using only the built-in osCommerce cache, but I realised it only caches the categories box (Double DUH!)

Also installed the PageSecurity contribution to protect against sql injections (if any).

Let's see how we do....

wkomorniczak
« Reply #10 on: April 07, 2013, 05:30:53 AM »

So no change, or maybe even a change for the worse :/

My stats as of right now:

Stats for 07 Apr 2013:
---------------------------------
CPU Usage - %3.20
MEM Usage - %0.79
Number of MySQL procs (average) - 0.10
Top Process   %CPU 10.70   httpd [turbokolor.myquizco.com] [/shop/images/22tkimg/1tk_store_FW2011_part1_0004s_0001s_000]
Top Process   %CPU 9.60   httpd [turbokolor.myquizco.com] [/shop/images/22tkimg/1tk_store_FW2011_part1_0004s_0000s_000]
Top Process   %CPU 9.30   httpd [turbokolor.myquizco.com] [/shop/images/37tkimg/StavrossEMBlightgreymini217x266.jpg]

And analytics is showing 8 active users.

I still completely don't understand how 3 image files are listed as my top processes. The files are all approx. 5 KB, and all my PHP content from the shop is being served cached :/

I'm totally lost...


MrPhil
« Reply #11 on: April 07, 2013, 07:47:03 AM »

Those images don't appear to be unreasonably large. In the process of serving up these images, is any processing being done on them? You don't have anything in .htaccess triggering processing on them, do you? If those are simply image files, I can't imagine what would be drawing so much processing power. How many visitors (especially robots) are on your site at any one time?

If you haven't done so already, I would ask tech support if they can do anything to find why simply serving up these files is consuming so many resources (that assumes that you aren't actually rescaling them at all, and are simply throwing them over the wall to the browser). If you have more than a handful of bots on your site, you might try blocking them temporarily, a few at a time, to see who is responsible for this. Maybe some bot is requesting the same file over and over and over (possibly as a DoS attack by a competitor)? You'll have to work with support on this to make sure you get accurate usage figures when trying different things.
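
Blocking one suspect at a time can be done with a clearly marked temporary section in .htaccess. A sketch, assuming Apache 2.2-style access control; the addresses are placeholders from the documentation range 203.0.113.0/24, to be replaced with IPs taken from your own raw access logs:

```apache
# --- TEMPORARY TEST BLOCK: remove once the culprit is found ---
# Enable one line at a time, wait for the next usage report, then swap.
Deny from 203.0.113.7
#Deny from 203.0.113.42
#Deny from 203.0.113.0/24
# --- END TEMPORARY TEST BLOCK ---
```

Note the date and time of each change so the usage figures from support can be matched to the line that was active.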