Web Hosting Forum | Lunarpages
Topic: Blocking spider with htaccess not working
scanman20
« on: March 13, 2008, 09:33:45 AM »

I created an htaccess file in the root of an addon domain and added

Order Allow,Deny
Deny from 82.99.30.
Allow from all

The 82.99.30.* range belongs to a search spider that constantly visits one of my sites, and I want to deny it access to the entire site. Everything I've read says that this should work, yet the address still shows up in my logs. What am I doing wrong?
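
For reference, here is a commented sketch of the same rule. It assumes Apache 2.2-style access control (mod_authz_host) and that the host permits these directives in .htaccess (AllowOverride Limit); neither is confirmed in the thread. The CIDR notation is an alternative spelling intended to cover the same addresses as the 82.99.30. prefix. With Order Allow,Deny, the Allow rules are evaluated first and the Deny rules second, so a matching Deny wins regardless of the order in which the lines appear in the file.

```apache
# Evaluate Allow rules first, then Deny rules; a matching Deny wins.
Order Allow,Deny
# Let everyone in by default ...
Allow from all
# ... except the spider's range (same addresses as "Deny from 82.99.30.").
Deny from 82.99.30.0/24
```

A denied request is normally still recorded in the access log, with a 403 status instead of 200, so seeing the address in a log does not by itself mean the block failed. The status code on those log lines is worth checking.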

Even a broken clock is right twice a day.
NotOneBit.com
MCSE - MCSA - MCP
MrPhil
« Reply #1 on: March 13, 2008, 05:38:02 PM »

Is that the complete code you put in? If so, are you missing an enclosing <Files *> and </Files>? Something like that -- I don't claim expertise on this aspect of .htaccess files, but maybe that will give you a clue where to look?

Oh yeah: I presume you tried robots.txt first, and this spider is ill-mannered and refuses to obey the directives? I think you can specify an IP address there, but I won't swear to it.
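
A sketch of the <Files *> wrapper mentioned above, for comparison only. Directives placed outside any <Files> section already apply to everything under the directory that holds the .htaccess file, so the wrapper should not be needed for a site-wide block. The address range is the one from the original post.

```apache
# Apply the rule to every file name; "*" is a wildcard match.
<Files *>
    Order Allow,Deny
    Allow from all
    Deny from 82.99.30.
</Files>
```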
« Last Edit: March 13, 2008, 05:40:47 PM by MrPhil »

scanman20
« Reply #2 on: March 14, 2008, 07:28:32 AM »

It's strange; at some level it seems to be working. For example, it seems to be blocking requests for directories but not for specific files. Requests for example.com/foo/ are blocked, but requests for example.com/foo/page.php aren't.

Yeah, I know that this particular engine ignores robots.txt files, so I'm not even bothering.

What's unusual is that these IP addresses show up in my Last 300 Visitors report, but don't seem to show up in Analog or Webalizer.
scanman20
« Reply #3 on: March 24, 2008, 09:41:29 AM »

Strange that this is still occurring even though everything I know about htaccess (which is limited) tells me it should be working. An htaccess file in the top level directory should apply to that directory and any subdirectories that don't have their own htaccess file. Somehow, this IP is managing to evade the block I have. To answer Mr. Phil, I'm not using <Files *> since I'm interested in blocking this IP range from all directories and files and not just a select group.

What's also puzzling me is where these IP addresses are showing up. I can always find them in my Last 300 Visitors stats, but I don't see them in Webalizer or Analog. I'm wondering if the requests are being logged by the Last 300 Visitors script without any content actually being delivered to this IP range. Odd.