Web Hosting Forum | Lunarpages
October 15, 2008, 04:16:46 PM


Author Topic: Using Robots.txt To Protect Files/Folders  (Read 15305 times)
geeves
Space Explorer
***
Offline Offline

Posts: 8


WWW
« Reply #45 on: August 18, 2005, 07:27:48 PM »

It's not exactly a bandwidth issue as much as not everyone wants people using their images from a Google Image search.

Actually, Googlebot just killed my bandwidth for the month. It repeatedly downloaded some very large files that I thought were already protected, but my robots.txt file was missing (I probably deleted it accidentally :( ). My traffic spiked over the last 5 days and went through about 30 GB.
Logged
JamesG
Lager Ship
Berserker Poster
*****
Offline Offline

Posts: 12026


If In Doubt, Cluster!


WWW
« Reply #46 on: August 19, 2005, 07:25:36 AM »

If you need a few more GB of bandwidth, you can e-mail sales@lunarpages.com and request some. They are charged at $3.95 per GB per month, and you will only be charged for the months you authorise.

And then go back to page one and re-create your robots.txt file :P

good luck
Logged

stapel
Galactic Royalty
*****
Offline Offline

Posts: 496


« Reply #47 on: September 10, 2005, 10:28:58 AM »

Quote from: TranzNDance
robots.txt is mostly a way to keep bots out. If you want to welcome them all, you do not need one.
You might want one, even if you leave it blank, so as to avoid 404 errors. When the good bots (that is, the ones that respect robots.txt instructions) come by, they will look for the robots.txt file. If it isn't there, you'll get a 404 error in your logs, due to the bot having requested a file that doesn't exist. I have a client whose site was getting an astonishing number of 404 errors, and they were all due to not having a robots.txt file.
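If you want to check whether this is happening on your own site, here is a rough sketch in Python that counts how many 404s in an access log are requests for /robots.txt. It assumes the usual Apache Common/Combined Log Format; the sample lines, IP addresses and the function name are made up for illustration, so adjust the pattern if your host logs differently.

```python
import re

# Pulls the request path and status code out of a Common/Combined Log Format line.
# Assumption: the request is quoted as "METHOD /path PROTOCOL" and followed by the status.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_robots_404s(lines):
    """Return (robots_404s, total_404s) for an iterable of log lines."""
    robots = total = 0
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match.group("status") != "404":
            continue
        total += 1
        # Ignore any query string when comparing the path.
        if match.group("path").split("?", 1)[0] == "/robots.txt":
            robots += 1
    return robots, total


# Made-up sample lines, only to show the shape of the input.
sample = [
    '66.249.66.1 - - [10/Sep/2005:10:28:58 -0700] "GET /robots.txt HTTP/1.1" 404 209 "-" "ExampleBot/1.0"',
    '66.249.66.1 - - [10/Sep/2005:10:29:01 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '10.0.0.5 - - [10/Sep/2005:10:30:12 -0700] "GET /missing.gif HTTP/1.1" 404 209 "-" "Mozilla/4.0"',
]

print(count_robots_404s(sample))  # (1, 2)
```

To run it against a real log, pass an open file instead of the sample list, e.g. `count_robots_404s(open("access.log"))`, where the file name is whatever your host uses.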

Quote from: TranzNDance
I don't think having a robots.txt file is listed in the list of things to do to improve ranking.
I believe this is correct. Since the file only blocks (good) bots from files, it would do nothing to encourage those bots to access your site in the first place. But if you block all bots, then your site wouldn't get crawled; this would prevent your getting ranked (since, as far as the bot is concerned, you don't exist).

Quote from: FST2005
would it be best to just accept every spider to get a good ranking?
I don't know that you'd want every spider; that will depend on what you've got to protect from spidering. I think the robots.txt file is more often used to "hide" things like private directories (a family page, say, that you don't care to see in the search engines). That is, you would use the robots.txt file to "hide" just a part of your site. You can still leave the part you want in the search engines open to all the bots.
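To make the three cases above concrete (a blank file, a file that blocks everything, and a file that hides just one directory), here is a small sketch using the robots.txt parser from Python's standard library. The /family/ directory and the "ExampleBot" name are made up, and this only shows how one rule-following parser reads the file; each real bot has its own implementation.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="ExampleBot"):
    """Ask Python's standard-library parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


blank = ""                                          # empty file: nothing is off limits
block_all = "User-agent: *\nDisallow: /\n"          # whole site off limits
hide_one = "User-agent: *\nDisallow: /family/\n"    # only /family/ off limits (hypothetical path)

for name, rules in [("blank", blank), ("block_all", block_all), ("hide_one", hide_one)]:
    print(name, allowed(rules, "/index.html"), allowed(rules, "/family/photos.html"))

# blank True True
# block_all False False
# hide_one True False
```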

Eliz.
Logged
TranzNDance
Princess of Naboo
Berserker Poster
*****
Offline Offline

Posts: 11809



WWW
« Reply #48 on: September 10, 2005, 10:43:44 AM »

The problem with hiding directories with robots.txt is that a human visitor looking at the file will know what the site owner wants to hide and can navigate to those folders.
Logged

Grr..!! Luff Ya Grr..!! Luff Ya Grr..!! Luff Ya
TranzNDance
Princess of Naboo
Berserker Poster
*****
Offline Offline

Posts: 11809



WWW
« Reply #49 on: September 10, 2005, 10:50:17 AM »

hmmm... very interesting the stuff whitehouse.gov wants to hide from bots...
Logged

stapel
Galactic Royalty
*****
Offline Offline

Posts: 496


« Reply #50 on: September 10, 2005, 10:52:20 AM »

Quote from: TranzNDance
The problem with hiding directories with robots.txt is that a human visitor looking at the file will know what the site owner wants to hide and can navigate to those folders.
True, but robots.txt isn't intended to provide security against bad guys (or bad bots). It's to manage crawling by the good guys. If you have something that you want protected against bad bots or malicious users, you would need to use other methods.

Eliz.
Logged
TranzNDance
Princess of Naboo
Berserker Poster
*****
Offline Offline

Posts: 11809



WWW
« Reply #51 on: September 10, 2005, 10:55:58 AM »

I know the intention of robots.txt but I'm just giving people a heads-up that if they don't use other means of protecting their directories, robots.txt is the equivalent of announcing to people where their "hidden" stash is.
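To show how little effort this takes, here is a short Python sketch that pulls every Disallow path out of the text of a robots.txt file. The function name and the example paths are made up for illustration; the point is only that anyone who can read the file gets the list.

```python
def disallowed_paths(robots_txt):
    """Return every path named on a Disallow line, in file order."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and surrounding spaces
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt contents.
example = """\
User-agent: *
Disallow: /family/         # private pages
Disallow: /private-blog/
"""

print(disallowed_paths(example))  # ['/family/', '/private-blog/']
```

Since robots.txt always sits at the top level of the site, a visitor only has to request that one file to see the same list.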
Logged

stapel
Galactic Royalty
*****
Offline Offline

Posts: 496


« Reply #52 on: September 10, 2005, 11:04:45 AM »

You make a good point.

Eliz.
Logged
Parity
Space Explorer
***
Offline Offline

Posts: 7


« Reply #53 on: April 19, 2006, 02:49:45 PM »

Hi, I did what you said to do in Notepad, but when I open up WS_FTP LE, I can't find the Notepad file so that I can transfer it over to my public_html folder. Please tell me what to do.

Logged
SOU610
Jabba the Hutt
*****
Offline Offline

Posts: 694



WWW
« Reply #54 on: April 19, 2006, 06:05:11 PM »

Does it stop robots from digging into child directories? 

Example, say I have /dirA/dirB/
I then do:
User-agent: *
Disallow: /dirA/

Will dirB not be crawled since it's under dirA?  That was my thinking until I looked at some robots.txt files of other sites and have seen what seems to be full directory trees...
Logged

Nibbler
21st century digital boy
Master Jedi
*****
Offline Offline

Posts: 1178



WWW
« Reply #55 on: April 20, 2006, 07:43:38 AM »

Disallow: /dirA/ will affect all paths that begin with /dirA/, so that includes child directories.

What you have been looking at are probably ones like this:

User-agent: *
Disallow: /folder/subfolder1
Disallow: /folder/subfolder2
Disallow: /folder/subfolder3

That would indicate that those 3 subfolders should not be crawled, but /folder/subfolder4 or /folder/ can be crawled.
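The prefix matching described above can be checked with the robots.txt parser in Python's standard library. This is only a sketch: the page names and the "ExampleBot" agent are made up, and it shows how this one parser applies the rules, not how every crawler does.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


parent_rule = "User-agent: *\nDisallow: /dirA/\n"
sub_rules = (
    "User-agent: *\n"
    "Disallow: /folder/subfolder1\n"
    "Disallow: /folder/subfolder2\n"
    "Disallow: /folder/subfolder3\n"
)

# Blocking the parent also blocks everything underneath it.
print(allowed(parent_rule, "/dirA/dirB/page.html"))        # False

# Listing individual subfolders leaves the parent and its other children open.
print(allowed(sub_rules, "/folder/subfolder1/page.html"))  # False
print(allowed(sub_rules, "/folder/subfolder4/page.html"))  # True
print(allowed(sub_rules, "/folder/"))                      # True
```

One side effect of plain prefix matching: without a trailing slash, `Disallow: /folder/subfolder1` also matches a path like /folder/subfolder10/, so add the trailing slash if you only mean that one directory.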
Logged

Missing since 1983

malamute
Guest
« Reply #56 on: July 04, 2006, 11:24:12 PM »

Google was taken to court and they lost; it was about stealing images and other stuff from someone's web page. I also keep an eye on my last-visitors log, and any new search bots get banned.
« Last Edit: July 04, 2006, 11:30:55 PM by malamute » Logged
RAT
Wizard of Telecastria
Über Jedi
*****
Offline Offline

Posts: 2874


HAIRNT !


WWW
« Reply #57 on: August 02, 2006, 01:14:26 PM »

I was just doing some research on just what a "robots.txt" file is (along with what else is supposed to be inside public_html and what's not) and came across this thread. Now I know I can leave this one in my public_html when I delete my current site's folders and add my newly rebuilt site!

Since I am currently using Mambo, my robots.txt file contains the following:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/

Thanks for this link James !!

RAT

Logged
somaiah
Space Explorer
***
Offline Offline

Posts: 8



WWW
« Reply #58 on: March 01, 2007, 11:45:14 AM »


Nice - I was looking for a way to have a "secure" blog, i.e. a blog that people cannot find through search engines, but that I can share with some folks on the net.

Looks like this is a good way to do it.

Thanks!

S.
Logged
SteveW
Master Jedi
*****
Offline Offline

Posts: 1394


WWW
« Reply #59 on: March 01, 2007, 05:52:39 PM »

Using robots.txt will keep well behaved robots from indexing your page, but there are robots that ignore robots.txt, so your content could still wind up being generally available.

The next step in security would be to put your content in a folder of your site that has no link to it from the home page (or anywhere else). Give the name of the folder to anyone you want to have access.

That is still not actually protected, however. If your blog will contain truly personal info that you don't want republished anywhere, the only solution is to password protect the folder it's in and give the password to selected people. Even that solution is only as trustworthy as the people to whom you give the password.
Logged





