Web Hosting Forum | Lunarpages
Topic: Using Robots.txt To Protect Files/Folders  (Read 15272 times)
JamesG
« on: November 12, 2004, 06:46:04 AM »

A lot of people like to protect certain files and folders from being indexed by search engines and other robots...

WHAT IS A ROBOTS.TXT FILE?

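A robots.txt file is a set of instructions for the robots (also called spiders or crawlers) that search engines send around the web to add sites to their indexes, so that people can search for them. Well-behaved robots follow it, but it is only a request: it does not lock or password-protect anything.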

Before you begin, you need to know how to write the .txt file.

Prepare it in a text editor such as Notepad. Don't attempt it in Word or an HTML editor such as FrontPage. When you're finished, save it as "robots.txt".

If you want to disallow all robots from all your files and folders you would use the following:

User-agent: *
Disallow: /
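
If you want to check what a rule set like the one above does before uploading it, Python's standard-library urllib.robotparser can parse the text and answer allow/deny questions. This is only a minimal sketch: the is_allowed helper, the bot name SomeOtherBot and the www.example.com URLs are made up for illustration, and real crawlers may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The "disallow everything for everyone" rules from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/"))  # False
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://www.example.com/images/photo.jpg"))  # False
```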


If you only want to disallow certain search engines, such as Google or WebCrawler, name their robots in your robots.txt file (Google's robot is called Googlebot) and follow these lines with your Disallow rules:

User-agent: Googlebot
User-agent: WebCrawler


To apply the rules you are going to specify to all search engines instead, use the command:

User-agent: *

To disallow them from searching your folders the command is:

Disallow: /foldername/
or
Disallow: /folder/foldername/ (if your folder is inside another)
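
Putting the User-agent and Disallow lines together, here is a rough way to see which robots and which folders a combined rule set affects, again using Python's standard-library urllib.robotparser. The folder names come from the examples above; the can_crawl helper, the bot name OtherBot and the www.example.com URLs are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# One record: two named robots, two blocked folders.
RULES = """\
User-agent: Googlebot
User-agent: WebCrawler
Disallow: /foldername/
Disallow: /folder/foldername/
"""


def can_crawl(rules, agent, url):
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


SITE = "http://www.example.com"

# Named robots are kept out of the listed folders only.
print(can_crawl(RULES, "Googlebot", SITE + "/foldername/page.html"))          # False
print(can_crawl(RULES, "WebCrawler", SITE + "/folder/foldername/page.html"))  # False
print(can_crawl(RULES, "Googlebot", SITE + "/index.html"))                    # True

# A robot that is not named in the record is not restricted by it.
print(can_crawl(RULES, "OtherBot", SITE + "/foldername/page.html"))           # True
```

Swapping the two User-agent lines for a single "User-agent: *" line would make the same Disallow rules apply to every robot.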

For this to work, your robots.txt file must be uploaded in ASCII mode. How you select that depends on the program you use to upload your files.
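
In most FTP programs, ASCII mode is a menu option or toggle. If you script your uploads instead, a rough equivalent in Python is the storlines method of the standard-library ftplib, which sends a file in ASCII (line) mode. This is a sketch only: the host name, login details and the helper names render_robots_txt and upload_robots_txt are placeholders, not anything specific to your hosting account.

```python
import io
from ftplib import FTP


def render_robots_txt(rules):
    """Join rule lines into ASCII bytes, one rule per line."""
    text = "\n".join(rules) + "\n"
    # encode("ascii") raises UnicodeEncodeError if a non-ASCII character
    # (for example a curly quote pasted from Word) has slipped in.
    return text.encode("ascii")


def upload_robots_txt(ftp, data, remote_name="robots.txt"):
    """Upload the bytes over an FTP connection that is already logged in."""
    # storlines transfers in ASCII (line) mode; storbinary would use binary mode.
    ftp.storlines("STOR " + remote_name, io.BytesIO(data))


data = render_robots_txt(["User-agent: *", "Disallow: /foldername/"])

# Hypothetical usage; replace the host, user name and password with your own:
# with FTP("ftp.example.com") as ftp:
#     ftp.login("username", "password")
#     ftp.cwd("public_html")
#     upload_robots_txt(ftp, data)
```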

Hope this helps!

Admins/Moderators, feel free to modify this post.

donavin410
« Reply #1 on: November 29, 2004, 11:14:19 AM »

Great, I will try this once I have access to my site again. It has been down for a few days, hence my avatar not working. :(

d-410.com
JamesG
« Reply #2 on: November 29, 2004, 11:37:20 AM »

donavin410, didn't realise you were a moderator.

TranzNDance
« Reply #3 on: November 29, 2004, 11:43:45 AM »

You can also specify file types to protect:

This is for regular Google search:
Code:
User-agent: Googlebot
Disallow: /*.jpg$


This is to prevent your images from being indexed and showing up in Google Images
Code:
User-Agent: Googlebot-Image
Disallow: /
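
The * and $ in the first pattern are an extension to the original robots.txt convention that Googlebot honours: * stands for any run of characters and a trailing $ means "the URL ends here". To get a feel for what /*.jpg$ would and would not match, here is a small sketch that translates such a pattern into a regular expression. The function name google_style_match and the sample paths are made up, and this is an approximation of the matching rules, not Google's actual code.

```python
import re


def google_style_match(pattern, path):
    """Return True if `path` matches a Disallow pattern using * and $."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so an unanchored pattern is a prefix match.
    return re.match(regex, path) is not None


print(google_style_match("/*.jpg$", "/images/photo.jpg"))             # True
print(google_style_match("/*.jpg$", "/images/photo.jpg?size=large"))  # False
print(google_style_match("/*.jpg$", "/index.html"))                   # False
```

Without the $, the pattern /*.jpg would also match the second path, because anything may follow the matched prefix.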
JamesG
« Reply #4 on: November 29, 2004, 12:01:17 PM »

I don't even know why I learned this, as the only way my site would ever hit its bandwidth limit is if everybody on this forum linked every image and went there once a day.

TranzNDance
« Reply #5 on: November 29, 2004, 12:06:30 PM »

It's not exactly a bandwidth issue as much as not everyone wants people using their images from a Google Image search.
JamesG
« Reply #6 on: November 29, 2004, 12:20:46 PM »

I suppose...

fretnmore
« Reply #7 on: November 29, 2004, 12:24:50 PM »

What I would like to know is how to remove some images from Google's image search. They still have some links to images that haven't been on my site for over six months. I would have thought that a new index would clean that up, but apparently not.

Life is not measured by the number of breaths we take, but by the moments that take our breath away.
----------------------------------------------------------
Tri-Wolf Studios
Lunarpages Web Hosting
Lunarpages Forums
Lunarpages Affiliate Program
Ed
« Reply #8 on: November 29, 2004, 12:29:58 PM »

I don't think image search cleans up very often. You can email them directly to have it removed. You would have to block it in the robots file for it to happen fast.

- Ed

fretnmore
« Reply #9 on: November 29, 2004, 12:33:58 PM »

Thanks Ed -  I will give both those things a shot and see what happens. I didn't think I would care about having images indexed, but if it takes this long to get cleaned up I would rather that they didn't index it to begin with.
Nibbler
« Reply #10 on: December 07, 2004, 07:06:59 AM »

You can test if my spider would have crawled a link to your site here.

Missing since 1983

leighsww
« Reply #11 on: December 20, 2004, 06:49:25 PM »

Quote from: Nibbler
You can test if my spider would have crawled a link to your site here.

Do you know if there's a typo on their site? When I tested it without a "robots.txt" in my public_html, it said this:

Quote
Result: No robots.txt found, so page will be crawled


Then, I put the "robots.txt" file in my public_html and it says this:

Quote
Result: Page will be crawled - passed robots.txt check


LOL, if my eyes aren't deceiving me, doesn't that say basically the same thing ... that the page will be crawled?!

Hmmmm, I'm hoping they just made a typo and forgot the "NOT" in the second result.
Lupine1647
« Reply #12 on: December 20, 2004, 07:29:54 PM »

Leigh, the page is alright. If there is no robots.txt file, the crawler will crawl; if your robots.txt file isn't disallowing that bot, then it will also crawl.
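
The logic described here can be sketched in a few lines of Python with the standard-library urllib.robotparser. The first two messages are copied from the checker's output quoted earlier in the thread; the third message, the crawl_verdict function, the bot names and the URL are invented for illustration, and this is not the actual code behind the linked checker.

```python
from urllib.robotparser import RobotFileParser


def crawl_verdict(robots_txt, agent, url):
    """Describe whether `agent` would crawl `url`.

    `robots_txt` is the file's text, or None if the site has no robots.txt.
    """
    if robots_txt is None:
        return "No robots.txt found, so page will be crawled"
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if parser.can_fetch(agent, url):
        return "Page will be crawled - passed robots.txt check"
    # Hypothetical wording; the thread never shows the checker's "blocked" message.
    return "Page will not be crawled - blocked by robots.txt"


URL = "http://www.example.com/index.html"

# No robots.txt at all: nothing stops the crawler.
print(crawl_verdict(None, "ExampleSpider", URL))

# A robots.txt that only restricts a different robot: still crawled.
print(crawl_verdict("User-agent: OtherBot\nDisallow: /\n", "ExampleSpider", URL))

# A robots.txt that disallows everything for every robot: blocked.
print(crawl_verdict("User-agent: *\nDisallow: /\n", "ExampleSpider", URL))
```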
leighsww
« Reply #13 on: December 20, 2004, 07:38:00 PM »

Quote from: Lupus1647
Leigh, the page is alright. If there is no robots.txt file, the crawler will crawl; if your robots.txt file isn't disallowing that bot, then it will also crawl.

But I put the following code in the "robots.txt" (as Garvey instructed - and I followed every instruction he noted about doing it in Notepad, uploading in ASCII):

Code:
User-agent: *
Disallow: /


And it gave the following results:

Quote
Result: Page will be crawled - passed robots.txt check


Try it Lupus and let me know what yours says when using the link that nibbler gave.
Lupine1647
« Reply #14 on: December 20, 2004, 07:40:26 PM »

Why would I risk losing my search engine placement?

I'll try it on a sub-domain.