Web Hosting Forum | Lunarpages


Topic: Creating the Best robots.txt for SMF  (Read 5552 times)
Mitch
« on: August 17, 2009, 08:15:33 AM »

As many of you know, robots.txt is the file that search engine crawlers read for "instructions" on which files and folders they should or should not crawl.  Looking around for a good robots.txt file for SMF, I found this:

Code:
User-agent: *
# Paths must start with "/". If the forum is not installed at the site root,
# prefix each path with its directory.
# Rules match by prefix, so a trailing "*" is not needed.
Disallow: /index.php?action=search
Disallow: /index.php?action=calendar
Disallow: /index.php?action=login
Disallow: /index.php?action=register
Disallow: /index.php?action=profile
Disallow: /index.php?action=stats
Disallow: /index.php?action=arcade
Disallow: /index.php?action=printpage
Disallow: /index.php?PHPSESSID=
Disallow: /index.php?*rss
# "*wap" also covers wap2 for crawlers that support wildcards
Disallow: /index.php?*wap
Disallow: /index.php?*imode


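This tells well-behaved search engine crawlers not to fetch those URLs, so the content behind them should not get indexed. (A blocked URL can still show up in results if other pages link to it, so treat this as a crawl instruction rather than a guarantee.)  If you are using a plugin or service that rewrites your addresses to make your links more "SEO friendly", the above code will need to be edited to match the rewritten URLs, but you get the basic idea of what needs to be done.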
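If the rules need editing for your own install, it can be easier to generate them than to type them by hand. Here is a minimal Python sketch, assuming the forum answers at `index.php?action=...` as in the rules above. The function name `build_robots_txt` and the `/forum` directory are made up for illustration, not part of SMF.

```python
def build_robots_txt(actions, base_path="/", user_agent="*"):
    """Return robots.txt text that disallows index.php?action=<name> URLs.

    base_path is the directory the forum is installed in (hypothetical
    example: "/forum"). Every Disallow path must start with "/".
    """
    trimmed = base_path.strip("/")
    # Normalise so the prefix starts and ends with exactly one slash.
    prefix = "/" + trimmed + "/" if trimmed else "/"
    lines = ["User-agent: " + user_agent]
    for action in actions:
        lines.append("Disallow: " + prefix + "index.php?action=" + action)
    return "\n".join(lines) + "\n"


# Action names taken from the rules posted above.
SMF_ACTIONS = ["search", "calendar", "login", "register",
               "profile", "stats", "arcade", "printpage"]

if __name__ == "__main__":
    # Assumes the forum lives in a /forum directory; adjust to your site.
    print(build_robots_txt(SMF_ACTIONS, base_path="/forum"))
```

Save the printed text as robots.txt in the root of the domain, since that is the only place crawlers look for it.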

Just wanted to share my find in case it helps somebody else out.  Does anybody have any suggestions on how to do it better?

New to Web Site Hosting? Check Out the Lunarpages Blog Hosting Guide!


Follow us @lunarpages on Twitter!
Important Threads: Read This Before Posting! | Lunarforums Rules! | Mitch's Link of the Day!
Also, be sure to check out and subscribe to the Lunartics Blog and the Lunarpages Newsletter !

Need Web Hosting Help? Check out the Lunarpages Web Hosting Wiki. It has tons of tips, tutorials and resources!
818
Guest
« Reply #1 on: August 17, 2009, 11:38:18 PM »

Awesome! I was using a smaller one, and Google was able to get me to page 3 for the keywords "dog forum". I will give this one a try to see how much better my site gets indexed.

Thanks Mitch!
Mitch
« Reply #2 on: August 18, 2009, 05:13:32 AM »

Not a problem, happy to help!
LakeXeno
« Reply #3 on: September 10, 2010, 09:29:06 AM »

Quote from: Mitch on August 18, 2009, 05:13:32 AM
Not a problem, happy to help!

The only thing I've found is that sites that show "what your website looks like to a spider" always claim the wildcards are not "standard", and I get a lot of errors/bad marks for them on those sites.

I'm not sure how valid that is, but some sites claim there is a robots.txt "standard" somewhere, which I need to find.

I'll update when I can.

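UPDATE: it was in fact an old post I was reading, written quite a while ago. *Removes from favs* Wildcards were not part of the original robots.txt standard, but Google and the other major search engines support them as an extension, so they work for those crawlers. Bots that only follow the original standard may ignore them.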

So I'll have to send word to that spider site if I can find it again...
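To see why a checker can flag wildcards while the big search engines still honor them, here is a small Python sketch that compares the two ways of reading a rule. It assumes the wildcard behavior the major search engines document (`*` matches any run of characters, a trailing `$` anchors the end of the URL). The function names are made up for this example, and this is not any crawler's actual code.

```python
import re


def matches_literal_prefix(rule, path):
    """Original convention: plain prefix comparison, '*' is just a character."""
    return path.startswith(rule)


def matches_with_wildcards(rule, path):
    """Wildcard extension: '*' is any run of characters, trailing '$' is end of URL."""
    end_anchor = rule.endswith("$")
    core = rule[:-1] if end_anchor else rule
    # Escape the literal pieces and join them with ".*" where the rule had "*".
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    if end_anchor:
        regex += "$"
    # re.match anchors at the start of the path, like a prefix match.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    rule = "/index.php?*wap"
    path = "/index.php?topic=12.0;wap2"
    print(matches_literal_prefix(rule, path))   # False: no literal "*" in the URL
    print(matches_with_wildcards(rule, path))   # True: "*" spans "topic=12.0;"
```

So the same line blocks the URL for a wildcard-aware crawler and does nothing for a strict one, which would explain the warnings from those spider-simulator sites.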
« Last Edit: September 10, 2010, 12:00:50 PM by LakeXeno »


http://www.lakexeno.com

We are a community of many faiths and lifestyles. We provide a place to sit, mingle & chat amongst common friends.
Abhinandangarg
« Reply #4 on: September 21, 2010, 05:11:05 AM »

Great example of a robots.txt file.

Please tell me, what is User-agent used for in a robots.txt file?

Thanks
« Last Edit: September 21, 2010, 05:12:41 AM by Abhinandangarg »

LakeXeno
« Reply #5 on: October 25, 2010, 12:24:30 PM »

Quote from: Abhinandangarg on September 21, 2010, 05:11:05 AM
Great example of a robots.txt file.

Please tell me, what is User-agent used for in a robots.txt file?

Thanks

User-agent is the name a bot identifies itself with. In robots.txt, a User-agent line says which bot the rules below it apply to, and * means every bot. I'll see if I can get you a list of the known bots so far.

UPDATE: Here we go, a list of most known bots that can each be added specifically to your robots.txt file: http://www.robotstxt.org/db.html
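To show how User-agent groups are picked, here is a short Python sketch using the standard library's `urllib.robotparser`. The rules and the `/private/` path are invented for the example, and "SomeOtherBot" is a placeholder name. Note that this parser does plain prefix matching, so the example sticks to simple paths without wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for everybody else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    # Googlebot has its own group, so only /private/ is off limits for it.
    print(rp.can_fetch("Googlebot", "/index.php"))
    print(rp.can_fetch("Googlebot", "/private/page.html"))
    # Any bot without its own group falls back to the "*" group.
    print(rp.can_fetch("SomeOtherBot", "/index.php"))
```

A bot uses the group that names it and falls back to the `*` group otherwise, so rules under `User-agent: *` already cover every other bot.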
michaelvk
« Reply #6 on: January 12, 2011, 01:11:51 AM »

Quote from: 818 on August 17, 2009, 11:38:18 PM
Awesome! I was using a smaller one, and Google was able to get me to page 3 for the keywords "dog forum". I will give this one a try to see how much better my site gets indexed.

Thanks Mitch!


Thanks for posting this,

How can we do something similar for all other bots?
Cheers!

davidpo
Intellectsoft SEO Department.


« Reply #7 on: November 11, 2011, 04:27:42 AM »

Thank you Mitch, this is a really well-done robots.txt.
For comparison, here is an example robots.txt for WordPress:

Code:
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /webstat/
Disallow: /feed/
Disallow: /trackback
# Paths must start with "/", so the wildcard goes after it
Disallow: /*/trackback
Disallow: /*/feed
Disallow: /*/comments
# Blocks every URL that contains a query string
Disallow: /*?
Disallow: /category/
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /comments
Disallow: /page/

* Only for well-optimized websites with SEO-friendly URLs, because the /*? rule blocks every URL that contains a query string.

Abhinandangarg
« Reply #8 on: November 14, 2011, 02:30:49 AM »

Thanks for posting this!

rozerdun
« Reply #9 on: November 17, 2011, 03:56:40 AM »

Very informative discussion, thanks for that. For me, though, robots.txt is very confusing.


_______________________________________________
Web Development | Web Design and Development