How to decide the robots.txt for WordPress blogs
Robots.txt defines the Robots Exclusion Protocol rules for a website. It tells robots, bots and web crawlers how to behave on the site. In simple words, any web crawler or bot visiting a website first checks the root file /robots.txt, which defines the exclusion rules for that bot. A common example of a robots.txt file is shown below:
User-agent: *
Disallow: /
Here, User-agent: defines which bots the rules apply to, and Disallow: defines the URL locations, or types of URL locations, that are excluded.
User-agent: * means all types of bots.
Disallow: / means exclusion of all the files and pages on the website.
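As a quick check of the rule above, here is a minimal sketch using Python's standard-library urllib.robotparser. The host example.com and the bot names are placeholders chosen for illustration; they do not come from this post.

```python
from urllib.robotparser import RobotFileParser

# The two-line file discussed above: every bot, every path excluded.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com and the bot names are hypothetical placeholders.
for bot in ("Googlebot", "SomeOtherBot"):
    allowed = parser.can_fetch(bot, "http://example.com/2009/09/some-post/")
    print(bot, "allowed" if allowed else "blocked")
```

Both bots should be reported as blocked, since the wildcard agent applies to any bot that has no group of its own.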
How to decide the robots.txt for WordPress blogs
A common example of a robots.txt file, the one used at honeytechblog, is listed below:
User-Agent: *
Disallow: */?mobi*
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /wp-
Disallow: /*.css$
Disallow: */forums/bb-login.php?*
Allow: twitter.honeytechblog.com
User-agent: Googlebot-Image
disallow:
User-agent: Mediapartners-Google*
disallow:
Explanations of the common exclusions and user agents used in robots.txt files
1. Sitemap: http://www.honeytechblog.com/sitemap.xml
Defines the sitemap location for bots, which makes it easier for search bots to discover your new pages.
2. User-Agent: *
Already described above.
3. Disallow: */?mobi*
Excludes pages whose URL contains “/?mobi”. I use this rule to avoid the duplicate-content issues generated by the mobile version of the site. (You may not need it.)
4. Disallow: /wp-admin/
Excludes the WordPress admin pages from search engines. This keeps hack-prone pages and error pages out of the search listings.
5. Disallow: /wp-includes/
Excludes the WordPress includes folder, which search bots also have no need to crawl. This matters because when WordPress runs into plugin or update problems, it can display serious error messages that search bots would otherwise index and hackers could then find.
6. Disallow: /wp-content/
Again, there is no need to index every file in wp-content.
7. Disallow: /wp-
For security purposes, it is better to hide all the core files and pages from bots. This rule matches every URL path that begins with “/wp-”.
8. Disallow: /*.css$
Excludes all style sheets (if you also want to keep your CSS files away from bots).
Note: Disallow: /*.“file extension”$ can be used to keep any file extension out of the reach of bots, where “file extension” is whatever extension you want, for example /*.txt$, /*.php$, /*.jsp$ or /*.jpg$.
9. Disallow: /*?
Disallows all URLs that contain “?”. (Used to keep duplicate content, tracking URLs and custom features out of the reach of bots.)
10. Disallow: /name/
Disallows any directory, folder or category. For example, to disallow the “admin” folder, use “Disallow: /admin/”; to disallow a category named “download”, use “Disallow: /category/download*”; and for the uncategorized category, use “Disallow: /category/uncategorized*”.
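The wildcard rules in the list above can be tried out with a small matcher. This is only a sketch, assuming the wildcard semantics that the major search engines document: `*` matches any run of characters and a trailing `$` anchors the end of the URL. The helper names pattern_to_regex and is_disallowed and the sample paths are made up for illustration. Python's built-in urllib.robotparser is not used here because, as far as I can tell, it compares plain path prefixes only.

```python
import re

def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern into a regular expression.

    Assumed semantics: '*' matches any run of characters and a trailing
    '$' anchors the end of the URL; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path, disallow_patterns):
    """Return True if any pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)

# Patterns taken from the list above; the sample paths are hypothetical.
RULES = ["*/?mobi*", "/wp-", "/*.css$", "/*?", "/category/download*"]

for path in ("/wp-login.php", "/theme/style.css", "/a-post/?mobi",
             "/category/downloads/", "/a-post/"):
    print(path, "blocked" if is_disallowed(path, RULES) else "allowed")
```

Under these assumptions only the last sample path, a plain post URL, is left open to bots. Note that /*.css$ does not match /theme/style.css?ver=2, because the `$` requires the URL to end in .css.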
Extra:
To allow image bots (such as the Google image bot) to crawl and index all the images on the website or blog:
User-agent: Googlebot-Image
Disallow:
Allow: /*.png$
Allow: /*.jpg$
Allow: /*.gif$
Allow: /*.jpeg$
Allow: /*.ico$
Allow: /images/
To let the AdSense bot crawl the entire site with ease:
User-agent: Mediapartners-Google*
Disallow:
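To see how per-bot groups like these interact with the general rules, here is a small sketch with Python's standard-library urllib.robotparser. It assumes a cut-down file with a single /wp-admin/ exclusion; example.com and SomeOtherBot are placeholders. The AdSense agent is written without its trailing * because this particular parser compares agent names literally and would not treat the asterisk as a wildcard.

```python
from urllib.robotparser import RobotFileParser

# Cut-down, hypothetical file: one general rule plus two per-bot groups.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Googlebot-Image
Disallow:

User-agent: Mediapartners-Google
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("Googlebot-Image", "http://example.com/wp-admin/logo.png"),
    ("Mediapartners-Google", "http://example.com/wp-admin/"),
    ("SomeOtherBot", "http://example.com/wp-admin/"),
    ("SomeOtherBot", "http://example.com/a-post/"),
]
for bot, url in checks:
    print(bot, url, "allowed" if parser.can_fetch(bot, url) else "blocked")
```

An empty Disallow: means nothing is excluded for that bot, so the two named bots may fetch everything, while any other bot falls back to the `*` group and is kept out of /wp-admin/.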
September 22nd, 2009 at 6:41 pm
I think you made a mistake in this post. Instead of specifying “To disallow all the adsense bot to crawler with ease on entire site”, you put “To allow all the adsense bot to crawler with ease on entire site”
September 22nd, 2009 at 7:04 pm
@BlogrPro
Why would I want to ban the AdSense bot on my blog?
No one wants to ban the AdSense bot unless they are not using AdSense!
In simple words, if you disallow Mediapartners-Google, AdSense can no longer crawl your pages, so it cannot serve relevant ads on your website!
http://www.google.com/support/forum/p/AdSense/thread?tid=60b8df97e2074da7&hl=en
November 14th, 2009 at 1:36 am
Thanks for showing me this very useful info I really like it.
November 23rd, 2009 at 3:43 pm
Thanks for the extraordinary article, Honeysingh. I was looking for such an article.
December 6th, 2009 at 1:52 pm
Thanks for the tutorial. How can we know which parts of our blog should be indexed and which should not?
December 6th, 2009 at 2:01 pm
@Vivek
Use “Disallow: */page url here” to stop search engines from indexing that particular “page url”.
May 21st, 2010 at 2:44 pm
Hi, thank you for this useful information.
Best Regards,
Jessica
June 30th, 2010 at 7:36 pm
Monetizing websites, blogs, etc. is a good way to earn some passive income.