September 14th, 2009

How to decide the robots.txt for Wordpress blogs


Robots.txt is used to define the Robots Exclusion Protocol rules for a website. It tells robots, bots, and web crawlers which parts of the site they may visit. Put simply, a well-behaved crawler visiting a website first checks the file /robots.txt in the site root, which defines the exclusion rules for that bot. A common example of a robots.txt file is shown below.

User-agent: *
Disallow: /

Here User-agent: defines which bots the rules apply to, and Disallow: defines the URL locations to exclude.
User-agent: * means all bots.
Disallow: / means every file and page on the website is excluded.
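
As a rough check of the example above, Python's standard urllib.robotparser module can parse these two lines and report whether a given bot may fetch a URL. This is only a sketch: the bot name ExampleBot and the URL paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line example from above: every bot, every URL excluded.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleBot" and both paths are hypothetical, used only for illustration.
blocked_home = not parser.can_fetch("ExampleBot", "/")
blocked_post = not parser.can_fetch("ExampleBot", "/2009/09/some-post/")
print(blocked_home, blocked_post)
```

Both values come out true here because the single rule "/" is a prefix of every URL path.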

A common example of a robots.txt file, the one used at honeytechblog, is listed below.

Sitemap: http://www.honeytechblog.com/sitemap.xml

User-Agent: *
Disallow: */?mobi*
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /wp-
Disallow: /*.css$
Disallow: */forums/bb-login.php?*
Allow: twitter.honeytechblog.com

User-agent: Googlebot-Image
disallow:

User-agent: Mediapartners-Google*
disallow:
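
A few of the lines in this file can be checked with Python's urllib.robotparser. That module only does plain prefix matching, so the sketch below deliberately leaves out the wildcard rules and uses a subset of the file. It assumes Python 3.8 or later for site_maps(), and ExampleBot and the URL paths are made-up names.

```python
from urllib.robotparser import RobotFileParser

# A subset of the honeytechblog file, limited to rules without wildcards.
rules = """\
Sitemap: http://www.honeytechblog.com/sitemap.xml

User-Agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns the listed Sitemap URLs, or None when there are none.
sitemaps = parser.site_maps()

# "ExampleBot" and the paths below are hypothetical.
admin_allowed = parser.can_fetch("ExampleBot", "/wp-admin/index.php")
post_allowed = parser.can_fetch("ExampleBot", "/some-post/")
print(sitemaps, admin_allowed, post_allowed)
```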

Explanations of the common exclusions and agents used in robots.txt files

1.Sitemap: http://www.honeytechblog.com/sitemap.xml

Defines the sitemap location for bots, which makes it easier for search bots to discover your new pages.

2.User-Agent: *

Already described above

3.Disallow: */?mobi*

Excludes pages whose URL contains “/?mobi”. I use this rule to avoid the duplicate-content issues generated by the pages served to mobile users. (You may not need it.)

4.Disallow: /wp-admin/

Excludes the WordPress admin pages from search engines. This keeps hack-prone pages and error pages out of the search listings.

5.Disallow: /wp-includes/

Excludes the WordPress includes folder, which should also be kept away from search bots. This matters because when WordPress runs into plugin or update problems it can display serious error messages, which search bots can index and hackers can then find.

6.Disallow: /wp-content/

Again, there is no need to index all the files in wp-content.

7.Disallow: /wp-

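For security purposes, it is better to keep all the core files and pages out of search results. This prefix rule matches every path that starts with /wp-, so it already covers the three rules above.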

8.Disallow: /*.css$

Excludes all style sheets. (Use it if you want to keep your CSS files away from bots.)
Note: Disallow: /*.“file extension”$ can be used to keep any file extension out of reach of bots, where “file extension” can be any extension you want, for example /*.txt$, /*.php$, /*.jsp$, or /*.jpg$

9.Disallow: /*?

Disallows all URLs that contain “?”. (Use it to keep duplicate content, tracking URLs, and custom features out of reach of bots.)

10.Disallow: /name/

Disallows any directory, folder, or category. For example, to disallow the “admin” folder, use “Disallow: /admin/”; to disallow a category named “download”, use “Disallow: /category/download*”; and for the uncategorized category, use “Disallow: /category/uncategorized*”.
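
The wildcard rules explained above can be tried out with a small matcher. This is a minimal sketch, assuming the common crawler convention that “*” stands for any run of characters and a trailing “$” pins the rule to the end of the URL; the function names and sample URL paths are invented for illustration, and a real crawler may differ in its details.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' stands for any run of characters and a trailing '$' pins the
    pattern to the end of the URL; everything else is matched literally
    from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_excluded(pattern, url_path):
    """Return True if url_path falls under the given Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(url_path) is not None

# Hypothetical URL paths, chosen only to exercise each rule.
results = [
    is_excluded("/*.css$", "/wp-content/themes/demo/style.css"),  # ends in .css
    is_excluded("/*.css$", "/style.css?ver=2"),                   # does not end in .css
    is_excluded("/*?", "/?p=12"),                                 # contains "?"
    is_excluded("/*?", "/about/"),                                # no "?"
    is_excluded("*/?mobi*", "/some-post/?mobile"),                # contains "/?mobi"
    is_excluded("/admin/", "/admin/settings"),                    # plain prefix
    is_excluded("/category/download*", "/category/downloads/"),   # prefix plus wildcard
]
print(results)
```

Note that the second check is not excluded: because of the trailing “$”, a style sheet requested with a query string no longer ends in .css.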

Extra:

To allow image bots (such as the Google image bot) to crawl and index all the images of the website or blog

User-agent: Googlebot-Image
disallow:
Allow: /*.png$
Allow: /*.jpg$
Allow: /*.gif$
Allow: /*.jpeg$
Allow: /*.ico$
Allow: /images/

To allow the AdSense bot to crawl the entire site with ease

User-agent: Mediapartners-Google*
disallow:
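
The effect of giving one bot its own group with an empty disallow can be sketched with Python's urllib.robotparser. Two assumptions in this sketch: the agent name is written without the trailing “*”, because this particular parser treats the asterisk as a literal character in agent names, and ExampleBot is a made-up name standing in for any other crawler.

```python
from urllib.robotparser import RobotFileParser

# General rules for everyone, plus a dedicated group for the AdSense bot.
# Assumption: agent name written without the trailing "*" (see note above).
rules = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Mediapartners-Google
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# An empty Disallow means "nothing is excluded" for that group.
adsense_allowed = parser.can_fetch("Mediapartners-Google", "/wp-admin/")
# "ExampleBot" is hypothetical; it falls back to the "*" group.
other_allowed = parser.can_fetch("ExampleBot", "/wp-admin/")
print(adsense_allowed, other_allowed)
```

A bot that finds a group addressed to it uses that group instead of the “*” group, which is why the empty disallow opens the whole site to that one bot.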



13 Responses to “How to decide the robots.txt for Wordpress blogs”

  1. How to decide the robots.txt for #Wordpress blogs http://bit.ly/4mH4R

  2. Tech How to decide the robots.txt for Wordpress blogs: Robots.txt is used to define the Robots Exclusion Pro.. http://bit.ly/2RYCt1

  3. I think you made a mistake in this post. Instead of specifying “To disallow all the adsense bot to crawler with ease on entire site”, you put “To allow all the adsense bot to crawler with ease on entire site”
    BlogrPro´s last blog ..BlogrPro is now more Faster and Fresher My ComLuv Profile

  4. @BlogrPro

    Why would I want to ban AdSense on the blog?
    No one wants to ban the AdSense bots unless they are not using AdSense!

    Put simply, if you disallow Mediapartners-Google, you cannot display AdSense ads on your website!
    http://www.google.com/support/forum/p/AdSense/thread?tid=60b8df97e2074da7&hl=en

  5. Configure the robots.txt for #Wordpress blogs http://bit.ly/4mH4R

  6. Thanks for showing me this very useful info I really like it.

  7. Thanks for the extraordinary article Honeysingh. Looking for such a article.
    Pavan Somu´s last blog ..Best Monetizing Plugins For Your Blogs My ComLuv Profile

  8. Thanks for the tutorial. How can we know which parts of our blog should be indexed and which should not?
    Vivek´s last blog ..MBA Colleges of Assam My ComLuv Profile

  9. @Vivek
    Disallow: */page url here > Bans search engine indexing of that particular “page url”

  10. How to decide the robots.txt for Wordpress blogs http://bit.ly/5splre #wordpress

