How to decide the robots.txt for WordPress blogs

The robots.txt file implements the Robots Exclusion Protocol for a website. It tells robots, bots and web crawlers which parts of the site they may visit. In simple words, a well-behaved crawler visiting a website first checks the root file /robots.txt, which defines the exclusion rules for that bot. A common example of a robots.txt file is shown below.

User-agent: *
Disallow: /

Here User-agent: names the bots that a group of rules applies to, and Disallow: names the URL paths those bots should not crawl.
User-agent: * means all types of bots.
Disallow: / means every file and page on the website is excluded.
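As a quick check of what these two lines do, here is a minimal Python sketch using the standard library's urllib.robotparser module. The bot name ExampleBot and the example.com URLs are placeholders for illustration and are not taken from this post.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two-line example from above: every bot, every path excluded.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

# "ExampleBot" and example.com are hypothetical names used only for this demo.
print(is_allowed(BLOCK_EVERYTHING, "ExampleBot", "http://example.com/"))
print(is_allowed(BLOCK_EVERYTHING, "ExampleBot", "http://example.com/any/page.html"))
```

Both calls print False, because Disallow: / matches every path on the site. An empty Disallow: line has the opposite effect and allows everything.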

A typical robots.txt file, the one used at honeytechblog, is listed below.

Sitemap: http://www.honeytechblog.com/sitemap.xml

User-Agent: *
Disallow: */?mobi*
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /wp-
Disallow: /*.css$
Disallow: */forums/bb-login.php?*
Allow: twitter.honeytechblog.com

User-agent: Googlebot-Image
Disallow:

User-agent: Mediapartners-Google*
Disallow:

Explanations of the common exclusions and user agents used in robots.txt files

1. Sitemap: http://www.honeytechblog.com/sitemap.xml

Defines the sitemap location for bots, which makes it easier for search bots to detect your new pages.
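To see how a crawler might pick up the Sitemap line, here is a small Python sketch. It assumes Python 3.8 or later, where urllib.robotparser offers site_maps(); the helper name sitemap_urls is made up for this example.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt: str) -> list:
    """Return the sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap line.
    return parser.site_maps() or []


EXAMPLE = """\
Sitemap: http://www.honeytechblog.com/sitemap.xml

User-agent: *
Disallow: /wp-admin/
"""

print(sitemap_urls(EXAMPLE))
```

The Sitemap line is independent of the User-agent groups, so it can sit anywhere in the file.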

2. User-Agent: *

Already described above: the rules that follow apply to all bots.

3. Disallow: */?mobi*

Excludes pages whose URL contains “/?mobi”. I use this rule to avoid the duplicate-content issues generated by the pages served to mobile users. (You may not need it.)

4. Disallow: /wp-admin/

Excludes the WordPress admin pages from search engines. This keeps hack-prone pages and error pages from being listed.

5. Disallow: /wp-includes/

Excludes the WordPress includes folder, which should also be kept away from search bots. When WordPress runs into plugin or update problems, it can display serious error messages, and these can easily be indexed by search bots or found by hackers.

6. Disallow: /wp-content/

Again, it is not necessary to index all the files in wp-content.

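7. Disallow: /wp-

For security purposes it is better to keep all the core files and pages out of search results. Keep in mind that robots.txt only asks compliant bots to stay away; it does not block access to the files themselves.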

8. Disallow: /*.css$

Excludes all the style sheets (if you want to keep your CSS files out of the reach of bots).
Note: Disallow: /*.extension$ can be used to keep any file extension out of the reach of bots, where “extension” can be any extension you want, for example /*.txt$, /*.php$, /*.jsp$ or /*.jpg$.

9. Disallow: /*?

Disallows all URLs that contain a “?”. (Used to keep duplicate content, tracking URLs and custom features out of the reach of bots.)
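The wildcard rules in points 3, 8 and 9 can be tried out with the rough Python sketch below. It assumes the matching convention used by the major search engines, where * stands for any run of characters and a trailing $ anchors the end of the URL. Python's own urllib.robotparser does not understand these wildcards, so the function rule_matches and the sample paths are written just for this illustration.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the given URL path.

    Assumed convention: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and anything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex = re.compile(body + (r"\Z" if anchored else ""))
    # re.match anchors at the start of the path, which gives prefix matching.
    return regex.match(path) is not None


# Hypothetical paths, used only to exercise the rules from this post.
print(rule_matches("/*.css$", "/wp-content/themes/demo/style.css"))  # True
print(rule_matches("/*.css$", "/style.css?ver=2"))                   # False
print(rule_matches("/*?", "/a-post/?utm_source=feed"))               # True
print(rule_matches("/*?", "/a-post/"))                               # False
print(rule_matches("*/?mobi*", "/a-post/?mobi"))                     # True
```

Note that /*.css$ stops matching as soon as a query string follows the extension, because of the $ anchor.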

10. Disallow: /name/

Disallows any directory, folder or category. For example, to disallow the “admin” folder you can simply use “Disallow: /admin/”; to disallow a category named “download” you can use “Disallow: /category/download*”; and for the uncategorized category you can use “Disallow: /category/uncategorized*”.
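Directory and category rules work by prefix, which the following Python sketch tries to show with urllib.robotparser. The trailing * from the examples above is left out here, since a plain prefix already covers everything below it and Python's parser does not interpret wildcards. The bot name and the sample paths are invented for the demo.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /category/download
Disallow: /category/uncategorized
"""


def blocked_paths(robots_txt: str, paths: list, user_agent: str = "ExampleBot") -> list:
    """Return the subset of paths that the rules exclude for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical URLs on a WordPress blog.
SAMPLE = [
    "/admin/settings.php",
    "/category/download/tool.zip",
    "/category/news/",
    "/about/",
]

print(blocked_paths(RULES, SAMPLE))
```

Only the first two sample paths are reported as blocked. Because the match is a prefix match, a rule such as /category/download would also catch a category named /category/downloads-old/, so choose the prefix with care.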

Extra:

To allow image bots (like the Google image bot) to search and index all the images of the website or blog:

User-agent: Googlebot-Image
Disallow:
Allow: /*.png$
Allow: /*.jpg$
Allow: /*.gif$
Allow: /*.jpeg$
Allow: /*.ico$
Allow: /images/

To let the AdSense bot crawl the entire site with ease:

User-agent: Mediapartners-Google*
Disallow:
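How a bot-specific group with an empty Disallow overrides the general rules can be sketched in Python as follows. This is only an illustration with urllib.robotparser: the trailing * after Mediapartners-Google is dropped because Python's parser matches agent names by substring and has no wildcard support, and ExampleBot and the paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Mediapartners-Google
Disallow:
"""


def can_crawl(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The AdSense bot has its own group with an empty Disallow, so nothing is excluded.
print(can_crawl("Mediapartners-Google", "/wp-admin/options.php"))  # True
# Any other bot falls back to the "*" group.
print(can_crawl("ExampleBot", "/wp-admin/options.php"))            # False
print(can_crawl("ExampleBot", "/a-post/"))                         # True
```

A bot that finds a group addressed to it by name uses that group instead of the * group, which is why the empty Disallow opens the whole site to the AdSense bot.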


Honey Singh is Head of Operations & Strategic Planning at MediaZo. He has 5 years of experience in designing, ideating and operating digital and social marketing strategies for global brands. You can also catch him on Foursquare, Twitter, Facebook, Google+ & Instagram.

12 Responses to “How to decide the robots.txt for WordPress blogs”

  1. How to decide the robots.txt for #Wordpress blogs http://bit.ly/4mH4R

  2. I think you made a mistake in this post. Instead of specifying “To disallow all the adsense bot to crawler with ease on entire site”, you put “To allow all the adsense bot to crawler with ease on entire site”
    .-= BlogrPro´s last blog ..BlogrPro is now more Faster and Fresher =-.

  3. @BlogrPro

    Why do i want to ban the adsense on the blog?
    No one want to ban the adsense bots unless and until they are not using it !

    In a simple words if you disallow the Mediapartners-Google, you cannot display adsense ads on your website !
    http://www.google.com/support/forum/p/AdSense/thread?tid=60b8df97e2074da7&hl=en

  4. Configure the robots.txt for #Wordpress blogs http://bit.ly/4mH4R

  5. Thanks for showing me this very useful info I really like it.

  6. Thanks for the extraordinary article Honeysingh. Looking for such a article.
    .-= Pavan Somu´s last blog ..Best Monetizing Plugins For Your Blogs =-.

  7. Thanks for the tutorial, how could we know that in our blog which is to be indexed and which is not to be indexed.
    .-= Vivek´s last blog ..MBA Colleges of Assam =-.

  8. @Vivek
    Disallow: */page url here > Ban the search engine indexing on particular “page url”

  9. How to decide the robots.txt for WordPress blogs http://bit.ly/5splre #wordpress

  10. RT @djbhai: How to decide the robots.txt for WordPress blogs http://bit.ly/5splre #wordpress

  11. Hi, thank you for this useful information.

    Best Regards,

    Jessica

  12. Monetizing websites, blogs, etc is a good way to earn some passive income.
