Ultimate Magento Robots.txt File Examples


An extremely common question in eCommerce – and in Magento SEO specifically – is what a robots.txt file should look like and what should be in it. For this article, I decided to take all of our knowledge and experience, some sample robots.txt files from our clients' sites, and some examples from other industry-leading Magento studios, and try to figure out an ultimate Magento robots.txt file.


Please note that you should never blindly take one of these generic files and drop it in as the robots.txt file of your specific Magento store. Every store has its own structure, and in almost every case some of the robots.txt content needs to be modified to better fit your store's URL structure and your indexing priorities. Always ask your eCommerce consultants to adapt the robots.txt file to your specific case, and double-check with the Google Webmaster Tools robots.txt testing tool that everything that should be indexable actually is before you deploy it live.
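Besides the Webmaster Tools tester, you can sanity-check the plain path rules locally before deploying. Here is a minimal sketch using Python's standard `urllib.robotparser` (note: the stdlib parser does not implement Google's `*` and `$` wildcard extensions, so use it only for simple path prefixes; the rules and URLs below are illustrative examples, not your real file):

```python
from urllib import robotparser

# A trimmed set of plain-prefix rules (no wildcards) to verify
rules = """\
User-agent: *
Disallow: /customer/
Disallow: /catalogsearch/
Disallow: /checkout/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Pages that must stay crawlable vs. paths we meant to block
print(parser.can_fetch("*", "http://example.com/some-product.html"))          # True
print(parser.can_fetch("*", "http://example.com/customer/account/"))          # False
print(parser.can_fetch("*", "http://example.com/catalogsearch/result/?q=x"))  # False
```

If a URL you expect to rank comes back `False` here, fix the rule before the file ever goes live.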

Inchoo’s recommended Magento robots.txt boilerplate:

# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=
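For the wildcard rules above (`/*.php$`, `/*?SID=`), remember that `*` matches any sequence of characters and a trailing `$` anchors the match to the end of the URL. A rough way to reason about them is to translate each pattern into a regular expression (an illustrative sketch of the matching semantics, not how any particular crawler is implemented):

```python
import re

def robots_rule_to_regex(pattern):
    """Translate a robots.txt path pattern to a regex:
    '*' matches any character sequence, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path, pattern):
    # Robots rules match from the start of the URL path
    return robots_rule_to_regex(pattern).match(path) is not None

print(is_blocked("/index.php", "/*.php$"))              # True: ends in .php
print(is_blocked("/index.php/shoes.html", "/*.php$"))   # False: the $ anchor fails
print(is_blocked("/shoes.html?SID=abc123", "/*?SID="))  # True
```

Note the second case: `/*.php$` blocks direct `.php` hits but deliberately leaves the clean `/index.php/...` URLs alone, which is why the separate `Disallow: /index.php/` line is still needed.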

As you can see, the file above allows image indexing for image search while disallowing some blank image pages, as explained in a tutorial by my mate Drazen.

It also blocks some of the folders that are usually unwanted in the index for a common Magento online store setup.

Please note that it doesn't disallow most of the sorting and pagination parameters, as we assume you'll handle those with a rel="prev"/"next" implementation and by adding meta "noindex, follow" to the rest of the sorting parameters. For more info on why meta "noindex, follow" and not "noindex, nofollow", read this.

In some cases you might want reviews to be indexed. In that case, remove the "Disallow: /review/" line from the robots.txt file.

UPDATE: Since a lot of people in the comments talked about JavaScript and image blocking and didn't read the instructions in this post carefully, I decided to edit the recommended robots.txt file. The one above now allows indexing of both. You'll also notice that the file now allows "/checkout/". This is due to our new finding that it is beneficial to let Google see your checkout. Read more in this post.

Robots.txt examples from the portfolio websites of some other top Magento agencies:

One from BlueAcorn:

User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /customer/
Disallow: /checkout/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Allow: /media/catalog/product/
Disallow: /*.php$
Disallow: /skin/
Disallow: /catalog/product/view/

User-agent: Googlebot-Image
Disallow: /
Allow: /media/catalog/product/

Sitemap: http://example.com/sitemap/sitemap.xml

Here’s another one from BlueAcorn, similar to our recommended robots.txt file but with a little twist:

# Crawlers Setup
User-agent: *
Crawl-delay: 10

# Allowable Index
Allow: /*?p=

Allow: /media/

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
# Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=

As you can see above, they allow the ?p parameter but disallow it when another parameter is used together with ?p. This approach is quite interesting, as it permits the rel="prev"/"next" implementation while disallowing lots of combinations with other attributes. I still prefer solving those issues with "noindex, follow", but this is not bad either.
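To see why a URL like "/shoes.html?p=2" stays crawlable while "/shoes.html?p=2&dir=asc" does not, it helps to recall how Google resolves conflicting Allow/Disallow rules: the most specific (longest) matching pattern wins, and Allow wins ties. A rough sketch of that resolution logic (illustrative only; the paths and rule list are examples, and real crawlers differ in details):

```python
import re

def matches(pattern, path):
    # '*' = any character sequence; a trailing '$' anchors the end of the URL
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(p) for p in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

def verdict(path, rules):
    """Longest matching pattern wins; Allow wins ties (Google's documented behavior)."""
    best_kind, best_len = "allow", -1  # no rule matched => crawlable by default
    for kind, pattern in rules:
        if matches(pattern, path):
            if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
                best_kind, best_len = kind, len(pattern)
    return best_kind

rules = [("allow", "/*?p="), ("disallow", "/*?p=*&"), ("disallow", "/*?SID=")]
print(verdict("/shoes.html?p=2", rules))          # allow: only /*?p= matches
print(verdict("/shoes.html?p=2&dir=asc", rules))  # disallow: /*?p=*& matches and is longer
```

Paginated URLs match only the Allow rule, while any extra parameter tacked onto ?p trips the longer Disallow pattern.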

Here is an example of a robots.txt file, very similar to what we’re using, coming from Groove Commerce‘s portfolio:

# Groove Commerce Magento Robots.txt 05/2011
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these “robots” where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

# Website Sitemap
Sitemap: http://www.eckraus.com/sitemap.xml

# Crawlers Setup

# Directories
User-agent: *
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
Disallow: /blog/

# Paths (clean URLs)
User-agent: *
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
User-agent: *
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
User-agent: *
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=

Here’s an example from Turnkeye‘s portfolio:

User-agent: *
Disallow: /*?
Disallow: /app/
Disallow: /catalog/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /customer/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /tag/
Disallow: /review/
Disallow: /var/

As you can see, most of the top Magento agencies have a very similar approach when it comes to robots.txt. As I said in the beginning – always check with your consultants before blindly copy/pasting any of these files onto your store.

If you need any help, we can do a Magento Website Assessment for your site.

Interested in hiring us?

Have a chat with us. You would be surprised how small changes can make your business even more successful.


About Toni Anicic

eCommerce Consultant

SEO. Professional gaming. Home-brewed beer. Magento Certified Solution Specialist.

Read more posts by Toni / Visit Toni's profile

22 comments

  1. Fetch and Render home, category and product pages through “Fetch as Google” in Webmaster Tools.

    This will show the JavaScript and CSS that is blocked by the above setups.

    Add exceptions to the robots.txt (Allow: /xxx/yyy) as necessary.

    1. Great post, but I would argue you should not include app, lib, var (and maybe a few others) in the robots.txt from a security standpoint. The less an attacker knows about your code layout the better. Since those URLs are not likely to be something Google would find/crawl, there is no reason to include them in the file at all.

  2. Does blocking the directories /skin and /js not automatically disallow all CSS and JS files inside?
    I used the robots file above, ran “Fetch and Render as Google” in Webmaster Tools, and all the formatting, JS, product images, sliders etc. could not be rendered.
    Please correct me if I am wrong.
    Thank you

    1. That’s why I wrote the “In case you’d like to allow Google to index your JavaScript and CSS …” instructions in the post. Do not copy/paste these robots.txt files blindly; read the post carefully ;)

  3. I would highly suggest you do not use the 2 lines below if you want Google to be happy with your site.
    Disallow: /*.js$
    Disallow: /*.css$

    You can check with the major SEO sites as to why you don’t want to block .js and .css files from Google.
    They explain it better than I could in a comment.

  4. Thanks for this post, it helped me a lot :D

    I have a question: I have a Magento store, and I see in various examples they disallow the media folder (Disallow: /media/). In that case, will no media be crawled at all? No product images in Google, for example?

    Thanks

  5. @Technologia Geek

    This is for WordPress. Not Magento.

    People should pay attention every time they use something from the Web.

  6. Well, for a regular blogger this is the best way to go:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-content/cache/
    Disallow: /wp-content/themes/
    Allow: /wp-content/uploads/
    # Disallow: /tag/ # uncomment if you’re not using tags
    # Disallow: /category/ # uncomment if you’re not using categories
    # Disallow: /author/ # uncomment for single user blogs
    Disallow: /feed/
    Disallow: /trackback/
    # Disallow: /print/ # wp-print block
    Disallow: /2009/ # the year your blog was born
    Disallow: /2010/
    Disallow: /2011/
    Disallow: /2012/ # and so on
    Disallow: /index.php # separate directive for the main script file of WP
    Disallow: /*? # search results
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: */feed/
    Disallow: */trackback/
    # Disallow: */print/

    User-agent: Googlebot-Image
    Disallow:
    Allow: /

    User-agent: Mediapartners-Google
    Disallow:
    Allow: /

    Sitemap: http://yourdomain.com/sitemap.xml

  7. @Eng Siang and @MagePsycho:

    In theory, Google supports the wildcard (*) in robots.txt, although it’s officially not supported by all crawlers.

    So, for example:

    Disallow: */catalogsearch/

    Should also disallow /en/catalogsearch/

    So that’s one way of doing it, the other way would be to add both Disallow: /catalogsearch/ and Disallow: /en/catalogsearch/ to your robots.txt file.

    Remember, you can always test changes to your robots.txt file within Google Webmaster Tools before you deploy it live, and see whether it blocks what you intended to block or not.

  8. Hi Toni,

    If I don’t want a particular page on my Magento site indexed,
    what line do I add to the robots.txt file

    to block a single page?

    Thanks,
    Manish

  9. Your example needs a fix on line 44:

    Dissalow: /catalog/product/gallery/
    
    -to-
    
    Disallow: /catalog/product/gallery/

    Nice examples though, thanks.

  10. Very nice article Toni!

    I have a question on multiple languages implementation, if the multi-language stores are setup as subdirectory structure, for example:

    1st language: example.com/
    2nd language: example.com/en/
    3rd language: example.com/de/

    Do we need to include the subdirectory in the robots.txt? (Understand that the robots.txt should only be placed on the root directory of the domain)

    Disallow: /catalog/product_compare/
    Disallow: /catalog/category/view/
    Disallow: /catalog/product/view/
    Disallow: /catalogsearch/
    Disallow: /en/catalog/product_compare/
    Disallow: /en/catalog/category/view/
    Disallow: /en/catalog/product/view/
    Disallow: /en/catalogsearch/
    Disallow: /de/catalog/product_compare/
    Disallow: /de/catalog/category/view/
    Disallow: /de/catalog/product/view/
    Disallow: /de/catalogsearch/

  11. Great article Toni :). This has helped shed some light on recent robots.txt issues I’m having with review urls.

    @James:
    I think hiding /reviews/ might be good in the case you move your reviews to your actual product page since the default reviews page is pretty much the product page but without the description. Our setup is still pretty “defaultish” and I think I will eventually change our description/reviews for a product, to a tab based layout under the product. That way it cuts down on the number of duplicated pages, and as you mentioned, puts more SEO into the product page itself.

  12. One of the most common issues I have seen in various Magento stores is that they don’t prevent search engine crawlers from crawling and indexing their Subversion or Git directories. Moreover, I have also seen downloadable PDFs listed when you search the site in Google.

    This shows that if you do not configure robots.txt carefully, you will end up seeing lots of unwanted stuff in Google too.

    Pretty nice article Toni, thanks for sharing.

  13. Any reason why you say you shouldn’t allow reviews? Good reviews help with SEO on Google, as it can index the star rating system within Magento.

    This also allows bespoke content to be added by customers; the more reviews the better, as Google sees this as unique content.

  14. Excellent post. We are having some issues with the Bing bot; they said we are blocking the images. I know Google is the big search engine, but Bing and Yahoo have much better conversion in our case, and we would like to know if you have any suggestions to optimize the robots file for Yahoo/Bing. Thank you!

    J.

  15. Hi Toni,

    Thank you for your post.
    Why isn’t /*?dir=* or /*?limit=*& disallowed?

    Regards

    Sebastian
