Ultimate Magento Robots.txt File Examples


An extremely common question in eCommerce – and in Magento SEO specifically – is what a robots.txt file should look like and what should be in it. For this article, I decided to take all of our knowledge and experience, some sample robots.txt files from our clients' sites, and some examples from other industry-leading Magento studios, and try to figure out an ultimate Magento robots.txt file.


Please note that you should never blindly take one of these generic files and drop it in as the robots.txt file of your specific Magento store. Every store has its own structure, and in almost every case some of the robots.txt content needs to be modified to better fit your store's URL structure and your indexing priorities. Always ask your eCommerce consultants to adapt the robots.txt file to your specific case, and double-check with the Google Webmaster Tools robots.txt testing tool that everything that should be indexable actually is before you deploy it live.
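Besides the Webmaster Tools tester, you can sanity-check the plain path rules locally before deploying. Here is a minimal sketch using Python's standard `urllib.robotparser` (note: the stdlib parser does not implement Google's `*` and `$` wildcard extensions, so use it only for simple path prefixes; the rules and URLs below are illustrative examples, not your real file):

```python
from urllib import robotparser

# A trimmed set of plain-prefix rules (no wildcards) to verify
rules = """\
User-agent: *
Disallow: /customer/
Disallow: /catalogsearch/
Disallow: /checkout/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Pages that must stay crawlable vs. paths we meant to block
print(parser.can_fetch("*", "http://example.com/some-product.html"))          # True
print(parser.can_fetch("*", "http://example.com/customer/account/"))          # False
print(parser.can_fetch("*", "http://example.com/catalogsearch/result/?q=x"))  # False
```

If a URL you expect to rank comes back `False` here, fix the rule before the file ever goes live.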

Inchoo’s recommended Magento robots.txt boilerplate:

# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=
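For the wildcard rules above (`/*.php$`, `/*?SID=`), remember that `*` matches any sequence of characters and a trailing `$` anchors the match to the end of the URL. A rough way to reason about them is to translate each pattern into a regular expression (an illustrative sketch of the matching semantics, not how any particular crawler is implemented):

```python
import re

def robots_rule_to_regex(pattern):
    """Translate a robots.txt path pattern to a regex:
    '*' matches any character sequence, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path, pattern):
    # Robots rules match from the start of the URL path
    return robots_rule_to_regex(pattern).match(path) is not None

print(is_blocked("/index.php", "/*.php$"))              # True: ends in .php
print(is_blocked("/index.php/shoes.html", "/*.php$"))   # False: the $ anchor fails
print(is_blocked("/shoes.html?SID=abc123", "/*?SID="))  # True
```

Note the second case: `/*.php$` blocks direct `.php` hits but deliberately leaves the clean `/index.php/...` URLs alone, which is why the separate `Disallow: /index.php/` line is still needed.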

As you can see, the file above allows image indexing for image search while disallowing some blank image pages, as explained in a tutorial by my mate Drazen.

It also blocks some of the folders that are usually unwanted in the index for a common Magento online store setup.

Please note that it doesn't disallow most of the sorting and pagination parameters, as we assume you'll handle those with a rel="prev"/"next" implementation and by adding meta "noindex, follow" to the rest of the sorting parameters. For more info on why meta "noindex, follow" and not "noindex, nofollow", read this.

In some cases you might want reviews to be indexed. In that case, remove the "Disallow: /review/" line from the robots.txt file.

UPDATE: Since a lot of people in the comments talked about JavaScript and image blocking and didn't read the instructions in this post carefully, I decided to edit the recommended robots.txt file. The one above now allows indexing of both. You'll also notice that the file now allows "/checkout/". This is due to our new finding that it is beneficial to let Google see your checkout. Read more in this post.

Robots.txt examples from the portfolio websites of some other top Magento agencies:

One from BlueAcorn:

User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /customer/
Disallow: /checkout/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Allow: /media/catalog/product/
Disallow: /*.php$
Disallow: /skin/
Disallow: /catalog/product/view/

User-agent: Googlebot-Image
Disallow: /
Allow: /media/catalog/product/

Sitemap: http://example.com/sitemap/sitemap.xml

Here’s another one from BlueAcorn, similar to our recommended robots.txt file but with a little twist:

# Crawlers Setup
User-agent: *
Crawl-delay: 10

# Allowable Index
Allow: /*?p=

Allow: /media/

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
# Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=

As you can see above, they allow the ?p parameter but disallow it when another parameter is used together with ?p. This approach is quite interesting, as it permits the rel="prev"/"next" implementation while disallowing lots of combinations with other attributes. I still prefer solving those issues with "noindex, follow", but this is not bad either.
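To see why a URL like "/shoes.html?p=2" stays crawlable while "/shoes.html?p=2&dir=asc" does not, it helps to recall how Google resolves conflicting Allow/Disallow rules: the most specific (longest) matching pattern wins, and Allow wins ties. A rough sketch of that resolution logic (illustrative only; the paths and rule list are examples, and real crawlers differ in details):

```python
import re

def matches(pattern, path):
    # '*' = any character sequence; a trailing '$' anchors the end of the URL
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(p) for p in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

def verdict(path, rules):
    """Longest matching pattern wins; Allow wins ties (Google's documented behavior)."""
    best_kind, best_len = "allow", -1  # no rule matched => crawlable by default
    for kind, pattern in rules:
        if matches(pattern, path):
            if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
                best_kind, best_len = kind, len(pattern)
    return best_kind

rules = [("allow", "/*?p="), ("disallow", "/*?p=*&"), ("disallow", "/*?SID=")]
print(verdict("/shoes.html?p=2", rules))          # allow: only /*?p= matches
print(verdict("/shoes.html?p=2&dir=asc", rules))  # disallow: /*?p=*& matches and is longer
```

Paginated URLs match only the Allow rule, while any extra parameter tacked onto ?p trips the longer Disallow pattern.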

Here is an example of a robots.txt file, very similar to what we’re using, coming from Groove Commerce‘s portfolio:

# Groove Commerce Magento Robots.txt 05/2011
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these “robots” where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

# Website Sitemap
Sitemap: http://www.eckraus.com/sitemap.xml

# Crawlers Setup

# Directories
User-agent: *
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
Disallow: /blog/

# Paths (clean URLs)
User-agent: *
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
User-agent: *
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
User-agent: *
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=

Here’s an example from Turnkeye‘s portfolio:

User-agent: *
Disallow: /*?
Disallow: /app/
Disallow: /catalog/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /customer/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /tag/
Disallow: /review/
Disallow: /var/

As you can see, most of the top Magento agencies have a very similar approach when it comes to robots.txt. As I said in the beginning – always check with your consultants before blindly copy/pasting any of these files onto your store.

If you need any help, we can do a Magento Website Assessment for your site.

Interested in hiring us?

Have a chat with us. You would be surprised how small changes can make your business even more successful.


About Toni Anicic

eCommerce Consultant

SEO. Professional gaming. Home-brewed beer. Magento Certified Solution Specialist.

Read more posts by Toni / Visit Toni's profile

22 comments

  1. Fetch and Render home, category and product pages through “Fetch as Google” in Webmaster Tools.

    This will show the JavaScript and CSS that is blocked by the above setups.

    Add exceptions to the robots.txt (Allow: /xxx/yyy) as necessary.

    1. Great post, but I would argue you should not include app, lib, var (and maybe a few others) in the robots.txt from a security standpoint. The less an attacker knows about your code layout the better. Since those URLs are not likely to be something Google would find/crawl, there is no reason to include them in the file at all.

  2. Does blocking the directories /skin and /js not automatically disallow all CSS and JS files inside?
    I used the robots file above, ran “Fetch and Render as Google” in Webmaster Tools, and all the formatting, JS, product images, sliders etc. could not be rendered.
    Please correct me if I am wrong.
    Thank you

    1. That’s why I wrote the “In case you’d like to allow Google to index your JavaScript and CSS …” instructions in the post. Do not copy/paste these robots.txt files blindly; read the post carefully ;)

  3. I would highly suggest you do not use the 2 lines below if you want Google to be happy with your site.
    Disallow: /*.js$
    Disallow: /*.css$

    You can check with the major SEO sites as to why you don’t want to block .js and .css files from Google.
    They explain it better than I could in a comment.

  4. Thanks for this post, it helped me a lot :D

    I have a question: I have a Magento store, and I see in various examples they disallow the media folder (Disallow: /media/). In that case, will no media be crawled at all? No product images in Google, for example?

    Thanks

  5. @Technologia Geek

    This is for WordPress. Not Magento.

    People should pay attention every time they use something from the Web.

  6. Well, for a regular blogger this is the best way to go:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-content/cache/
    Disallow: /wp-content/themes/
    Allow: /wp-content/uploads/
    # Disallow: /tag/ # uncomment if you’re not using tags
    # Disallow: /category/ # uncomment if you’re not using categories
    # Disallow: /author/ # uncomment for single user blogs
    Disallow: /feed/
    Disallow: /trackback/
    # Disallow: /print/ # wp-print block
    Disallow: /2009/ # the year your blog was born
    Disallow: /2010/
    Disallow: /2011/
    Disallow: /2012/ # and so on
    Disallow: /index.php # separate directive for the main script file of WP
    Disallow: /*? # search results
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: */feed/
    Disallow: */trackback/
    # Disallow: */print/

    User-agent: Googlebot-Image
    Disallow:
    Allow: /

    User-agent: Mediapartners-Google
    Disallow:
    Allow: /

    Sitemap: http://yourdomain.com/sitemap.xml

  7. @Eng Siang and @MagePsycho:

    In theory, Google supports the wildcard (*) in robots.txt, although it’s officially not supported by all crawlers.

    So, for example:

    Disallow: */catalogsearch/

    Should also disallow /en/catalogsearch/

    So that’s one way of doing it, the other way would be to add both Disallow: /catalogsearch/ and Disallow: /en/catalogsearch/ to your robots.txt file.

    Remember, you can always test changes to your robots.txt file within Google Webmaster Tools before you deploy it live, and see whether it blocks what you intended to block or not.

  8. Hi Toni,

    If I don’t want a particular page on my Magento site indexed,
    what line do I add to the robots.txt file

    to block a single page?

    Thanks,
    Manish

  9. Your example needs a fix on line 44:

    Dissalow: /catalog/product/gallery/
    
    -to-
    
    Disallow: /catalog/product/gallery/

    Nice examples though, thanks.

  10. Very nice article Toni!

    I have a question on multiple languages implementation, if the multi-language stores are setup as subdirectory structure, for example:

    1st language: example.com/
    2nd language: example.com/en/
    3rd language: example.com/de/

    Do we need to include the subdirectory in the robots.txt? (Understand that the robots.txt should only be placed on the root directory of the domain)

    Disallow: /catalog/product_compare/
    Disallow: /catalog/category/view/
    Disallow: /catalog/product/view/
    Disallow: /catalogsearch/
    Disallow: /en/catalog/product_compare/
    Disallow: /en/catalog/category/view/
    Disallow: /en/catalog/product/view/
    Disallow: /en/catalogsearch/
    Disallow: /de/catalog/product_compare/
    Disallow: /de/catalog/category/view/
    Disallow: /de/catalog/product/view/
    Disallow: /de/catalogsearch/

  11. Great article Toni :). This has helped shed some light on recent robots.txt issues I’m having with review urls.

    @James:
    I think hiding /reviews/ might be good in the case you move your reviews to your actual product page since the default reviews page is pretty much the product page but without the description. Our setup is still pretty “defaultish” and I think I will eventually change our description/reviews for a product, to a tab based layout under the product. That way it cuts down on the number of duplicated pages, and as you mentioned, puts more SEO into the product page itself.

  12. One of the most common issues I have seen in various Magento stores is that they don’t prevent search engine crawlers from crawling and indexing their Subversion or Git directories. Moreover, I have also seen downloadable PDFs listed when you search the site in Google.

    This shows that if you do not configure robots.txt carefully, you will end up seeing lots of unwanted stuff in Google too.

    Pretty nice article Toni, thanks for sharing.

  13. Any reason why you say you shouldn’t allow reviews? Good reviews help with SEO on Google, as it can index the star rating system within Magento.

    This also allows bespoke content to be added by customers; the more reviews the better, as Google sees this as unique content.

  14. Excellent post. We are having some issues with the Bing bot; they said we are blocking the images. I know Google is the big search engine, but Bing and Yahoo have much better conversion in our case, and we would like to know if you have any suggestions to optimize the robots file for Yahoo/Bing. Thank you!

    J.

  15. Hi Toni,

    Thank you for your post.
    Why isn’t /*?dir=* or /*?limit=*& disallowed?

    Regards

    Sebastian
