Google Image indexing and Magento [robots.txt]


There are many Magento robots.txt templates floating around, each with dozens of instructions for crawlers. Most of them deal with preventing crawlers from accessing some of Magento’s many directories. But sometimes these instructions can be too restrictive and might prevent Google from indexing your images.


If that’s your intention, well, you can stop reading this article right now. But before doing so, try to consider Google Image search as a supplementary engine to Google search. Maybe that’s the train you wouldn’t want to miss, right?! :)

 

First of all, let’s start with robots.txt basics. Two basic instructions are:

User-agent: 

Disallow:

 

The first one tells us that the instructions that follow apply only to that particular User-agent. “User-agent: *” tells robots that the rules listed below apply to all of them, but if you have “User-agent: googlebot”, Google’s crawler will take into account only the instructions under User-agent: googlebot. You may have one or more User-agent definitions (groups of instructions), but keep in mind that they don’t add up (each group is on its own), and more importantly, a more specific User-agent definition takes precedence over the wildcard.
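The precedence rule can be checked with a small sketch. It uses Python's standard-library `urllib.robotparser`, which only approximates how Google's crawlers read robots.txt; the rules, the `SomeOtherBot` name, and the paths below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only: one wildcard group
# and one group written specifically for googlebot.
RULES = """\
User-agent: *
Disallow: /checkout/

User-agent: googlebot
Disallow: /media/
"""


def allowed(agent, path, rules=RULES):
    """Return True if `agent` may fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# googlebot follows only its own group, so the wildcard's
# /checkout/ rule is not inherited by it.
print(allowed("googlebot", "/media/logo.png"))     # False
print(allowed("googlebot", "/checkout/cart/"))     # True

# Any other crawler falls back to the wildcard group.
print(allowed("SomeOtherBot", "/media/logo.png"))  # True
print(allowed("SomeOtherBot", "/checkout/cart/"))  # False
```

Note that googlebot is allowed into /checkout/ here even though the wildcard group disallows it: the groups don't add up.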

 

Googlebot vs Googlebot-image

Google has a few different bots/crawlers: one for standard pages, and others for mobile pages, AdSense (this one crawls pages with AdSense code to determine the context of the page, along with other actions related to ad publishing), images, etc. See the full list of Google bots.

 

This gives us flexibility when creating instructions for crawlers in robots.txt. It’s not uncommon for a Magento robots.txt to include, among others, this instruction:

Disallow: /media/  

This particular instruction blocks all crawlers from accessing everything under the /media/ subfolder, including images!

 

While content pages under /media/ don’t need to be crawled (and we advise you not to let Googlebot crawl them, as it may bring errors in your Google Webmaster Tools (GWT) report), it’s useful to let Googlebot-Image crawl your /media/ subfolder (and all other folders with images related to your site). Here is how to do it (using the /media/ folder as a reference, but the same can be applied to any folder):

 

If you want your whole site to be indexed by Googlebot-Image, and to block the /media/ folder for all other crawlers, put this in robots.txt:

 

# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# all other crawlers
User-agent: *
Disallow: /media/
+[other instructions]
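As a rough local check of the setup above, here is a minimal sketch using Python's standard-library `urllib.robotparser`. This parser is not Google's own and may differ from it in edge cases, the image path is a hypothetical example, and the "[other instructions]" part is left out.

```python
from urllib.robotparser import RobotFileParser

# The two groups from the setup above, without "[other instructions]".
SETUP = """\
# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# all other crawlers
User-agent: *
Disallow: /media/
"""

# Hypothetical image path, used only for this check.
IMAGE = "/media/catalog/product/sample.jpg"


def can_crawl(agent, path):
    """Return True if `agent` may fetch `path` under SETUP."""
    parser = RobotFileParser()
    parser.parse(SETUP.splitlines())
    return parser.can_fetch(agent, path)


for agent in ("Googlebot-Image", "Googlebot", "bingbot"):
    print(agent, can_crawl(agent, IMAGE))
```

Under these assumptions only Googlebot-Image is allowed to fetch the image; Googlebot and bingbot have no group of their own, so they fall back to the wildcard group and are blocked from /media/.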

 

If you test these instructions in Google Webmaster Tools, you’ll get this:



About Drazen Karacic-Soljic

eCommerce Consultant

Cacan is an eCommerce Consultant at Inchoo, where he monitors key performance indicators of clients’ online stores and suggests improvements.


3 comments

  1. What about Bing Bot? You are blocking Bing from the media folder. I noticed I am getting a product data feed error in Bing Merchant because Bing Bot is blocked from the media folder. How would you handle that situation?

  2. Hi Sir,

    I want to prevent particular pages from being indexed. Will the instruction

    Disallow: /xyzon.com/btw-formulier365

    work in robots.txt or not?

    It’s a Magento site; if I apply it from the back end, it applies to all pages…

    Please help me with this.
