Google Image indexing and Magento [robots.txt]

2012_12_04_googlebot-image-index_feat_609

There are many Magento robots.txt templates floating around, each with dozens of instructions for crawlers. Most of them deal with preventing crawlers from accessing some of Magento’s many directories. But sometimes these instructions are too restrictive and can prevent Google from indexing your images.


If that’s your intention, well, you can stop reading this article right now. But before you do, consider Google Image search as a supplementary engine to Google search. Maybe that’s a train you wouldn’t want to miss, right?! :)

First of all, let’s start with robots.txt basics. The two basic instructions are:

User-agent:
Disallow:

The first one says that the instructions that follow apply only to that particular User-agent. “User-agent: *” tells robots that the rules listed below apply to all of them, but if you have “User-agent: googlebot”, Google’s crawler will take into account only the instructions under “User-agent: googlebot”. The second one, Disallow, lists a path prefix that the crawler should not request; an empty Disallow value blocks nothing. You may have one or more User-agent definitions (groups of instructions), but keep in mind that they don’t add up (each group stands on its own), and, more importantly, a more specific User-agent definition takes precedence over the wildcard.
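A quick way to see how groups are matched is Python’s standard urllib.robotparser module. The sketch below is a minimal illustration, assuming a hypothetical robots.txt and a made-up agent name (SomeOtherBot). Python’s parser is simpler than Google’s (it matches user agents by substring and does not support wildcards in paths), but for this case it shows the same behavior: Googlebot follows only its own group, while every other bot falls back to the wildcard group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a wildcard group plus a Googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /media/

User-agent: Googlebot
Disallow: /checkout/
"""


def can_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Parse robots_txt and report whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "SomeOtherBot" is a made-up agent that has no group of its own.
    for agent in ("Googlebot", "SomeOtherBot"):
        for path in ("/checkout/cart/", "/media/catalog/product/sample.jpg"):
            print(f"{agent:12} {path:36} {can_fetch(ROBOTS_TXT, agent, path)}")
```

Googlebot is blocked from /checkout/cart/ but allowed into the /media/ path, because its group does not inherit “Disallow: /media/” from the wildcard group; SomeOtherBot is blocked from both.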

Googlebot vs. Googlebot-Image

Google has several different bots/crawlers: one for standard pages, and others for mobile pages, AdSense (this one crawls pages with AdSense code to determine the page’s context and to handle other tasks related to ad publishing), images, etc. See the full list of Google bots.

This gives us flexibility when writing crawler instructions in robots.txt. It’s not uncommon for a Magento robots.txt to contain, among others, this instruction:

Disallow: /media/  

When placed under “User-agent: *”, this instruction blocks all crawlers from accessing everything under the /media/ subfolder, including images!
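The same Python module can illustrate the problem. In this minimal sketch (the robots.txt and the image path are hypothetical), Googlebot-Image has no group of its own, so it falls back to the wildcard group and is blocked just like Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing only a wildcard group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /media/
"""

# Hypothetical product image URL path.
IMAGE_PATH = "/media/catalog/product/sample.jpg"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Neither crawler has its own group, so both fall back to "User-agent: *".
for agent in ("Googlebot", "Googlebot-Image"):
    allowed = parser.can_fetch(agent, IMAGE_PATH)
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Both lines report “blocked”: a single wildcard Disallow is enough to keep the image crawler out of /media/.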

While there’s no need for Googlebot to crawl the content under /media/ (and we advise you not to let it, since it may produce errors in your Google Webmaster Tools (GWT) report), it’s useful to let Googlebot-Image crawl your /media/ subfolder (and any other folders containing images used on your site). Here is how to do it, using the /media/ folder as an example; the same approach works for any folder:

If you want Googlebot-Image to index your whole site while blocking the /media/ folder for all other crawlers, put this in robots.txt:

# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# all other crawlers
User-agent: *
Disallow: /media/
# [other instructions]
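Before testing in Google Webmaster Tools, you can sanity-check this configuration offline. The sketch below assumes Python 3.6+ and uses hypothetical URL paths; bingbot is included only as an example of “all other crawlers”. Keep each group’s lines together: urllib.robotparser discards a group whose User-agent line is followed by a blank line. Because Python’s parser is a simplification of Google’s, treat Google’s own tester as the final word.

```python
from typing import Dict, Iterable, Tuple
from urllib.robotparser import RobotFileParser

# The configuration recommended above ("[other instructions]" omitted).
ROBOTS_TXT = """\
# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# all other crawlers
User-agent: *
Disallow: /media/
"""

# Hypothetical URL paths: a product image under /media/ and an ordinary page.
IMAGE_PATH = "/media/catalog/product/sample.jpg"
PAGE_PATH = "/sample-product.html"


def crawl_matrix(robots_txt: str, agents: Iterable[str],
                 paths: Iterable[str]) -> Dict[Tuple[str, str], bool]:
    """Return {(agent, path): allowed} for every agent/path combination."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    paths = list(paths)  # materialize so the paths can be reused for every agent
    return {(agent, path): parser.can_fetch(agent, path)
            for agent in agents for path in paths}


if __name__ == "__main__":
    results = crawl_matrix(ROBOTS_TXT, ["Googlebot-Image", "Googlebot", "bingbot"],
                           [IMAGE_PATH, PAGE_PATH])
    for (agent, path), allowed in results.items():
        print(f"{agent:16} {path:34} {'allowed' if allowed else 'blocked'}")
```

Googlebot-Image should come out allowed for both paths (its empty Disallow blocks nothing), while Googlebot and bingbot are blocked only from the image path.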

If you test these instructions in Google Webmaster Tools, URLs under /media/ should be reported as allowed for Googlebot-Image and blocked for Googlebot.


Author

Drazen Karacic-Soljic

E-commerce Consultant

Cacan is an E-Commerce Consultant at Inchoo, where he monitors key performance indicators of clients’ online stores and suggests improvements.


Discussion 3 Comments

  1. manish

    Hi Sir,

    I want to prevent particular pages from being indexed. Will the following instruction work in robots.txt?

    Disallow: /xyzon.com/btw-formulier365

    It’s a Magento site; if I apply it from the back end, it applies to all pages…

    Please help me with this.

  2. If I prevent Googlebot-Image from indexing images, will it prevent only images from being indexed, or all other links as well?

  3. What about Bingbot? You are blocking Bing from the media folder. I noticed I am getting a product data feed error in Bing Merchant because Bingbot is blocked from the media folder. How would you handle that situation?
