Google Image indexing and Magento [robots.txt]

There are many Magento robots.txt templates floating around, with dozens of instructions for crawlers. Most of them deal with preventing crawlers from accessing some of Magento’s many directories. But sometimes these instructions can be too restrictive and might prevent Google from indexing your images.


If that’s your intention, well, you can stop reading this article right now. But before you do, consider Google Image Search as a supplementary engine to Google Search. Maybe that’s a train you wouldn’t want to miss, right?! 🙂

 

First of all, let’s start with robots.txt basics. The two basic instructions are:

User-agent: 

Disallow:

 

The first one tells us that the instructions that follow apply only to that particular User-agent. “User-agent: *” tells robots that the rules listed below apply to all of them, but if you have “User-agent: googlebot”, Google’s crawler will take into account only the instructions under User-agent: googlebot. You may have one or more User-agent definitions (groups of instructions), but keep in mind that they don’t add up (each group is on its own), and more importantly, a more specific User-agent definition takes precedence over the wildcard.
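To make the "groups don't add up" point concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules and paths are hypothetical, chosen only for illustration, and Python's parser is not Google's own, so treat the result as an approximation of how a crawler picks its group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only (not a recommended Magento setup).
PRECEDENCE_RULES = """\
User-agent: *
Disallow: /checkout/

User-agent: googlebot
Disallow: /media/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


precedence_parser = load_rules(PRECEDENCE_RULES)

# Googlebot has its own group, so the wildcard group is ignored for it.
googlebot_checkout = precedence_parser.can_fetch("Googlebot", "/checkout/cart/")
googlebot_media = precedence_parser.can_fetch("Googlebot", "/media/logo.png")

# A crawler without its own group falls back to the wildcard group.
other_checkout = precedence_parser.can_fetch("SomeOtherBot", "/checkout/cart/")
other_media = precedence_parser.can_fetch("SomeOtherBot", "/media/logo.png")

print("Googlebot    /checkout/:", googlebot_checkout)
print("Googlebot    /media/:   ", googlebot_media)
print("SomeOtherBot /checkout/:", other_checkout)
print("SomeOtherBot /media/:   ", other_media)
```

In this sketch Googlebot may fetch /checkout/ even though the wildcard group disallows it, because only the googlebot group is applied to it. If you want a rule to hold for every crawler, you have to repeat it in each group.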

 

Googlebot vs Googlebot-image

Google has a few different bots/crawlers. One for standard pages, and then others for mobile pages, AdSense (this one crawls pages with AdSense code to determine the context of the page, along with other actions related to ad publishing), images, etc. See full list of Google bots.

 

This gives us flexibility when creating instructions for crawlers in robots.txt. It’s not uncommon to have a Magento robots.txt that, among others, has this instruction:

Disallow: /media/  

Placed under “User-agent: *”, this particular instruction blocks every crawler without its own group from accessing everything that’s under the /media/ subfolder, including images!

 

While there’s no need to crawl content pages under /media/ (and we advise you not to let Googlebot crawl it, as it may bring errors in your Google Webmaster Tools (GWT) report), it’s useful to let Googlebot-Image crawl your /media/ subfolder (and all other folders with images related to your site). Here is how to do it (using the /media/ folder as a reference, but the same can be applied to any folder):

 

If you want your whole site to be crawled by Googlebot-Image, and the /media/ folder blocked for all other crawlers, put this in robots.txt:

 

# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# all other crawlers
User-agent: *
Disallow: /media/
+[other instructions]
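If you would like a quick local sanity check of these rules before uploading them, the following is a rough sketch using Python's standard-library `urllib.robotparser`. The image and page paths are made-up examples, the `+[other instructions]` placeholder is left out, and Python's parser only approximates how Google's crawlers match groups, so it does not replace testing in Google's own tools.

```python
from urllib.robotparser import RobotFileParser

# The rules from this article. Each group is kept free of blank lines,
# because Python's parser treats a blank line as the end of a group.
MEDIA_RULES = """\
# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# all other crawlers
User-agent: *
Disallow: /media/
"""

# Hypothetical paths, used only for this check.
IMAGE_PATH = "/media/catalog/product/example.jpg"
PAGE_PATH = "/example-product.html"

media_parser = RobotFileParser()
media_parser.parse(MEDIA_RULES.splitlines())

# Which crawlers may fetch an image under /media/?
image_access = {
    agent: media_parser.can_fetch(agent, IMAGE_PATH)
    for agent in ("Googlebot-Image", "Googlebot", "bingbot")
}

# Regular pages outside /media/ stay open to Googlebot.
googlebot_page_access = media_parser.can_fetch("Googlebot", PAGE_PATH)

for agent, allowed in image_access.items():
    print(f"{agent:16} {IMAGE_PATH}: {'allowed' if allowed else 'blocked'}")
print(f"{'Googlebot':16} {PAGE_PATH}: "
      f"{'allowed' if googlebot_page_access else 'blocked'}")
```

Under these assumptions only Googlebot-Image is reported as allowed for the image path. Every other crawler, including other search engines' bots, falls into the wildcard group and is blocked from /media/.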

 

If you test these instructions in Google Webmaster Tools, you’ll see that Googlebot-Image is allowed to fetch URLs under /media/, while Googlebot is blocked from them.

If you’re feeling confused by all of this (and many other things) regarding Google and its robots, or simply want to improve your rankings, make sure to get our custom Magento SEO audit, and we’ll see what we can do!


3 comments

  1. What about Bing Bot? You are blocking bing from the media folder I noticed I am getting a product data feed error in bing merchant because bing bot is blocked from the media folder. How would you handle that situation?

  2. Hi Sir,

    i want to prevent from index particular pages then

    Disallow: /xyzon.com/btw-formulier365

    Above instruction will work in robots.txt or not

    it’s Magento Site, if i go to apply from back end then its apply for all pages…

    please help me on this.
