There are many Magento robots.txt templates floating around, with dozens of instructions for crawlers. Most of them deal with preventing crawlers from accessing some of Magento’s many directories. But sometimes these instructions can be too restrictive and might prevent Google from indexing your images.
If that’s your intention – well, you can stop reading this article right now. But before doing so, consider Google Image search as a supplementary engine to Google search. Maybe that’s the train you wouldn’t want to miss, right?! 🙂
First of all, let’s start with robots.txt basics. Two basic instructions are:
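As a minimal sketch, assuming the second instruction is the usual `Disallow` line (the path below is only a placeholder):

```text
# Which crawler the group applies to
User-agent: *
# Which path that crawler must not access
Disallow: /some-folder/
```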
The first one tells us that the instructions that follow apply only to that particular User-agent. “User-agent: *” tells robots that the rules listed below apply to all of them, but if you also have “User-agent: googlebot”, Google’s crawler will take into account only the instructions under User-agent: googlebot. You may have one or more User-agent definitions (groups of instructions), but keep in mind that they don’t add up (each group is on its own) and, more importantly, a more specific User-agent definition takes precedence over the wildcard.
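To illustrate how groups stay separate, here is a small sketch with two groups. The /checkout/ path is only an illustrative assumption, not a recommendation:

```text
# Wildcard group: used by crawlers that have no group of their own
User-agent: *
Disallow: /checkout/
Disallow: /media/

# Specific group: Googlebot reads only this one
User-agent: googlebot
Disallow: /checkout/
```

With a file like this, Googlebot would stay out of /checkout/ but would still be free to crawl /media/, because the wildcard rules are not added to its own group.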
Googlebot vs Googlebot-Image
Google has a few different bots/crawlers: one for standard pages, and others for mobile pages, AdSense (this one crawls pages with AdSense code to define the context of the page and handle other actions related to ad publishing), images, etc. See the full list of Google bots.
This gives us flexibility when creating instructions for crawlers in robots.txt. It’s not uncommon to have a Magento robots.txt that, among others, has this instruction:
This particular instruction blocks all crawlers from accessing everything that’s under the /media/ subfolder – including images!
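In robots.txt terms, the rule described here would look roughly like this (a sketch, assuming it sits in the wildcard group):

```text
# Applies to every crawler without a more specific group
User-agent: *
Disallow: /media/
```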
While there’s no need to crawl content pages under /media/ (and we advise you not to let Googlebot crawl it, as it may bring errors in your Google Webmaster Tools report), it’s useful to let Googlebot-Image crawl your /media/ subfolder (and all other folders with images related to your site). This is how to do it (using the /media/ folder as a reference, but it can be applied to any folder):
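One way to write this, as a sketch rather than a definitive rule set, is a dedicated group for Googlebot-Image. Because groups don’t add up, Googlebot-Image then follows only its own group and ignores the wildcard one:

```text
# Group read only by Google's image crawler
User-agent: Googlebot-Image
Allow: /media/
```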
If you want your whole site to be indexed by Googlebot-Image, and want to block the /media/ folder for all other crawlers – put this in robots.txt:
# Google Image Crawler Setup
# all other crawlers
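As a rough sketch, the two groups named in the comments above could be written as follows. The directive lines are an assumption based on the description, using the conventional empty `Disallow:` to mean that nothing is blocked:

```text
# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# all other crawlers
User-agent: *
Disallow: /media/
```

Here Googlebot-Image matches its own group and may crawl the whole site, while every crawler without a more specific group falls back to the wildcard group and is kept out of /media/.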
If you test these instructions in Google Webmaster Tools, you’ll get this: