I've seen a lot of Magento websites receive a message in Google Webmaster Tools saying "Googlebot found an extremely high number of URLs on your site: example.com".
What does this message mean?
This message usually means that your layered navigation (be it Magento's default layered navigation or a misconfigured Magento SEO layered navigation extension) got indexed in combination with other parameters and is now generating millions of thin content URLs.
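To make the mechanics concrete, here are a few hypothetical URLs (the parameter names mimic Magento's default layered navigation; yours may differ) that all render essentially the same category page:

```
example.com/shoes.html
example.com/shoes.html?price=50-100
example.com/shoes.html?manufacturer=5
example.com/shoes.html?price=50-100&manufacturer=5
example.com/shoes.html?price=50-100&manufacturer=5&color=27
example.com/shoes.html?color=27&manufacturer=5&dir=desc&order=price
```

Every additional filterable attribute multiplies the number of crawlable combinations, which is how a store with a few thousand products ends up exposing millions of URLs.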
What effect does it have on my website?
Directly, this means that Google is having trouble indexing all of the URLs on your website because there are too many of them. You're using up Googlebot's "crawl budget" for your domain by exposing too many URL combinations, so some of your important URLs might not get crawled and indexed.
Indirectly, if you're allowing Google to index all of those thin content URLs, and there are so many that Google can't even crawl them all, just imagine what that is doing to your internal linking structure and how much internal link juice you're dispersing.
How can I fix this?
Unfortunately, there is no universal solution you can apply to every case. The message means there are too many thin content URLs on your website, and fixing it means identifying what is generating them on your particular site.
Most commonly this issue is caused by layered navigation, but it can also be caused by certain Magento extensions.
Look at the example URLs that Google gave you in the message. You might identify a pattern of URLs that Googlebot doesn't "like": perhaps all of them contain the layered navigation price filter parameter, the manufacturer parameter, or some obscure combination of different parameters.
In any case, once you identify the problematic URLs, the best thing to do is usually to disallow their path or parameter through robots.txt. I'd also like to point you to our recommended robots.txt configuration for a default store, but that default robots.txt will most likely not help you if you've already received this message in GWT.
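As a sketch, assuming the offending URLs all carry price, manufacturer or sorting parameters (substitute whatever pattern you actually found in your sample URLs), the robots.txt rules could look like this:

```
User-agent: *
Disallow: /*?price=
Disallow: /*&price=
Disallow: /*?manufacturer=
Disallow: /*&manufacturer=
Disallow: /*?dir=
Disallow: /*&dir=
```

Googlebot supports the * wildcard in Disallow patterns, so each pair of rules catches the parameter whether it appears first or later in the query string. Test new rules in GWT before deploying them, since one overly broad pattern can block pages you actually want crawled.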
A very good alternative to blocking those URLs through robots.txt is to use meta noindex on those URLs.
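For reference, the tag you want rendered in the head of those filtered pages is the following, with noindex,follow rather than noindex,nofollow so that Googlebot still follows the links on the page:

```html
<meta name="robots" content="noindex,follow" />
```

In Magento you can set this from layout XML via the head block. A minimal sketch for local.xml, assuming the catalog_category_layered handle covers the pages you're targeting:

```xml
<catalog_category_layered>
    <reference name="head">
        <!-- overrides the default robots value for layered category views -->
        <action method="setRobots"><value>NOINDEX,FOLLOW</value></action>
    </reference>
</catalog_category_layered>
```

Note that this handle fires on every anchor category page, filtered or not, so in practice you'd only want to set the value when filter parameters are actually present in the request, e.g. from an observer or a well-configured SEO extension.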
In case you have an SEO layered navigation extension installed, it’s possible that by configuring it better you can avoid having those problematic URLs indexed.
OMG am I getting penalized?
No. This is not a penalty from Google. Google is just warning you that there is a technical SEO issue on your website that is limiting the number of URLs Google can crawl. It has nothing to do with your ranking positions directly, although it has a lot to do with them indirectly, since you're leaking too much link juice into thin content pages.
While blocking those unwanted URLs through robots.txt is probably the most effective way to stop Google from crawling too many URLs on your Magento store, for ranking purposes I'd recommend the meta noindex method, since it creates better internal link juice flow: a robots.txt disallow stops Googlebot from fetching the page at all, so any link equity flowing into it is stranded, while meta noindex lets Googlebot crawl the page and follow its links, it just keeps the page out of the index. To understand the difference between meta noindex and robots.txt disallow in more detail, read this.
If you’d like some help with solving this issue, I’d recommend our Magento SEO audit service.