Magento SEO: How to handle problems caused by layered navigation?

Layered navigation, a feature available in Magento without any extensions, is commonly used by merchants around the world. It is also one of the most painful Magento features for SEOs: depending on the number of filters and products, it creates lots of terrible URLs (often tens of thousands) with duplicate or near-duplicate content and identical page titles and descriptions.

I made this video to show you what your options are and what is, in my experience, the best possible solution for handling Magento layered navigation indexation issues. I hope it helps:

After you've watched the video, you know what to do. But how do you know if you did it correctly? Log in to your Google Webmaster Tools, click on Health -> Fetch as Googlebot and see if the layered navigation still shows up.

I hope I helped. Anyone have a different experience and advice on this issue?

If you need any help, we can do a Magento SEO Audit for your site.


60 comments

  1. Tell me please, someone… is the following a free solution for this? I have no budget for a fancy extension…

    1. Go to:

    PathToThemeTemplateFiles/priceslider/slider_layered_nav.phtml

    Add to line 1:

    <?php
    // Show the layered navigation only to visitors that are not known bots
    $ua = strtolower($_SERVER['HTTP_USER_AGENT']);
    if (strstr($ua, "googlebot") || strstr($ua, "bingbot") || strstr($ua, "slurp") || strstr($ua, "msn")): ?>

    <?php else: ?>

    2. Then add this as the last line:

    endif; ?>
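The wrapper above hinges on a simple substring test against the visitor's user agent. A minimal sketch of that same check, written in Python purely for illustration (the bot tokens are taken from the PHP condition above):

```python
# Illustrative sketch of the user-agent test in the template wrapper above.
# The tokens mirror the PHP condition; extend the tuple for other crawlers.
BOT_TOKENS = ("googlebot", "bingbot", "slurp", "msn")

def is_search_bot(user_agent):
    """Return True if the user-agent string contains any known bot token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BOT_TOKENS)
```

Keep in mind that user agents are trivially spoofed, which is why a later comment in this thread also verifies Googlebot with a reverse DNS lookup.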
    1. You could add “noindex, follow” to only the pages with attributes by overriding app/code/core/Mage/Catalog/Block/Category/View.php into your local directory. Around line 42, look for this:

      if ($headBlock = $this->getLayout()->getBlock('head')) {

      Directly under that line add the following code:

      //edit: noindex, follow all category pages with parameters
      if(count($_GET)) {
      	$headBlock->setRobots("NOINDEX,FOLLOW");
      }

      This would noindex ONLY pages with parameters, while your main category pages would still get indexed. Adding the individual parameters to your robots.txt would block these pages from being read:

      User-agent: *
      Disallow: /*?limit=*
      Disallow: /*?dir=*
      Disallow: /*?order=*
      Disallow: /*?mode=*
      Disallow: /*?other_params=*
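To sanity-check rules like these, remember that in robots.txt a `*` matches any sequence of characters and each rule is anchored at the start of the URL path. A rough Python sketch of that matching logic (not Google's actual implementation):

```python
import re

# Rules copied from the robots.txt example above
DISALLOW_RULES = ["/*?limit=*", "/*?dir=*", "/*?order=*", "/*?mode=*"]

def rule_matches(rule, path_and_query):
    # Translate the robots.txt wildcard '*' into regex '.*' and anchor at the start
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return re.match(regex, path_and_query) is not None

def is_disallowed(path_and_query):
    return any(rule_matches(r, path_and_query) for r in DISALLOW_RULES)
```

Note that these `?`-anchored patterns only catch a parameter when it is the first one in the query string; `/shoes.html?color=red&limit=36` slips through, so `&`-prefixed variants (e.g. `/*&limit=`) are often added as well.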
  2. What about using noindex for the pages that have some parameter, and nofollow to all the links of the Layered navigation?

  3. About the canonical: you can use it if you point every category page to its one “canonical page”. categoryabc?length=124&p=2 (p=2 for the second category page) points with rel=canonical to categoryabc?p=2

  4. sorry, the code got deleted:
    <a href="http://www.shiptonandheneage.co.uk/mens-shoes.html" onclick="GomageNavigation.click(this); return false;" rel="nofollow" data-url="http://www.shiptonandheneage.co.uk/mens-shoes.html" data-param="?cat=22&dir=asc&limit=28&mode=list&order=price" data-ajax="1">Boots</a>

  5. Hi,

    We have GoMage ‘Advanced Navigation’ which uses AJAX to filter layered navigation but I just noticed the following when looking at the page code:

    Boots

    Does it mean they just use rel=”nofollow” to stop bots from indexing, which is a ‘no go’ as you all say here, or am I missing something?
    Please guys, could anyone help.

  6. I got an idea to wrap the layered nav in such code, maybe this will help someone?

    <div id="filters-no-follow"></div>
    
    <?php
    function prepare_for_echo($string) {
        $no_br = trim(preg_replace('/\s+/', ' ', $string));
        $no_slashes = str_replace('\'', '\\\'', $no_br);
        return $no_slashes;
    }
    ?>
    
    <script>
    function please_enable_cookies() {
        var f = document.getElementById('filters-no-follow');
        f.innerHTML = '<div class="no-cookies-error">Enable cookies to choose filters.</div>';
    }
    
    function please_load_filters() {
        var f = document.getElementById('filters-no-follow');
        f.innerHTML = '<?php if ( !empty($filtersHtml) || !empty($stateHtml) ): ?>'
            + '\n<div class="block block-layered-nav">'
            + '\n    <div class="block-title">'
            + '\n        <strong><span><?php echo prepare_for_echo($this->__('Shop By')); ?></span></strong>'
            + '\n    </div>'
            + '\n    <div class="block-content">'
            + '\n        <?php echo prepare_for_echo($this->getStateHtml()); ?>'
            + '\n        <?php if ($this->canShowOptions()): ?>'
            + '\n        <p class="block-subtitle"><?php echo prepare_for_echo($this->__('Shopping Options')); ?></p>'
            + '\n        <dl id="narrow-by-list">'
            + '\n            <?php echo prepare_for_echo($filtersHtml); ?>'
            + '\n        </dl>'
            + '\n        <?php endif; ?>'
            + '\n    </div>'
            + '\n</div>'
            + '\n<?php endif; ?>';
    }
    
    function are_cookies_enabled() {
        var cookieEnabled = navigator.cookieEnabled ? true : false;
    
        if (typeof navigator.cookieEnabled == "undefined" && !cookieEnabled) {
            document.cookie = "testcookie";
            cookieEnabled = (document.cookie.indexOf("testcookie") != -1) ? true : false;
        }
        return cookieEnabled;
    }
    
    if (are_cookies_enabled()) {
        please_load_filters();
    } else {
        please_enable_cookies();
    }
    </script>
  7. Hi, very nice article. But is there maybe someone who has taken a practical, hands-on approach to the best solution mentioned in the video?

  8. Hi Toni,

    Thanks for this nice video. You point out the duplicate content issue from a “layered navigation” point of view. For my site I have set the URL parameters for these layered navigation filters (color, brand, etc.) to “No URLs” in GWT. Google tends to follow this instruction pretty well.

    My problem is more related to the sorter options on the category pages (mode, limit, order and direction). I have also set URL Parameters for these filters in GWT, but Google does not obey them… 🙁

    You don’t mention these filters in your video. What would be a good solution to prevent google from following these sorter-links and indexing these filter pages?

    Thanks!

    Matthijs

  9. Michael – I was also trying the GoMage Advanced Nav extension. It does not use the agent detection method described here. According to GoMage, that method will be released in a future revision. Instead, they currently use a “rel=nofollow” method and a robots.txt method to exclude the AJAX code from indexing. They charged me a $50 modification fee. I think Sinfill Thrill is on the right path.

  10. I added the GoMage Advanced Nav (ver. 3.2) for Magento and then checked the ‘Fetch as Google’ code to see if AJAX was hiding the sort-by filters… I’m not a programmer, so I am not sure whether it has worked.

    It looks as though it is written differently, and I can clearly see it is using the plugin from the term ‘ajax’, but shouldn’t this be invisible now?

    Sort By

    Position

    Name

    I can see the ‘loading, please wait’ appearing when you select from drop-down filters, pagination links etc., so I know the AJAX plugin is installed and in use. I just thought the option values above wouldn’t be there anymore? Or is it a case of Google no longer being able to follow them?

    Sorry, I’m new to this, and thanks.

    Michael

  11. Toni,
    Thanks for the interesting article and for explaining it. Unfortunately, you don’t provide a clear step-by-step guide to solving the problem.
    How do we create an AJAX box?
    Can we just turn layered navigation off? How is that done – just by using the anchor option in the categories?

    Apart from that, you mentioned in the comments that you only trust a few extension providers. Who are these and why do you trust them?
    Apart from that, what do you think of Nitrogento and Lightspeed extensions?

  12. This sounds like very sensible advice and I achieved it by putting a check in /templates/catalog/layer/filter.phtml.

    This is the function I’m using to check and it seems to work just fine, having looked at a few pages as Googlebot.

    function IsGooglebot() {
        // Check if the user agent claims to be Googlebot
        // (eregi() is deprecated, so stripos() is used instead)
        if (stripos($_SERVER['HTTP_USER_AGENT'], 'Googlebot') === false) {
            return false; // Not Googlebot, take some action if needed
        }
        $ip = $_SERVER['REMOTE_ADDR'];
        // Reverse DNS, e.g. crawl-66-249-66-1.googlebot.com
        $name = gethostbyaddr($ip);
        // Check if the host name contains "googlebot"
        if (stripos($name, 'Googlebot') === false) {
            return false; // Pretender, take some action if needed
        }
        // Forward lookup: the list of IPs must contain the original one
        $hosts = gethostbynamel($name);
        foreach ($hosts as $host) {
            if ($host == $ip) {
                return true;
            }
        }
        return false; // Pretender, take some action if needed
    }
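The same forward-confirmed reverse DNS check can be sketched in Python. This version is slightly stricter than the substring test above: it requires the reverse-DNS name to end in googlebot.com or google.com, in line with Google's published verification advice. The resolver functions are injectable so the logic can be exercised without network access; by default they fall back to the standard library.

```python
import socket

def is_verified_googlebot(user_agent, ip,
                          reverse_dns=lambda ip: socket.gethostbyaddr(ip)[0],
                          forward_dns=lambda host: socket.gethostbyname_ex(host)[2]):
    """True only if the UA claims Googlebot AND the IP survives the DNS round trip."""
    if "googlebot" not in (user_agent or "").lower():
        return False  # not even claiming to be Googlebot
    try:
        host = reverse_dns(ip)  # e.g. crawl-66-249-66-1.googlebot.com
        if not host.endswith((".googlebot.com", ".google.com")):
            return False  # pretender: reverse DNS outside Google's domains
        return ip in forward_dns(host)  # forward lookup must return the same IP
    except OSError:
        return False  # DNS failure: treat as unverified
```

Because both lookups are injectable, a spoofed crawler (wrong IP, wrong reverse-DNS domain) can be simulated in tests without touching DNS at all.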
  13. Depends on how much link juice you have to work with in the first place. Layered navigation can create literally millions of URLs, depending on the number of attributes and products. You might not want to disperse all of your link juice across a really large number of URLs unless you’re going for a very, very long tail. With such a set-up you can totally forget about highly competitive, broad match, short keywords.

  14. Hi Raoul,

    Regarding robots, yeah, it depends how you define “indexing”, but if you think about it, reading something as a robot is indexing; it just depends in which index you store it 🙂

    Filter pages in Magento’s layered navigation are really not optimized for SEO, you have no control over title, meta data, URLs are not SEO friendly… so using them as landing pages would require heavy modifications.

  15. By the way, the difference between robots.txt and noindex is a bit different than you described above:

    “Noindex is a funny thing, it actually doesn’t mean “You can’t index this”, it means “You can’t show this in search results”. Robots.txt disallow means “You can’t index this” but it doesn’t mean “You can’t show it in the search results”.”

    Noindex => You can read this but you can not index it

    Robots.txt => You can’t read this but you can index it

  16. Hi Toni,

    thanks for the video, I am with you that what you are describing (good old PageRank sculpting, basically) might be the best solution to control link juice and distribute it amongst categories.

    But I am also with Ed and Michael that the canonical* (even though it’s not 100% what it is intended for) might lead to similar effects.

    *I am not sure, though (anybody know the answer?), how much (if any) link juice is passed through outgoing links on filter pages that have a canonical back to the original unfiltered page.

    Just applying the noindex would push more link juice down to the products, as Peer suggested. The downside: it would distribute the link juice among all products almost equally if there are a lot of filters available. And it would weaken the category pages a lot!

    So I would agree with Toni that the best solution probably is PR sculpting, ideally in combination with smart sorting so that your top products get displayed on top and get the most link juice. If you have many products in a category, you would have to deal with pagination (here noindex might make sense to help get all your products indexed).

    Toni: What do you think of using filter pages as landing pages instead, making the AJAX crawlable? Assuming the filter combinations have enough search volume, this would make sense…

  17. @Toni,

    Ahh very important distinction, I was trying to work out the difference between noindex and robots.

    Makes sense. Essentially, hiding the links will stop link juice from passing, and the old stuff in the index is not really an issue, since the main pages should rank higher in any case because of the change.

    Is there a GWT bulk removal upload, btw?

  18. Hi Ming,

    Noindex is a funny thing, it actually doesn’t mean “You can’t index this”, it means “You can’t show this in search results”. Robots.txt disallow means “You can’t index this” but it doesn’t mean “You can’t show it in the search results”.

    So, noindex shouldn’t really “de-index” already indexed pages. But, since we hid the layered navigation, there are no more internal links to it so no link juice is passed to them anymore.

    The only remaining problem are those already indexed links since they can show up in the search results, but when you think about it, it’s not really an issue anymore, they can remain in index.

  19. @Toni,

    Thanks for the reply, from what I understand in your video:
    1) nofollow is terrible (this I understand, pretty common knowledge).
    2) noindex (wouldn’t this be good to use to remove URLs from the index rather than GWT?) – I understand that while you can remove the URLs from the index, the link juice will still flow through, but of course, it’s a bit wasted?
    3) robots.txt (I never saw the point of using this; it’s easier to maintain via a meta tag).
    4) GWT parameters (agreed, we use this and it does pretty much nothing, stuff gets indexed and it doesn’t help with duplicate content).

    I like your solution of hiding the layered navigation from google to prevent link crawling…

    But wouldn’t a more comprehensive solution be:
    1) Hide the layered navigation (i.e. your solution)
    2) Use noindex on layered navigation pages to remove already indexed pages from google
    3) Use rel canonical on layered navigation pages to tell google which page is the original (or do you think using this with noindex is pointless? noindex + canonical, or one of them?)
    4) Use GWT parameters anyways?

    Would this be worse than your solution? From what I can work out, essentially avoid nofollow and robots and have the hidden layered navigation and this is the gist of it?

    Thanks Toni!

  20. @Ming, yeah… they will. Since there will be no more links toward them (at least no internal ones), you can remove the URLs from the index through Google Webmaster Tools. That will take a lot of time, but I don’t see a much better solution.

  21. @Toni,

    Interesting article / video. How would you handle layered navigation pages already indexed?

    By implementing a hidden layered navigation, past pages would still be in the index?

  22. So link juice still flows to pages on a site that have meta noindex on them or are blocked out by robots.txt?

    I thought the link juice would be blocked out??

    Are you sure about this?

  23. Hi Toni,

    Thanks for your reply. Yes, there are a few extensions that do similar things. I’m trying to decide between one that enables SEO friendly URLs for attributes (rather than ?manufacturer=2&color=1) or the GoMage advanced navigation that lets the “?manufacturer=2&color=1” part be hidden completely. Ideally trying to eliminate the duplicate content from the layered nav. Any thoughts?

  24. Hi Ben,

    I saw there’s an SEO Layered Navigation extension; however, I never tested it, since our usual experience with Magento extensions is that they are not coded very well, except for the very few extension providers we trust.

  25. Hi Toni,

    The SEO Layered Navigation plugin claims to hide layered navigation html from the page source in case you need to prevent indexing by search engine spiders.

    Layered navigation html is encoded with php and then decoded with javascript so search engine bots don’t see any links, links are only available for users with enabled javascript.

    What is your opinion on this solution?

  26. In my opinion, the combination of noindex, follow and GWT parameters is the best solution. 1) If you love your shop and your domain, on-page optimization should be white, not black or grey. 2) With noindex, follow you direct the power to the products and you get more links to each product. PageRank is a relic from better times 😉

  27. Thanks for your reply, Toni!

    I will spend some time later trying to find a solution and I will post it here if/when I manage to get the job done. 🙂

  28. Patrick,

    I thought most people would like to know WHY we suggest this solution and not some of the other options, and from the first comments I can see that was the right approach to presenting the solution. You should always know why you are doing something the way you’re doing it, not just blindly follow advice you read somewhere on the internet. Especially when it comes to SEO. I’m sorry I wasted three and a half minutes of your time explaining how and why we did it the way we did, in a completely free article with advice that I filmed and wrote for you and your business to benefit from, and that you chose to read and watch for free. I’m such a terrible person.

    Emanuel,

    If/when developers find some time, they’ll make a guide on how to achieve this from a technical perspective. For now, you can do the same thing my developers did when I told them what needs to be done: Google how to detect if cookies are enabled.

  29. This really doesn’t help: three and a half minutes spent excluding the other possibilities, and then the actual solution rushed through in 30 seconds. Can anyone tell exactly how to do it?

  30. Michael,

    You usually have both normal AND layered navigation. The normal navigation will stay and be indexed; root categories and subcategories will be indexed. In layered navigation, those are parameters (attributes) which filter the data, not categories, and you don’t want them indexed: it would be a mess of duplicate content, identical titles and descriptions.

    Canonical, as I explained earlier, is only for pointing the same or very similar content to the original source. It’s not applicable to layered navigation filter pages. You can try it, but Google will usually ignore it.

  31. I’m with Ed here on the canonical tag.

    It seems to me that layered navigation is to categories what css is to content. It feels very much like a way for the user to interact with the same content in a different way (without restructuring the same content each time) but we still want Google to index the category just once, regardless of the various view opportunities.

    Unless I’ve missed something I’ll later come to regret doesn’t canonical allow us to do exactly that?

  32. Thanks for the great explanation. I’ve been suffering from duplicate content (at least in part due to layered nav) and it’s great to know that you guys have found a working solution. One suggestion though – maybe get a wireless microphone for the next video? Sound quality was pretty poor.

  33. Very true but it certainly is one approach to look at that seems to be working for them. Perhaps it would make for a great module 🙂

    I would argue that rel=canonical would be appropriate here: you have the “same” content spread over various pages which all, in effect, relate to your main category. Without a better solution, at least this method would remove the duplication issue and not lose you that valuable link juice.

  34. Ed,

    What works for Amazon is usually not the best solution for a mid-size online merchant. Amazon has an amazing amount of link juice to distribute and can afford to have a slightly different navigation than the rest of us and still rank well.

    Regarding rel=canonical, short answer is no.

    The long answer is that rel=canonical can point to the same content, differently organized, not to different content or “thinner” content. Think of it as a tool to show that the https version of a URL is actually a duplicate of the http version, or that some ?ref=abcd parameter which doesn’t really change the content of the URL is just a variation of the canonical version without the parameter, etc. But as soon as the content changes, canonical is not applicable.
