Editing robots.txt in Magento 2 Admin

In the last few versions of Magento 2, a few shiny new features regarding robots.txt have appeared, and they bring some interesting issues with them. In this blog post, I’ll walk you through the different cases you can encounter while modifying the Magento 2 robots.txt file.

Episode I: The Status Code Menace

The first thing you can notice (if you’re a freak like me who actually checks the status code of most URLs you visit) is that, by default, if you install a clean Magento 2 and don’t add any robots.txt file yourself, you or any search engine bot trying to access the robots.txt file will get a status 200 page instead of a 404:

Status 200 on Magento 2 robots.txt file

Notice it also weighs a few bytes. This robots.txt file doesn’t actually physically exist in the root of your store. It’s a dynamically generated file, produced by some of the new configuration options in the Magento 2 admin that I’ll go through with you now.
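If you want to verify this yourself, here’s a minimal sketch in Python using the requests library (assuming it’s installed; yourstore.com is a placeholder for your own domain):

import requests

# Fetch robots.txt from a clean Magento 2 install
response = requests.get("http://yourstore.com/robots.txt")

# Expect a 200 rather than a 404, and a body that weighs a few bytes
print(response.status_code)
print(len(response.content), "bytes")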

Why is there a blank dynamically generated status 200 page instead of a 404? Because Magento 2. Can you somehow disable it from admin and get a 404 page instead? No.

Episode II: The Sitemap Injection

If you navigate to “Stores > Configuration > Catalog > XML Sitemap” and scroll down to “Search Engine Submission Settings”, you’ll find a setting that enables you to add the sitemap: directive to your robots.txt file:

Setting in Magento 2 Admin That adds Sitemap to Robots.txt File

If you enable this, the URL of your main sitemap index file (the one that contains just two URLs: those of your actual URL sitemap and your image sitemap) will be added to the robots.txt file that’s dynamically generated on your website, and if you visit yourstore.com/robots.txt you’ll see something like this:

Sitemap: http://m221.jednorog/sitemap.xml
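For reference, that sitemap index file is just a small XML document pointing at the two real sitemaps. Based on the description above, it looks roughly like this (the filenames here are illustrative, not necessarily Magento’s exact output):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- the actual URL sitemap -->
  <sitemap>
    <loc>http://m221.jednorog/sitemap-urls.xml</loc>
  </sitemap>
  <!-- the image sitemap -->
  <sitemap>
    <loc>http://m221.jednorog/sitemap-images.xml</loc>
  </sitemap>
</sitemapindex>

If you prefer the command line, bin/magento config:set should be able to flip the same toggle; judging by the admin section names I’d expect a path along the lines of sitemap/search_engine_submission_settings/enable_submission_robots, but that’s my guess, so double-check it before relying on it.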

Episode III: The Content Design Configuration

What does robots.txt have to do with design, one might ask? Nobody knows, but still, let’s navigate to “Content > Design > Configuration”.

In this very logical place, let’s edit the global or a website scope:

Edit global or website scope

In here you can open an accordion section titled “Search Engine Robots”.

The first thing you’ll notice is that these fields are blank. But why, if your file already includes the sitemap directive we enabled in the previous section of this blog post? Because Magento 2. (Presumably the sitemap directive is appended when the file is generated, rather than being stored in this field.)

Let’s add a few lines like in the screenshot below to the robots.txt field and see what happens, shall we?

Editing robots.txt for a website scope

PRO TIP: If you’re having trouble saving the configuration at this step locally, try this fix; it worked for me. Why? Because Magento 2.

What do we get now on the front end of the dynamically generated robots.txt file once both the sitemap injection and the custom robots.txt text are enabled, you might ask? How will Magento 2 combine and merge them, and where will the sitemap directive appear? Well… this is what we get:

Robots.txt messed up by Magento 2

A misformatted robots.txt file that doesn’t break to a new line where it should, with the sitemap directive appended at the end of it.
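For comparison, a correctly merged file should look something like this, with each directive on its own line and the sitemap directive cleanly separated (the Disallow lines stand in for whatever custom text you entered):

User-agent: *
Disallow: /checkout/
Disallow: /customer/

Sitemap: http://m221.jednorog/sitemap.xml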

One would expect that clicking the “Reset to Default” button would return a blank field, as that was the default state we found this field in once Magento was installed, right? Wrong. What we get is this:

Magento 2's default boilerplate for robots.txt file

A badly written boilerplate for the Magento 2 robots.txt file that I wouldn’t recommend using, as it disallows every URL with a parameter on the store.
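To illustrate why that’s a problem, a single wildcard rule of this shape (the same pattern appears in the reader-submitted boilerplate in the comments below) blocks every URL containing a query string, which takes out layered navigation, sorting, and any campaign-tagged landing pages in one line:

Disallow: /*?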

Episode IV: A New Hope

What happens if we now add a custom robots.txt file that is actually physically present in the root of your store?

It completely overrides everything we did in the previous steps. It disregards both the text you entered in Episode III and the sitemap injection from Episode II. And if you wrote it correctly, it works and is formatted correctly.

So to conclude…

At least for now, stick with adding robots.txt to your Magento 2 store the old-fashioned way: add an actual text file to the root of your store.
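If you need a starting point for that physical file, a minimal sketch might look like the following; adjust the disallowed paths and the sitemap URL to your own store rather than copying it verbatim:

User-agent: *
Disallow: /checkout/
Disallow: /customer/
Disallow: /catalogsearch/

Sitemap: http://yourstore.com/sitemap.xml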

Comments

  1. Hello, I do not know if this is a good place to ask, but I’ll try.
    I’ve added my robots.txt to Content > Design > Configuration:
    User-agent:*
    Disallow: /lib/
    Disallow: /*.php$
    Disallow: /pkginfo/
    Disallow: /report/
    Disallow: /var/
    Disallow: /catalog/
    Disallow: /customer/
    Disallow: /sendfriend/
    Disallow: /review/
    Disallow: /*SID=
    Disallow: /*?

    # Disable checkout & customer account
    Disallow: /checkout/
    Disallow: /onestepcheckout/
    Disallow: /customer/
    Disallow: /customer/account/
    Disallow: /customer/account/login/

    # Disable Search pages
    Disallow: /catalogsearch/
    Disallow: /catalog/product_compare/
    Disallow: /catalog/category/view/
    Disallow: /catalog/product/view/

    # Disable common folders
    Disallow: /app/
    Disallow: /bin/
    Disallow: /dev/
    Disallow: /lib/
    Disallow: /phpserver/
    Disallow: /pub/

    # Disable Tag & Review (Avoid duplicate content)

    Disallow: /tag/
    Disallow: /review/

    # Common files
    Disallow: /composer.json
    Disallow: /composer.lock
    Disallow: /CONTRIBUTING.md
    Disallow: /CONTRIBUTOR_LICENSE_AGREEMENT.html
    Disallow: /COPYING.txt
    Disallow: /Gruntfile.js
    Disallow: /LICENSE.txt
    Disallow: /LICENSE_AFL.txt
    Disallow: /nginx.conf.sample
    Disallow: /package.json
    Disallow: /php.ini.sample
    Disallow: /RELEASE_NOTES.txt

    # Disable sorting (Avoid duplicate content)
    Disallow: /*?*product_list_mode=
    Disallow: /*?*product_list_order=
    Disallow: /*?*product_list_limit=
    Disallow: /*?*product_list_dir=

    # Disable version control folders and others
    Disallow: /*.git
    Disallow: /*.CVS
    Disallow: /*.Zip$
    Disallow: /*.Svn$
    Disallow: /*.Idea$
    Disallow: /*.Sql$
    Disallow: /*.Tgz$

    Sitemap: https://www.supermarket.no/sitemap.xml
    But lately, I’ve been getting this Google error:
    Submitted URL blocked by robots.txt
    Do you think I have to remove these disallows?

    # Disable Search pages
    Disallow: /catalogsearch/
    Disallow: /catalog/product_compare/
    Disallow: /catalog/category/view/
    Disallow: /catalog/product/view/
    Disallow: /catalog/
