The possibility of statistical mistake in split testing

I’ve been reading an article on the GetElastic blog, one of my favorite resources on e-commerce marketing. At the end of the article “A/B Test Case Study: Can Split Test Results Be Trusted?” they present a case study in which they tested two identical variations against each other, and one of them performed 4.97% better than the other.
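To get a feel for how two identical variations can drift apart purely by chance, here is a minimal simulation sketch. The 2% conversion rate, the 5 000 visitors per variation and the seed are illustrative assumptions, not figures from the GetElastic case study.

```python
import random

def simulate_aa_test(visitors_per_variation, true_rate, seed):
    """Simulate an A/A test: both variations share the same true conversion rate."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    conversions = []
    for _ in range(2):  # two identical variations
        converted = sum(rng.random() < true_rate for _ in range(visitors_per_variation))
        conversions.append(converted)
    rate_a = conversions[0] / visitors_per_variation
    rate_b = conversions[1] / visitors_per_variation
    # Relative difference of B over A, in percent (undefined if A had no conversions)
    lift = (rate_b - rate_a) / rate_a * 100 if rate_a > 0 else float("nan")
    return rate_a, rate_b, lift

rate_a, rate_b, lift = simulate_aa_test(visitors_per_variation=5000, true_rate=0.02, seed=1)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  relative difference: {lift:+.2f}%")
```

Running it with a handful of different seeds shows how much the "winner" and the size of the gap change from run to run, even though nothing differs between the two variations.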

The possibility of statistical mistake in optimization tests is a pretty big issue. I’ve seen case studies all over the internet with ridiculous conclusions and ridiculously big improvements, all because whoever ran the test made a mistake (sometimes I suspect they even do it intentionally to show better results to their clients).

Several things can go wrong with these tests. Here are the most common mistakes:

Small test sample

During the latest presidential elections here in Croatia, some smart people calculated that the possibility of statistical mistake (the margin of error) with a test sample of 10 000 randomly selected people is ~3%.

In all the years these polls have been run, they have not been wrong: the results were always accurate to within ~3%.

Please note that this is a test sample of 10 000 people who performed the desired action. Transferred to the world of e-commerce, this means our sample should be 10 000 transactions, not 10 000 visitors.

There aren’t many stores in the world that can get 10 000 transactions in a reasonable time period, so I’m led to believe that most of the conversion rate optimization split tests performed out there are not really accurate.
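One common way to check whether a difference between two variations is larger than chance alone would produce is a two-proportion z-test. The sketch below is a minimal version of that textbook test, not a method taken from the case study. The function name and the example counts (100 and 105 orders out of 5 000 visitors each) are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided z-test for the difference between two conversion rates."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate under the assumption that both variations convert equally
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so nothing to distinguish
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: B shows a 5% relative lift over A
z, p_value = two_proportion_z_test(100, 5000, 105, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
```

With counts this small the p-value lands far above the usual 0.05 threshold, so a 5% relative lift like this one cannot be told apart from noise.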

This calculation (10 000 transactions = ~3% margin of error) holds for an electorate of 4 402 045 people. To get the right calculation for your specific case, you need to work out the number of people who fit the criteria of your target audience. By that logic, a B2C e-commerce store with a wide market of, let’s say, 45 000 000 potential customers would need 100 000 transactions to achieve a ~3% margin of error.
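As a sanity check on these figures, here is a minimal sketch of the textbook margin-of-error formula for a proportion, with the finite population correction. It assumes simple random sampling, a 95% confidence level (z = 1.96) and the worst case p = 0.5. None of these assumptions come from the poll itself, and the function names are made up for this example. Under those assumptions, 10 000 respondents give a margin near 1% rather than 3%, a ~3% margin needs only about a thousand respondents, and going from 4.4 million to 45 million people barely changes either number. Treat the figures above as rough numbers and run the calculation for your own case.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (assumed)

def margin_of_error(sample_size, population=None, p=0.5, z=Z_95):
    """Margin of error for a proportion under simple random sampling."""
    moe = z * math.sqrt(p * (1 - p) / sample_size)
    if population is not None:
        # Finite population correction: matters only when the sample
        # is a noticeable fraction of the population.
        moe *= math.sqrt((population - sample_size) / (population - 1))
    return moe

def required_sample(target_moe, population=None, p=0.5, z=Z_95):
    """Smallest sample size that reaches target_moe under the same assumptions."""
    n0 = (z ** 2) * p * (1 - p) / target_moe ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

for population in (4_402_045, 45_000_000):
    print(population,
          f"{margin_of_error(10_000, population):.2%}",
          required_sample(0.03, population))
```

The practical takeaway under these assumptions is that the sample you need is driven by the margin you want and by how small the effect you are trying to detect is, much more than by the size of the market.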

Short time period

Let’s say you have a store that could actually get a relevant test sample of 10 000 transactions within days. You still need to run the test for longer than a few days.

What could happen is that you test the variations during working days, when only a certain population with certain behavior comes to your site. Your test then doesn’t capture the behavior of people visiting your store during weekends, and this population might behave completely differently from your test sample.
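A simple guard against this is to check that the test window contains every day of the week equally often, in other words that it spans whole weeks. The sketch below is one way to do that. The helper names and the example dates are hypothetical.

```python
from datetime import date, timedelta

def weekday_coverage(start, end):
    """Count how often each weekday (0=Monday .. 6=Sunday) falls in [start, end]."""
    counts = [0] * 7
    day = start
    while day <= end:
        counts[day.weekday()] += 1
        day += timedelta(days=1)
    return counts

def covers_whole_weeks(start, end):
    """True when every weekday appears, and appears the same number of times."""
    counts = weekday_coverage(start, end)
    return min(counts) > 0 and len(set(counts)) == 1

# Monday to Friday only: weekend visitors are missing from the sample
print(covers_whole_weeks(date(2024, 1, 1), date(2024, 1, 5)))
# Two full weeks: every weekday is represented twice
print(covers_whole_weeks(date(2024, 1, 1), date(2024, 1, 14)))
```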

Unrepresentative test sample

Choose your methods wisely. I’ve explained this in an earlier article: you can increase the conversion rate of a store while actually decreasing its revenue. I highly recommend reading that article to understand why increasing the conversion rate (the percentage) is not the actual goal of conversion rate optimization (I know, it sounds crazy, but just read it).
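A small worked example makes the point concrete. All figures below are invented for illustration: the variant converts more visitors, but at a lower average order value, so revenue per visitor goes down.

```python
def summarize(visitors, orders, revenue):
    """Basic store metrics for one variation of a test."""
    return {
        "conversion_rate": orders / visitors,
        "average_order_value": revenue / orders,
        "revenue_per_visitor": revenue / visitors,
    }

# Hypothetical results for the same amount of traffic
control = summarize(visitors=10_000, orders=200, revenue=20_000.0)
variant = summarize(visitors=10_000, orders=250, revenue=17_500.0)

for name, stats in (("control", control), ("variant", variant)):
    print(name,
          f"conversion {stats['conversion_rate']:.2%}",
          f"order value {stats['average_order_value']:.2f}",
          f"revenue per visitor {stats['revenue_per_visitor']:.2f}")
```

Judged on conversion rate alone the variant wins. Judged on revenue per visitor it loses, which is why the metric you pick matters as much as the sample.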


Toni Anicic

eCommerce Consultant

SEO. Professional gaming. Home-brewed beer. Magento Certified Solution Specialist.
