Statistical significance is a measure of how unlikely it is that the observed difference in conversion rates between a given variation and the baseline is due to random chance alone.
A result of an experiment is said to have statistical significance, or be statistically significant, if it is likely not caused by chance for a given statistical significance level.
Your statistical significance level reflects your risk tolerance and confidence level. For example, if you run an A/B test with a significance level of 95%, then when there is truly no difference between the variations, you will wrongly declare a winner only about 5% of the time. In practice this is often summarized as being 95% confident that an observed result is real and not an error caused by randomness, with a 5% chance that you could be wrong.
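To make the 5% figure concrete, here is a minimal simulation sketch. It runs many "A/A" tests, where both groups share the same true conversion rate, and counts how often a standard two-proportion z-test still reports significance at the 0.05 threshold. The function names, the 10% conversion rate, the visitor counts, and the choice of a z-test are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def p_value_two_sided(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_conversions(rng, n, rate):
    """Count how many of n simulated visitors convert at the given rate."""
    return sum(rng.random() < rate for _ in range(n))


rng = random.Random(0)  # fixed seed so the run is repeatable
n_tests, n_visitors, true_rate = 1_000, 500, 0.10  # illustrative values

false_positives = 0
for _ in range(n_tests):
    # Both groups have the same true rate, so any "winner" is a false positive.
    a = simulate_conversions(rng, n_visitors, true_rate)
    b = simulate_conversions(rng, n_visitors, true_rate)
    if p_value_two_sided(a, n_visitors, b, n_visitors) < 0.05:
        false_positives += 1

false_positive_rate = false_positives / n_tests
print(f"false positive rate: {false_positive_rate:.3f}")
```

The printed rate should land near 0.05, which is the share of no-difference experiments you accept being wrong about when you choose a 95% significance level.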
Statistical significance is a way of quantifying how reliable a certain statistic is; it does not prove anything with certainty. When you make decisions based on the results of experiments that you’re running, you will want to make sure that a relationship actually exists.
Website owners, marketers, and advertisers have recently become interested in making sure their A/B test experiments (e.g., conversion rate A/B testing, ad copy changes, email subject line tweaks) reach statistical significance before they jump to conclusions.
Statistical significance is most practically used in statistical hypothesis testing. For example, you want to know whether or not changing the color of a button on your website from red to green will result in more people clicking on it.
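If your button is currently red, the assumption that changing its color makes no difference to how many people click is called your “null hypothesis”. The claim that turning your button green changes the click rate is known as your “alternative hypothesis”. To determine the observed difference in a statistical significance test, you will want to pay attention to two outputs: p-value and confidence interval around effect size.
The p-value is the probability of observing an effect at least as large as the one in your sample if the null hypothesis were true. A p-value of < 0.05 is the conventional threshold for declaring statistical significance.
The confidence interval around effect size is the range, given by a lower and an upper bound, of effect sizes that are plausible given your data. An interval that excludes zero corresponds to a statistically significant result at the matching level.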
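As a rough sketch of how these two outputs can be computed, the following uses a standard two-proportion z-test with a normal approximation. The function name, the visitor and conversion counts, and the choice of test are assumptions for illustration; this is one common approach, not the method any particular tool uses.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Compare the variation (b) against the baseline (a).

    Returns the observed difference in conversion rate, the two-sided
    p-value, and a confidence interval around that difference.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Under the null hypothesis both groups share one conversion rate,
    # so the p-value uses the pooled standard error.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The interval describes the effect itself, so it uses the
    # unpooled standard error.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return diff, p_value, ci


# Hypothetical data: red button (baseline) vs. green button (variation).
diff, p_value, ci = two_proportion_test(
    conv_a=200, n_a=10_000, conv_b=260, n_b=10_000
)
print(f"difference: {diff:.4f}")
print(f"p-value:    {p_value:.4f}")
print(f"95% CI:     ({ci[0]:.4f}, {ci[1]:.4f})")
```

With these made-up counts the p-value falls below 0.05 and the interval sits entirely above zero, so both outputs point to the same conclusion.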
Statistical significance is important because it gives you confidence that the changes you make to your website or app actually have a positive impact on your conversion rate and other metrics. Your metrics and numbers can fluctuate wildly from day to day, and statistical analysis provides a sound mathematical foundation for making business decisions and limiting false positives.
Whether a result reaches statistical significance depends on two key variables: sample size and effect size.
Sample size refers to how large the sample for your experiment is. The larger your sample size, the more confident you can be in the result of the experiment (assuming that it is a randomized sample). If you are running tests on a website, the more traffic your site receives, the sooner you will have a large enough data set to determine if there are statistically significant results. You will run into sampling errors if your sample size is too low.
Effect size refers to the size of the difference in results between the two sample sets and indicates practical significance. If there is a small effect size (say a 0.1% increase in conversion rate) you will need a very large sample size to determine whether that difference is significant or just due to chance. However, if you observe a very large effect on your numbers, you will be able to validate it with a smaller sample size to a higher degree of confidence.
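The trade-off between effect size and sample size can be sketched with the usual normal-approximation formula for comparing two proportions. Everything specific here is an assumption for illustration: the function name, the 5% baseline conversion rate, the 80% power target (power is not discussed above), and reading "0.1%" as an absolute lift of 0.1 percentage points.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, absolute_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation for a two-sided
    two-proportion z-test to detect `absolute_lift` over `baseline_rate`."""
    p1 = baseline_rate
    p2 = baseline_rate + absolute_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / absolute_lift ** 2)


# Hypothetical baseline of 5% conversion.
small_effect = sample_size_per_variation(0.05, 0.001)  # 0.1 point lift
large_effect = sample_size_per_variation(0.05, 0.02)   # 2 point lift
print(f"small effect: {small_effect:,} visitors per variation")
print(f"large effect: {large_effect:,} visitors per variation")
```

Because the required sample grows with the inverse square of the lift, the small effect needs hundreds of times more visitors than the large one.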
Beyond these two factors, a key thing to keep in mind is the importance of randomized sampling. If traffic to a website is split evenly between two pages but the sampling isn’t random, it can introduce errors due to differences in the behavior of the sampled population.
For example, if 100 people visit a website and all the men are shown one version of a page and all the women are shown a different version, then a comparison between the two is not valid, even if the traffic is split 50-50, because the difference in demographics could introduce variations in the data. A truly random sample is needed to determine that the result of the experiment is statistically significant.
In the pharmaceutical industry, researchers use statistical test results from clinical trials to evaluate new drugs. Research findings from significance testing indicate drug effectiveness, which can drive investor funding and make or break a product.
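A minimal sketch of random assignment follows, assuming a hypothetical list of visitors with a recorded gender attribute. It shows that when each visitor is assigned by a coin flip, both versions of the page end up with a similar demographic mix, which is what makes the comparison fair. The visitor data and function name are made up for illustration.

```python
import random
from collections import Counter


def assign_variation(rng):
    """Assign a visitor to page version A or B with equal probability."""
    return rng.choice(["A", "B"])


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical visitors: half men, half women.
visitors = [{"id": i, "gender": "M" if i % 2 == 0 else "F"} for i in range(10_000)]

# Count visitors in each (variation, gender) cell.
counts = Counter()
for visitor in visitors:
    variation = assign_variation(rng)
    counts[(variation, visitor["gender"])] += 1

for cell in sorted(counts):
    print(cell, counts[cell])
```

Each of the four cells should hold roughly a quarter of the visitors, so any difference in conversion rate between A and B cannot be explained by one version seeing mostly men and the other mostly women.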
Calculating statistical significance accurately can be a complicated task that requires a solid understanding of statistics and calculus.
Fortunately, you can easily determine the statistical significance of experiments, without any math, using Stats Engine, the advanced statistical model built into Optimizely.
Stats Engine operates by combining sequential testing and false discovery rate controls to deliver statistically valid results regardless of sample size. Updating in real time, Stats Engine holds results to a 95% significance level, boosting your confidence in making the right decision for your company and helping you avoid pitfalls along the way.
Stats Engine was created to let you test more in less time. By helping you make statistically sound decisions in real time, it adjusts values as needed and shares trustworthy results quickly and accurately.
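Stats Engine's internals are not described here, so the following is not its implementation. As a rough illustration of what false discovery rate control means in general, here is the textbook Benjamini-Hochberg procedure applied to a made-up set of p-values from several experiments. The function name and the p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where a result is declared
    significant while controlling the false discovery rate at `fdr`."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value is at most (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Every result ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values from five simultaneous experiments.
p_values = [0.001, 0.012, 0.025, 0.045, 0.20]
decisions = benjamini_hochberg(p_values)
for p, is_significant in zip(p_values, decisions):
    print(f"p = {p:.3f} -> significant: {is_significant}")
```

Note that the fourth p-value, 0.045, would pass a plain 0.05 threshold but is not declared significant here, because running many tests at once raises the odds that at least one looks like a winner by chance.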
Start running your tests with Optimizely today and be confident in your decisions.