Hopefully it’s clear from previous posts that you can’t prove the null, and you can’t use power to build support for the null. And this confusion is one reason I don’t like the term “accepting” the null hypothesis. The question remains, however, of what you can do with a hypothesis that fits what you would normally consider a “null”, but that you would actually like to prove.

To swap the roles normally played by the null and alternative hypotheses, you probably need to consider an equivalence test. First you have to nail down an effect size, that is, the maximum amount (positive or negative) by which the parameter can deviate in the experiment while still being considered of no practical or scientific importance. Even if you're not doing an equivalence test, this question matters when determining sample size, because you want to be sure your results are both statistically and scientifically significant (but calculating sample size [PDF] is the subject of a future blog post).

**What’s the difference?**

In an equivalence test you take your null hypothesis to be non-equivalence: that the absolute value of the parameter under consideration is greater than or equal to the effect size (the parameter is less than or equal to the negative of the effect size, or greater than or equal to the effect size). The alternative is, therefore, that the absolute value of the parameter is less than the effect size. Note that we don't care whether the parameter has a positive or negative effect; the goal is to reject the null hypothesis so that you can conclude that the effect is not of practical or scientific importance (although there are one-sided equivalence tests, often called non-inferiority tests, as well).
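In symbols, writing $\theta$ for the parameter and $\delta > 0$ for the effect size, the equivalence test is:

$$H_0: |\theta| \ge \delta \qquad \text{versus} \qquad H_1: |\theta| < \delta.$$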

For example, consider a treatment that is believed to be no better or worse than a placebo. The effect size should define the range of values within which the actual treatment effect can be considered to be of no scientific importance (equivalent to the placebo). The null, that there is a scientifically important difference between treatment and placebo, will be rejected if the treatment effect is found to be significantly smaller in magnitude than the effect size. Remember that we don't care whether the treatment has a positive or negative effect compared to the placebo in this example, since our goal is to reject the null of a scientifically important effect in either direction.
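One common way to carry out such a test in practice is via a confidence interval: declare equivalence when a 90% confidence interval for the treatment-placebo difference lies entirely inside the equivalence bounds. A minimal sketch, assuming paired differences and a hypothetical function name and data:

```python
import numpy as np
from scipy import stats

def equivalent_by_ci(diffs, delta, alpha=0.05):
    """Declare equivalence when the (1 - 2*alpha) confidence interval for
    the mean treatment-placebo difference lies inside (-delta, delta)."""
    diffs = np.asarray(diffs, dtype=float)
    n = len(diffs)
    m = diffs.mean()
    se = diffs.std(ddof=1) / np.sqrt(n)
    tcrit = stats.t.ppf(1 - alpha, n - 1)   # 90% CI when alpha = 0.05
    lo, hi = m - tcrit * se, m + tcrit * se
    return lo > -delta and hi < delta

# Hypothetical differences clustered near zero, effect size delta = 0.3:
diffs = np.linspace(-1, 1, 100) + 0.05
print(equivalent_by_ci(diffs, delta=0.3))  # True here: the CI sits inside (-0.3, 0.3)
```

Note the interval is a 90% CI, not 95%: each side of the equivalence check is a one-sided test at level 0.05.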

**Two for one**

An equivalence test is essentially two one-tailed tests: one test to show that there is no scientifically important positive effect (it's no better), and a second test to show that there is no scientifically important negative effect (it's no worse). And, as it turns out, the null hypotheses of the equivalence test and of a standard test of significance are disjoint, so you can carry out both tests at the same significance level without adjusting for multiple comparisons. Just to be clear, the test of significance would have null equal to zero (no treatment effect) and alternative not equal to zero (some positive or negative treatment effect).

My focus in this and the last two posts was on hypothesis testing, even though confidence intervals are often preferred for making inferences. This is a reflection of the debate I was dragged into, not of personal preference. If you’re interested, Nick Barrowman shared a link (in the comments to a previous post) to a website that discusses equivalence testing and confidence intervals (although I don’t agree with their comments that equivalence from the perspective of statistical significance is convoluted). Regardless, the debate is over (at least for us).