Luk Arbuckle

You can’t prove the null by not rejecting it

In hypothesis testing on 25 October 2008 at 10:53 am

I was (willingly) dragged into a discussion about “proving the null hypothesis” that I have to include here. But it will end up being three posts since there are different issues to address (basic theory, power, and equivalence). The first step is to discuss the theory of hypothesis testing, what it is and what it isn’t, as it’s fundamental to understanding the problem of providing evidence to support the null.

Hypothesis testing is confusing in part because the logical basis on which the concept rests is not usually described: it’s a proof by contradiction. For example, if you want to prove that a treatment has an effect, you start by assuming there are no treatment effects—this is the null hypothesis. You assume the null and use it to calculate a p-value (the probability of measuring a treatment effect at least as strong as what was observed, given that there are no treatment effects). A small p-value is a contradiction to the assumption that the null is true. “Proof”, here, is used loosely—it’s strong enough evidence to cast doubt on the null.

The p-value is based on the assumption that the null hypothesis is true. Trying to prove the null using a p-value is, therefore, trying to prove it’s true based on the assumption that it’s true. But we can’t prove the assumption that the null is true as we have already assumed it. The idea of a hypothesis test is to assume the null is true, then use that assumption to build a contradiction against it being true.
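
To make the "assume the null, then look for a contradiction" logic concrete, here is a minimal sketch of a permutation test in Python. It is only an illustration: the function name, the sample data, and the choice of a permutation test are assumptions of this example, not something from the discussion above. Under the null of no treatment effect the group labels are interchangeable, so the p-value is estimated as the share of random relabelings whose difference in means is at least as large as the observed one.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Hypothetical helper for illustration. The null (no treatment effect)
    is assumed true, which is what justifies shuffling the group labels.
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the units as if labels were arbitrary
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up measurements, for illustration only.
treated = [5.1, 6.3, 5.8, 6.9, 6.1, 5.7]
control = [4.2, 4.9, 5.0, 4.4, 5.3, 4.6]
p = permutation_p_value(treated, control)
print(f"estimated p-value: {p:.4f}")
```

Every step of the calculation uses the null as a premise. A small result contradicts that premise; a large one only means the premise was not contradicted by this data.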

Absence of evidence
No conclusion can be drawn if you fail to build a contradiction. Another way to think of this is to remember that the p-value measures evidence against the null, not for it.  And therefore lack of evidence to reject the null does not imply sufficient evidence to support it.  Absence of evidence is not evidence of absence. Some would like to believe that the inability to reject the null suggests the null may be true (and they try to support this claim with high sample sizes, or high power, which I’ll address in a subsequent post).
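
A small simulation can illustrate why failing to reject is not evidence for the null. The sketch below is built on assumptions chosen for this example (normally distributed outcomes with a known standard deviation of 1, a two-sided z-test at the 5% level, and a hypothetical true effect of 0.3); none of these numbers come from the post. It counts how often an experiment fails to reject the null even though the null is false by construction.

```python
import random
from statistics import NormalDist, fmean


def fraction_not_rejected(true_effect, n_per_group, n_experiments=2000,
                          alpha=0.05, seed=1):
    """Share of simulated experiments that fail to reject 'no effect'.

    Hypothetical helper for illustration. Assumes both groups are normal
    with known standard deviation 1 and uses a two-sided z-test.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in means when sigma = 1 in each group.
    standard_error = (2 / n_per_group) ** 0.5

    not_rejected = 0
    for _ in range(n_experiments):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / standard_error
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value >= alpha:
            not_rejected += 1
    return not_rejected / n_experiments


# The null is false here (the true effect is 0.3), yet with 10 units
# per group most experiments still fail to reject it.
share_with_real_effect = fraction_not_rejected(true_effect=0.3, n_per_group=10)
# For comparison: the same design when the null really is true.
share_with_no_effect = fraction_not_rejected(true_effect=0.0, n_per_group=10)
print(share_with_real_effect, share_with_no_effect)
```

Under these assumptions a non-rejection turns up in the large majority of experiments whether or not there is an effect, so observing one says very little about which situation you are in.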

Rejecting the null leaves you with a lot of alternatives. One down, an infinite number to go!

Failing to reject the null is a weak outcome, and that’s the point. It’s no better than failing to reject the innumerable models that were not tested. Although the null and alternative hypotheses represent a dichotomy (either one is true or the other), they underlie a parameter space. The alternative represents the complement of the space defined by the null, that is, the parameter space minus the null.

In the context of treatment effects, the null is no treatment effects, which represents a single point in the parameter space. But the alternative—some degree of treatment effects—is the complement, which is every point in the parameter space minus the null. If you want to use the theory of hypothesis testing in this way to “prove” the null, you would have to reject probability models for every point in the alternative, which is infinite! Even if you could justify taking a finite number of probability models, with some practical significance to each, it should be clear that it’s not just a matter of failing to reject the null.
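
The parameter-space argument can be written out in symbols. The notation below is an assumption of this sketch (a single real-valued treatment effect θ, a test statistic T, and a two-sided test); the argument above does not depend on it.

```latex
% Theta: the set of all possible treatment effects (the parameter space)
H_0 : \theta = 0
\qquad \text{versus} \qquad
H_1 : \theta \in \Theta \setminus \{0\}

% The p-value is computed under the single point theta = 0
p = \Pr\bigl( |T| \ge |t_{\mathrm{obs}}| \;\big|\; \theta = 0 \bigr)

% "Proving" the null this way would require rejecting every other point
\text{reject } H_{\theta'} : \theta = \theta'
\quad \text{for all } \theta' \in \Theta \setminus \{0\}
```

The p-value conditions on one point only, so a large p says nothing about the points arbitrarily close to zero, and there are infinitely many of those.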

I would like to follow up with a discussion of tests of equivalence, but first I need to attack the notion of increasing power to prove the null. As convincing as the above arguments may be, I was told that it’s just theory and that in practice you could get away with a lot less. As though we can ignore theory and reverse the notion of a hypothesis test without demonstrating equivalence. But they use the same faulty logic described above to justify it: if you can’t find a contradiction, then it must be correct.  Game on.

  1. In case anyone is concerned with how this applies to different types of hypotheses (point-null, one-sided, and interval), I recently found this paper, P-Values: What They Are and What They Are not [PDF], which states that “one-sided and point-null hypotheses are not two different objects that should never be compared, but rather they are just different versions of the same object of which interval hypotheses are objects as well.” Additionally, they give reason to reject the notion of using p-values as measures of support of their respective hypotheses.


    You cannot prove someone is guilty by saying, “I do not reject that this person is the guilty party.” This is not a valid argument. You are not being logical!

    You cannot say, “I cannot confirm or deny that this person is guilty; therefore this person is guilty.”

    If you cannot prove it, then it is plausible.

    By accepting that someone is either guilty or innocent, you are being logical which sets you up to be more methodical in your search for the answers. Notice I do not draw the conclusion, not guilty, innocent, or explicitly or implicitly rejecting either.

    “Failing to reject the null” is as illogical as telling your wife in an argument, “I fail to reject that you are correct.” Who would say that?

    [This comment was edited to remove duplicate sentences.]

  3. A hypothesis test is a probabilistic form of a proof by contradiction. We assume something is true, the null hypothesis, and try to disprove it. If we find proof against it, then our assumption was false (we reject the null); if we can’t find proof against it, then we’ve proven nothing (we have failed to reject the null).

    Rejecting the null is rejecting the hypothesis; failing to reject the null is failing to disprove the hypothesis. It is not the same as accepting the null, which is what you seem to be implying. The terminology being used here is correct.

    To take your example, if the null hypothesis is that a person is innocent of a crime, then rejecting the null means rejecting their innocence. And since we assume they are either innocent or guilty, this implies that they are guilty. However, failing to reject the null does not mean that we accept their innocence. It just means that we can’t prove otherwise.

    It is certainly not the case that “if you cannot prove it, then it is plausible.” To turn things around, if a prosecutor cannot prove guilt, then it is not true to say that guilt is probable–we can only say that guilt is possible (since innocence has not been proven either).

  4. Thank you for your clarification, Mr. Arbuckle. I now understand the logic behind rejecting and failing to reject the null. I really appreciate your prompt response and desire to correctly inform.

  5. It’s not often that someone can demystify stats as perfectly as you have above. Thank you!

  6. Does this go back to the fact that you cannot prove a theory, you can only disprove it?

    For example, if we are trying to prove that all swans are white, we have to look in every nook and corner in the world to make sure all swans are white. But we can easily disprove it if we find just one black swan!

  7. Yes and no. Your example is exactly right, except you can also evaluate the strength of the evidence and scientific plausibility. For example, just because there is no credible evidence that fairies exist does not mean that fairies may exist. It’s perhaps not impossible, but it’s certainly not plausible either. There are ways in both frequentist and Bayesian statistics to evaluate the strength of evidence for something (through random sampling, large sample sizes, meta-analyses, equivalence tests, etc.). In mathematics theorems are proven rigorously, but in the real world we need to evaluate evidence and scientific plausibility.

  8. Thanks!

    Isn’t this how the justice system works too: not guilty or guilty? The prosecutors try to disprove the not guilty hypothesis.

  9. Similar to English law, but there is also French or Napoleonic law in which you are guilty and need to prove your innocence. However, in science evidence keeps being accumulated and evaluated.

  10. I am wondering about when it is ever appropriate to accept some hypothesis. If I understand you correctly, failing to reject the null hypothesis does not give you warrant to accept the null hypothesis. Does rejecting the null hypothesis give you warrant to accept the alternative hypothesis?

    Specifically, I am wondering about arguments in evolutionary biology, where the null hypothesis is that random genetic drift is responsible for some trait distribution. Some argue that we cannot reject the null drift hypothesis. However, this does not give you warrant to accept the null. In order to argue FOR drift, you would need to consider the selection hypothesis as the null and determine whether it can be rejected. Does that sound correct?

  11. It is correct to say that rejecting the null implies accepting the alternative. But the null needs to be as specific as possible in order to be effective. That is, you want to be able to reject the null based on experimental data, and this is more readily done when the null is specific. But the alternative is everything the null isn’t, so you may need several experiments that disprove several nulls in order to build up evidence for the thing you really want to show. I’m not familiar enough with evolutionary biology to comment on the specifics, but the nulls you’ve described sound too broad to be able to disprove in a scientific experiment. I like to think of each scientific experiment as contributing to the incremental progress towards a better understanding of the natural world.

  12. The analogy to English or Napoleonic Law is a bit strained; in many scientific situations there’s no need to state unambiguous guilt/innocence, or truth/falsehood. While still imperfect, a better analogy might be to Scots Law, which has a third verdict of ‘not proven’.

    Formalizing what to do about double jeopardy under this legal system mirrors the difficulties of sampling to a foregone conclusion present in some statistical frameworks.

  13. Except the analogy is to hypothesis testing, in which there are two, not three outcomes: you either reject the null, or you fail to reject it. A verdict of not guilty does not prove innocence, just like failing to reject the null does not prove the null. In Scots law this would have been the “not proven” verdict, but the “not guilty” verdict was introduced in an effort to strengthen a claim of innocence. If you want to make that comparison to hypothesis testing, then you need to consider power or equivalence tests. I’d say that’s stretching the analogy much further than was intended in this discussion.

  14. I’ve been researching hypothesis testing for about a year now, and this is the most enlightening article I’ve read yet. I actually found it by coming to my own conclusion that hypothesis testing is about contradicting the null hypothesis, and then looking up the keywords <> in google, and jackpot!

    I am specifically interested in how to set the null and alternative when testing whether the mean performance of a system is above or below some set threshold, where being above is bad. We struggle over whether to set the null to: the system performs below the threshold (good), and the alternative that it does not; or set the null to: the system performs above the threshold (bad), and the alternative that it is actually below.

    Since we are in the business of proving when a system is performing above the threshold (bad performance), your article indicates we are correct in setting the null as assuming the system performs below the threshold (good), so that we can prove the alternative by contradiction.


  15. […] and Strategies,” International Journal of Nursing Studies, 47, pp. 1451–1458. 3  See this source for a thorough discussion of the problem of proving the null hypothesis. 4  Wainer, H., 2009, […]

  16. I got into an argument with my boss today who believes that if you fail to reject the null, you have by default proved it, because you have a bunch of data that is consistent with the null.

    I think the correct response, based on my understanding of your excellent article, is that the entire test is constructed in such a way as to set a standard of how much contradiction is needed to be able to reject the null, and it is NOT constructed in a way that provides criteria for accepting the null. So even though you have data that is consistent with the null, your test does not establish criteria for how much agreement is enough to say you have “proved” it. Does that sound about right?

  17. I wouldn’t even say that it is “consistent with the null”. All you can really say is that there is no strong evidence to reject it. You can find more good info in the .

  18. […] In the meantime … if you want to know how badly I messed up the stats stuff, read this article – […]

  19. […] non awareness of basic fallacies like “argument from Ignorance” one can claim that failing to disprove the null actually “proves” it. The sneaky tricks available to the cunning dialectician are […]
