# Luk Arbuckle

## Absence of evidence is evidence of absence?

In hypothesis testing on 25 January 2009 at 5:15 pm

In the context of logical reasoning, and using Bayesian probability, you can argue that absence of evidence is, in fact, evidence of absence. Namely, not being able to find evidence for something changes your thinking and can lead you to reverse your original hypothesis entirely. For example, if you fail to find evidence that some medical treatment works, you may begin to think that it doesn’t work. Maybe it’s a placebo. You could, therefore, decide to change your hypothesis and design an experiment to disprove its effectiveness. Of course, there are no “priors”, in the Bayesian sense, in the frequentist interpretation of hypothesis testing. But, just the same, what does this say about the maxim used in statistical hypothesis testing, that absence of evidence is not evidence of absence? Nick Barrowman has an interesting post on the topic, and I wanted to participate in the discussion:
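To make the Bayesian argument concrete, here is a minimal sketch that treats “absence of evidence” as a study that returns a non-significant result. It assumes we can put numbers on a prior probability that the treatment works, the study’s power (probability of a significant result if it works) and its significance level alpha (probability of a significant result if it doesn’t). The function name and all the numbers are hypothetical, chosen only for illustration.

```python
def posterior_works(prior: float, power: float, alpha: float) -> float:
    """P(treatment works | study found no significant effect), by Bayes' rule.

    prior: P(works) before the study (a hypothetical input)
    power: P(significant result | works)
    alpha: P(significant result | does not work)
    """
    # Likelihood of a non-significant result under each hypothesis
    p_no_evidence_if_works = 1.0 - power
    p_no_evidence_if_not = 1.0 - alpha
    numerator = p_no_evidence_if_works * prior
    denominator = numerator + p_no_evidence_if_not * (1.0 - prior)
    return numerator / denominator


if __name__ == "__main__":
    # Start undecided (prior 0.5) and vary the (assumed) power of the study
    for power in (0.05, 0.2, 0.5, 0.8, 0.95):
        post = posterior_works(prior=0.5, power=power, alpha=0.05)
        print(f"power={power:.2f}: P(works | no evidence) = {post:.3f}")
```

Under these assumptions the posterior drops below the prior whenever power exceeds alpha, and drops further the more powerful the study was. When power equals alpha, a null result is uninformative and the posterior equals the prior. In other words, absence of evidence is evidence of absence only to the extent that the study could have found the evidence.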

I interpret “absence of evidence is not evidence of absence” (in the context of hypothesis testing) to mean “failing to reject the null is not equivalent to accepting the null.” I’m thinking of the null hypothesis of “no treatment effects”. You don’t have significant evidence to reject the null, and therefore an absence of evidence of treatment effects, but this is not the same thing as saying you have evidence of no treatment effects (because of the formulation of hypothesis testing, flawed as it may be).

One point, which I believe you are alluding to, is that an equivalence test would be more appropriate. But I’ve heard some statisticians and researchers try to argue that they could use retrospective power to “prove the null” when they are faced with non-significant results. See Abuse of Power [PDF] (this paper was the nail in the coffin, if you will, in a previous discussion I was having with a group of statisticians).
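One way to see why retrospective (“observed”) power can’t prove the null is that, for a given test, it is just a re-expression of the p-value, which is the argument made in the Abuse of Power paper. The sketch below assumes a two-sided z-test and computes observed power by plugging the observed z statistic in as if it were the true standardized effect; the function name is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def observed_power_from_p(p_value: float, alpha: float = 0.05) -> float:
    """Post hoc 'observed' power of a two-sided z-test, given its p-value.

    Assumes a z-test; the observed |z| is treated as the true effect.
    """
    z_obs = Z.inv_cdf(1.0 - p_value / 2.0)  # |z| implied by the p-value
    z_crit = Z.inv_cdf(1.0 - alpha / 2.0)   # two-sided rejection threshold
    # Probability of landing in either rejection region if the true
    # standardized effect were exactly z_obs
    return (1.0 - Z.cdf(z_crit - z_obs)) + Z.cdf(-z_crit - z_obs)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.2, 0.5, 0.9):
        print(f"p = {p:.2f} -> observed power = {observed_power_from_p(p):.3f}")
```

Observed power comes out at about 0.5 when p equals alpha and falls steadily as the p-value rises, so a non-significant result always has low observed power. “High observed power despite non-significance” cannot happen under this setup, which is why it adds nothing beyond the p-value.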

I believe the maxim is simply trying to emphasize that the p-value is calculated having assumed the null, and therefore can’t be used as evidence for the null (as it would be a circular argument). Trying to make more out of the maxim than this may be the sticking point. It’s too simple, and therefore flawed when taken out of this limited context.

I agree with your previous post. If I’m not mistaken, one point was that failing to reject the null means the confidence interval contains a value of “no effect”. But the interval could still contain differences of practical importance, and so failing to reject the null is not the same as showing there’s no effect. The “statistical note” from the BMJ, Absence of evidence is not evidence of absence, seems to be saying the same thing: absence of evidence of a difference is not evidence that there is no difference. Or, absence of evidence of an effect is not evidence of no effect, because you can’t prove the null using a hypothesis test (you instead need an equivalence test).
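A small numerical sketch of that point: a confidence interval that contains zero (so the test is not significant) can also contain a difference large enough to matter. The summary statistics and the “smallest important difference” below are made up, and the interval uses a large-sample normal approximation rather than a t-interval.

```python
from statistics import NormalDist


def diff_ci(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Normal-approximation confidence interval for mean1 - mean2."""
    se = (sd1 ** 2 / n1 + sd2 ** 2 / n2) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    d = mean1 - mean2
    return d - z * se, d + z * se


# Hypothetical summary data from a small two-arm trial (made-up numbers)
lo, hi = diff_ci(mean1=52.0, sd1=10.0, n1=20, mean2=49.0, sd2=10.0, n2=20)
min_important = 5.0  # hypothetical smallest difference of practical importance

print(f"95% CI for the difference: ({lo:.2f}, {hi:.2f})")
print("contains 0 (not significant):", lo <= 0.0 <= hi)
print("contains an important effect:", lo <= min_important <= hi)
```

Both checks print `True`: the data are consistent with no effect and with a practically important effect, so the non-significant result does not show there is no effect.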

I entirely agree with Nick that confidence intervals are clearer. We can’t forget that hypothesis testing, although constructed like a proof by contradiction, has uncertainty (in the form of Type I errors, rejecting the null when it is true, and Type II errors, failing to reject the null when it is false). Its interpretation is, therefore, muddied by uncertainty and inductive reasoning (I had actually forgotten what Nick had written with regard to Popper and Fisher when I was commenting). To be honest, my head is still spinning trying to make sense of all this, but it certainly is an interesting topic.
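Both kinds of error can be estimated by simulation. This sketch assumes normally distributed data with known standard deviation and a two-sided one-sample z-test of “no effect”; the sample size and the true effect of 0.3 are hypothetical. The rejection rate with no true effect estimates the Type I error rate, and one minus the rejection rate with a real effect estimates the Type II error rate.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_effect, n=30, sigma=1.0, alpha=0.05, reps=10_000, seed=1):
    """Fraction of simulated z-tests of H0: mu = 0 that reject H0.

    Assumes normal data with known sigma (a simplifying assumption).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_effect, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("Type I error rate  (no true effect):", rejection_rate(0.0))
    print("Type II error rate (true effect 0.3):", 1.0 - rejection_rate(0.3))
```

With these settings the Type I rate sits near 0.05, but the Type II rate is roughly 0.6: most studies of a real effect of this size would “find nothing”, which is exactly why a non-significant result is weak evidence of no effect.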

1. I know this is an old post which I just came across yesterday, but there is so much that is zany about it that I cannot resist at least saying something.
“I believe the maxim is simply trying to emphasize that the p-value is calculated having assumed the null, and therefore can’t be used as evidence for the null (as it would be a circular argument).”
Put aside the “no evidence against is not evidence for” for the moment. Calculating the p-value does not assume the null any more than testing Einstein’s theory, by drawing out a prediction from it and seeing whether it holds up, assumes Einstein’s theory. The p-value is calculated by hypothetically assuming the data arose from a universe or population correctly described by the null. That is, one calculates the probability of the event “t(X) > t(x-obs)” (where t(X) is the test statistic) under the assumption that x came from a population as described in the null.
If calculating the p-value literally assumed the null, so as to be open to this circularity charge, then how would a small p-value ever count as evidence against the null?
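The calculation described here can be sketched directly: generate data from a population described by the null and see how often the test statistic is at least as large as the one observed (the common “at least as extreme” convention, so ≥ rather than >). The example is hypothetical: a null of a fair coin, a test statistic equal to the number of heads, and a made-up observation of 16 heads in 20 tosses. The exact tail probability is shown alongside a simulation from the null population.

```python
import random
from math import comb


def exact_p_value(k_obs: int, n: int, p0: float = 0.5) -> float:
    """P(T >= k_obs) when T ~ Binomial(n, p0), i.e. computed as if H0 held."""
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(k_obs, n + 1))


def simulated_p_value(k_obs: int, n: int, p0: float = 0.5, reps: int = 50_000, seed: int = 0) -> float:
    """Same tail probability, estimated by drawing data from the null population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        t = sum(rng.random() < p0 for _ in range(n))  # heads in one null experiment
        if t >= k_obs:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("exact p-value:    ", exact_p_value(16, 20))
    print("simulated p-value:", simulated_p_value(16, 20))
```

The null is used only to build a reference distribution; the observed data are then compared with it. A p-value of about 0.006 says such data would rarely arise if the null described the population, which is why it can count as evidence against the null.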

Note that in logic, (if A then not-A) implies not-A. Imagine A asserts: the null is true. Likewise, when we hypothetically assume the null, and find we keep getting results improbably far from what the null hypothesizes, then we infer not-A (i.e., while the null is not deductively falsified, it is statistically falsified.)
But my real point just now concerns the erroneous assertion about the p-value assuming the truth of the null.

This use of something called “observed power” is one of the most illogical messes I’ve ever seen. Please point me to places where people seriously use this. I’ve heard of it only a few times before.

2. As explained in a previous post, a hypothesis test is a form of proof by contradiction. The proof against the null is the probability of getting a test statistic at least as extreme as the one calculated from the evidence you collect. But that probability is determined by the null hypothesis, which is the assumption you are trying to contradict. Your Einstein example does not provide a null hypothesis to be rejected, and is therefore not equivalent. Also, you can’t “calculate the null” in frequentist statistics; this is a common misunderstanding. You can find all kinds of material online about statistical power, even books.

3. Yes, but say I am a researcher who is investigating (among other things) whether component x1 of treatment X, in isolation, also causes some of the demonstrated undesirable side effect y of X. Suppose previous research on the side effects of X suggests that x1 does not contribute to y at all, eliminating x1 as a cause of y would be a welcome result (confirming the wisdom of looking into x2, x3 and so on), and the outcome of the tests is y or non-y after x1. How can I find and present evidence for the absence?

4. @teddy1975 You build evidence by conducting large, high-quality trials that don’t find an effect. Consider equivalence tests and regression modelling. If you have a specific study in mind, I would search Google Scholar for similar work to see what tools they used in their analyses.
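As a rough illustration of an equivalence test, here is a sketch of the two one-sided tests (TOST) procedure using a normal approximation. The equivalence margin, the estimated difference and the standard errors are all hypothetical; in practice the margin (the largest difference you would consider unimportant) has to be justified on subject-matter grounds before the study.

```python
from statistics import NormalDist


def tost_z(diff: float, se: float, margin: float, alpha: float = 0.05):
    """Two one-sided tests (TOST) for equivalence, normal approximation.

    H0: |true difference| >= margin   vs   H1: |true difference| < margin
    Returns (p_value, equivalent), where p_value is the larger of the two
    one-sided p-values.
    """
    z = NormalDist()
    p_lower = 1.0 - z.cdf((diff + margin) / se)  # tests H0: difference <= -margin
    p_upper = z.cdf((diff - margin) / se)        # tests H0: difference >= +margin
    p = max(p_lower, p_upper)
    return p, p < alpha


if __name__ == "__main__":
    # Same estimated difference (0.4) and hypothetical margin (2.0),
    # from a large trial (small SE) and a small trial (large SE)
    print("large trial:", tost_z(diff=0.4, se=0.5, margin=2.0))
    print("small trial:", tost_z(diff=0.4, se=3.16, margin=2.0))
```

Neither trial shows a significant difference, but only the large one supports equivalence (p well below 0.05); the small trial’s TOST p-value is around 0.3. That is the sense in which large, high-quality trials that don’t find an effect can build evidence of absence.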