In the context of logical reasoning, and using Bayesian probability, you can argue that absence of evidence is, in fact, evidence of absence. Namely, failing to find evidence for something should change your thinking, and can even lead you to reverse your original hypothesis entirely. For example, if you fail to find evidence that some medical treatment works, you may begin to think that it doesn’t work. Maybe it’s a placebo. You could, therefore, decide to change your hypothesis and design an experiment to disprove its effectiveness. Of course, there are no “priors”, in the Bayesian sense, in the frequentist interpretation of hypothesis testing. But, just the same, what does this say about the maxim used in statistical hypothesis testing, that absence of evidence is not evidence of absence? Nick Barrowman has an interesting post on the topic, and I wanted to participate in the discussion:
I interpret “absence of evidence is not evidence of absence” (in the context of hypothesis testing) to mean “failing to reject the null is not equivalent to accepting the null.” I’m thinking of the null hypothesis of “no treatment effects”. You don’t have significant evidence to reject the null, and therefore you have an absence of evidence of treatment effects, but this is not the same thing as saying you have evidence of no treatment effects (because of the formulation of hypothesis testing, flawed as it may be).
One point, which I believe you are alluding to, is that an equivalence test would be more appropriate. But I’ve heard some statisticians and researchers try to argue that they could use retrospective power to “prove the null” when they are faced with non-significant results. See Abuse of Power [PDF] (this paper was the nail in the coffin, if you will, in a previous discussion I was having with a group of statisticians).
I believe the maxim is simply trying to emphasize that the p-value is calculated having assumed the null, and therefore can’t be used as evidence for the null (as it would be a circular argument). Trying to make more out of the maxim than this may be the sticking point. It’s too simple, and therefore flawed when taken out of this limited context.
I agree with your previous post. If I’m not mistaken, one point was that failing to reject the null means the confidence interval contains the value of “no effect”. But the interval may also contain differences of practical importance, and so failing to reject the null is not the same as showing there’s no effect. The “statistical note” from the BMJ, Absence of evidence is not evidence of absence, seems to be saying the same thing: absence of evidence of a difference is not evidence that there is no difference. Or, absence of evidence of an effect is not evidence of no effect, because you can’t prove the null using a hypothesis test (you need an equivalence test instead).
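To make the difference between “not significant” and “equivalent” concrete, here is a minimal sketch in Python. It is not taken from Nick’s post or the BMJ note. The two studies, their numbers, the equivalence margin of 2.0, and the function name `summarize` are all made up for illustration, and a simple normal approximation stands in for whatever test a real analysis would use. The equivalence test is the usual “two one-sided tests” (TOST) idea.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def summarize(diff, se, margin, alpha=0.05):
    """Compare an ordinary z-test with a TOST equivalence test.

    diff   : observed difference between treatment and control (hypothetical)
    se     : standard error of that difference (hypothetical)
    margin : smallest difference treated as practically important (assumed)
    """
    # Two-sided confidence interval for the true difference
    z_crit = Z.inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Ordinary two-sided test of H0: true difference = 0
    p_difference = 2 * (1 - Z.cdf(abs(diff) / se))

    # TOST: two one-sided tests of H0: |true difference| >= margin
    p_lower = 1 - Z.cdf((diff + margin) / se)  # H0: difference <= -margin
    p_upper = Z.cdf((diff - margin) / se)      # H0: difference >= +margin
    p_equivalence = max(p_lower, p_upper)

    return {
        "ci": ci,
        "p_difference": p_difference,
        "p_equivalence": p_equivalence,
    }


# Two made-up studies, both "non-significant" on the ordinary test
small_study = summarize(diff=1.0, se=2.0, margin=2.0)
large_study = summarize(diff=0.2, se=0.5, margin=2.0)

for name, result in [("small", small_study), ("large", large_study)]:
    print(name, result)
```

Both studies fail to reject the null of no difference. Only the large study has an interval that sits entirely inside the margin and a small equivalence p-value. The small study’s interval contains zero and also values larger than the margin, so it has shown neither an effect nor the absence of one.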
I entirely agree with Nick that confidence intervals are clearer. We can’t forget that hypothesis testing, although constructed like a proof by contradiction, has uncertainty (in the form of Type I errors, rejecting the null when it is true, and Type II errors, failing to reject the null when it is false). Its interpretation is, therefore, muddied by uncertainty and inductive reasoning (I had actually forgotten what Nick had written with regard to Popper and Fisher when I was commenting). To be honest, my head is still spinning trying to make sense of all this, but it certainly is an interesting topic.
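For what it’s worth, the Bayesian reading from the start of this post and the two error rates can be tied together in a few lines of Python. This is only a sketch under loud assumptions: it supposes a 50/50 prior between “no effect” and a single specific alternative effect, and it supposes the test’s power against that alternative is known. The function name `posterior_no_effect` and all the numbers are hypothetical, and none of this is part of the frequentist procedure itself.

```python
def posterior_no_effect(prior_no_effect, alpha, power):
    """P(no effect | non-significant result), by Bayes' rule.

    prior_no_effect : assumed prior probability that there is no effect
    alpha           : Type I error rate of the test
    power           : assumed power against one specific alternative effect
    """
    # P(non-significant | no effect): the test avoids a Type I error
    like_null = 1 - alpha
    # P(non-significant | real effect): the test commits a Type II error
    like_effect = 1 - power

    numerator = like_null * prior_no_effect
    return numerator / (numerator + like_effect * (1 - prior_no_effect))


# Same prior and same alpha; only the (assumed) power differs
low_power = posterior_no_effect(prior_no_effect=0.5, alpha=0.05, power=0.20)
high_power = posterior_no_effect(prior_no_effect=0.5, alpha=0.05, power=0.95)

print(round(low_power, 3), round(high_power, 3))
```

Under these assumptions a non-significant result moves the probability of “no effect” from 0.5 to roughly 0.54 when power is 20%, and to roughly 0.95 when power is 95%. So absence of evidence does count as some evidence of absence in the Bayesian sense, but how much depends heavily on how likely the study was to find the effect in the first place.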