I came across the use of “statistically relevant” in something I was reading online and, since I had never heard of it before, decided to look it up. But its usage varies. Some use it to mean statistically significant, which seems wrong since we already have a precise definition of that, and in other cases I’m not sure exactly what they mean.

I asked a few people in applied statistics, and they had never seen statistically relevant used, or come across a formal definition. A long conversation ensued as we attempted to figure out its precise meaning. The term practical significance came up, meaning something that is statistically significant and also of practical use. Medical or health scientists sometimes call this biological significance. The term practical (or biological) relevance also came up, for the case where something is not statistically significant but is still of practical use.

**Enter philosophy**

As it happens, the definition of statistical relevance is from philosophy (bear with me). The property *C* is statistically relevant to *B* within *A* if and only if *P(B, A&C)* does not equal *P(B, A-C)*. Here *P(B, X)* reads as the probability of *B* within the class *X*, *A&C* is the part of *A* that has property *C*, and *A-C* is the part of *A* that does not. The definition is then used in combination with a partitioning of *A* via a property *C* to create a model that states that if *P(B, A&C) > P(B, A)* then *C* explains *B*. It’s a model trying to define what constitutes a “good” explanation.

We can say that “copper (*C*) is statistically relevant to things that melt at 1083 degrees Celsius (*B*) within the class of metals (*A*)”. Considering the definition, we have that *P(B, A&C) = 1* (a metal that is copper melts at 1083 degrees) and, given that no other metal melts at 1083 degrees, *P(B, A-C) = 0* (a metal that is not copper does not melt at 1083 degrees). The two probabilities differ, which implies statistical relevance.

Note that property *C* in the above example partitions the reference set *A* into *(A&C)* and *(A-C)*, and *P(B, A&C) = 1 > P(B, A)*. Here *P(B, A) = 1/86*: copper is the only metal that melts at 1083 degrees and there are currently 86 known metals, so the probability that a metal melts at 1083 degrees is 1/86. Therefore, using this model of a good explanation, we can say that it melts at 1083 degrees because it is copper (or, following the language in the model, that it is copper explains why it melts at 1083).
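To make the arithmetic concrete, here is a minimal sketch of the copper example in Python. The toy population of 86 metals, the placeholder names such as `metal_0`, and the helper `conditional` are all assumptions made up for illustration; only the counts (one copper, 86 metals, nothing else melting at 1083 degrees) come from the example above.

```python
from fractions import Fraction


def conditional(population, given, event):
    """Return P(event | given) over a finite population as an exact fraction."""
    reference = [x for x in population if given(x)]
    if not reference:
        raise ValueError("conditioning class is empty")
    hits = sum(1 for x in reference if event(x))
    return Fraction(hits, len(reference))


# Hypothetical reference class A: 86 metals, exactly one of which is copper,
# and copper is the only one that melts at 1083 degrees Celsius.
metals = [{"name": "copper", "melts_at_1083": True}] + [
    {"name": f"metal_{i}", "melts_at_1083": False} for i in range(85)
]


def is_copper(metal):
    return metal["name"] == "copper"


def is_not_copper(metal):
    return not is_copper(metal)


def melts_at_1083(metal):
    return metal["melts_at_1083"]


def any_metal(metal):
    return True


p_b_given_a_and_c = conditional(metals, is_copper, melts_at_1083)        # P(B, A&C)
p_b_given_a_minus_c = conditional(metals, is_not_copper, melts_at_1083)  # P(B, A-C)
p_b_given_a = conditional(metals, any_metal, melts_at_1083)              # P(B, A)

# Statistical relevance: the two conditional probabilities differ.
statistically_relevant = p_b_given_a_and_c != p_b_given_a_minus_c

# The "good explanation" condition: C raises the probability of B within A.
c_explains_b = p_b_given_a_and_c > p_b_given_a

print(p_b_given_a_and_c, p_b_given_a_minus_c, p_b_given_a)
print(statistically_relevant, c_explains_b)
```

Using `Fraction` keeps the comparison exact, which matters here because the definition turns on whether two probabilities are equal, not on whether they are close.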

**Correlation is not causation**

What I’ve found is that people familiar with this definition from philosophy use “*A* is statistically relevant to *B*” to mean two things: (i) *A* is related to *B* (correlated), (ii) *B* is explained by *A* (causal). The definition supports (i), but I believe they’re using it incorrectly in (ii), with the model of a good explanation in mind (which, by the way, is due to the philosopher Wesley Salmon).

I’m no philosophy major, but I think it’s safe to say that the term statistically relevant should not be confused with statistically significant. Extremely low-probability events can be statistically relevant, and since the term says nothing more than “there’s a slight correlation”, it doesn’t say all that much in the context of statistics. Terms such as practical significance, or practical relevance, seem appropriate in the contexts described above, but avoid using statistically relevant unless you, and your readers, know the definition.
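A quick sketch of the low-probability point, in Python. The two probabilities are hypothetical numbers chosen only for illustration, and `is_statistically_relevant` is a made-up helper that encodes nothing more than the inequality in the definition.

```python
from fractions import Fraction


def is_statistically_relevant(p_b_given_a_and_c, p_b_given_a_minus_c):
    """C is statistically relevant to B within A iff the two probabilities differ."""
    return p_b_given_a_and_c != p_b_given_a_minus_c


# Hypothetical rare event: two in a million with C, one in a million without.
p_with_c = Fraction(2, 1_000_000)
p_without_c = Fraction(1, 1_000_000)

relevant = is_statistically_relevant(p_with_c, p_without_c)
difference = p_with_c - p_without_c

print(relevant, difference)
```

The check passes on any difference at all, however tiny, and it involves no sample size or hypothesis test, which is why it should not be read as a statement about statistical significance.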

23 December 2009 at 12am