There’s a common misconception regarding error bars: overlap means no statistical significance. Statistical significance is not the only relevant piece of information you can get from error bars (otherwise what would be the point), but it’s the first thing people look for when they see them in a graph. Another common misconception is that error bars are always relevant, and should therefore always be present in a graph of experimental results. If only it were that simple.
Who’s laughing now
A professor of psychology was criticized recently when he posted an article online with a graph that did not include error bars. He followed up with a poll to see whether readers understood error bars (most didn’t), and then posted an article about how most researchers don’t understand error bars. He based his post on a relatively large study (of almost 500 participants) that tested researchers who had published in psychology, neuroscience, and medical journals.
One of the articles cited in the study is Inference by Eye: Confidence Intervals and How to Read Pictures of Data [PDF] by Cumming and Finch. In it the authors describe some pitfalls relating to making inferences from error bars (for both confidence intervals and standard errors). And they describe rules of thumb (what the authors call rules of eye, since they are rules for making visual inferences). But note the fine print: the rules are for two-sided confidence intervals on the mean, with a normally distributed population, used for making single inferences.
Before you can judge error bars, you need to know what they represent: a confidence interval (at some stated level), the standard error, or the standard deviation. Then you need to worry about whether the data are independent (for between-subject comparisons) or paired (such as repeated measurements, for within-subject comparisons), and about why the error bars are being reported (to support a between-subject comparison, to allow a meta-analysis in which results are pooled, or just to confuse). And these points are not always made clear in figure captions.
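To see why the distinction matters, here is a minimal sketch (made-up sample, illustrative names) of the three quantities an error bar might represent, computed from the same data. The t critical value is hardcoded for this sample size; the three bars differ considerably in length.

```python
# Three different "error bars" from one sample; the data is made up.
import math
import statistics

sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 5.2, 4.7, 5.5]
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)   # standard deviation: spread of the data
se = sd / math.sqrt(n)          # standard error: uncertainty in the mean
t_crit = 2.262                  # t critical value for a 95% CI, df = 9
ci_half = t_crit * se           # half-width of the 95% confidence interval

print(f"mean = {mean:.2f}")
print(f"SD bar:  \u00b1{sd:.2f}")
print(f"SE bar:  \u00b1{se:.2f}")
print(f"95% CI:  \u00b1{ci_half:.2f}")
```

For this sample size, the standard deviation bar is the longest and the standard error bar the shortest, with the 95% confidence interval in between; a caption that doesn’t say which one is plotted leaves the reader guessing by a factor of three.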
For paired or repeated data, you probably don’t care about the error bars on the individual means. Confidence intervals on the separate means are of little value for visual inspection: you want the confidence interval on the mean of the differences, which depends on the correlation between the paired measurements and can’t be determined visually. In other words, error bars on the individual measurements probably shouldn’t be there, since they’re misleading.
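A small sketch of the problem, with hypothetical before/after scores for the same subjects (all numbers invented): the per-condition 95% intervals overlap heavily, yet the interval on the mean difference excludes zero, because the pairs are strongly correlated.

```python
# Per-condition error bars mislead for paired data (made-up scores).
import math
import statistics

before = [12.1, 14.3, 11.8, 13.5, 12.9, 14.0, 13.2, 12.5]
after  = [12.9, 15.0, 12.5, 14.4, 13.6, 14.8, 14.1, 13.1]
diffs = [a - b for a, b in zip(after, before)]

def ci95_half(data, t_crit):
    """Half-width of a 95% CI on the mean of data."""
    se = statistics.stdev(data) / math.sqrt(len(data))
    return t_crit * se

t7 = 2.365  # t critical value for a 95% CI, df = 7

# The two per-condition intervals overlap heavily...
print("before:", statistics.mean(before), "+/-", ci95_half(before, t7))
print("after: ", statistics.mean(after), "+/-", ci95_half(after, t7))
# ...but the interval on the mean difference excludes zero.
print("diff:  ", statistics.mean(diffs), "+/-", ci95_half(diffs, t7))
```

Plotting the before and after bars would invite the (wrong) visual conclusion that nothing happened; only the interval on the differences reflects the within-subject comparison.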
Rules of thumb
For independent means, error bars representing 95% confidence intervals can overlap and the difference can still be statistically significant at the 5% level. Assuming normality, the overlap can be as much as one quarter of the average length of the two intervals. For statistical significance at the 1% level, the intervals should not overlap at all. However, these general rules only apply when both sample sizes are greater than 10 and the two confidence intervals don’t differ in length by more than a factor of two.
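The quarter-overlap rule can be checked arithmetically. This is a sketch under simplifying assumptions not stated in the rule itself: equal standard errors for the two means and a normal approximation (z = 1.96) rather than the t distribution.

```python
# Check: a quarter-of-average-length overlap between two equal-width
# 95% CIs still implies significance at the 5% level (normal approx.).
import math

z = 1.96
se = 1.0                  # assume equal standard errors for both means
half = z * se             # half-width of each 95% CI
avg_len = 2 * half        # average interval length (equal here)

overlap = 0.25 * avg_len  # the rule-of-thumb maximum overlap
d = 2 * half - overlap    # implied distance between the two means

# For independent means the SE of the difference is se*sqrt(2),
# so the difference is significant at 5% when d > 1.96*se*sqrt(2).
threshold = z * se * math.sqrt(2)
print(d, ">", threshold, "->", d > threshold)
```

With a quarter overlap the means sit 2.94 standard errors apart, comfortably past the 2.77 needed for significance, which is why the rule errs on the safe side.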
For independent means and error bars representing standard errors, there should be a gap between the error bars of at least the average of the two standard errors for statistical significance at the 5% level. The gap has to be at least double that for statistical significance at the 1% level. But it’s probably easier to remember that doubling the length of the standard error bars gives you roughly a 95% confidence interval (to which you can then apply the rules from the previous paragraph). Again, these rules only apply for sample sizes greater than 10.
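The standard-error rule can be sketched the same way, again assuming equal standard errors and a normal approximation; these assumptions are mine, not part of the rule as stated.

```python
# Check: a gap of one average SE between SE bars puts the means
# at least 3 SEs apart, past the ~2.77 needed at the 5% level.
import math

se = 1.0                     # equal standard errors, illustrative
gap = se                     # rule of thumb: gap >= average SE
d = 2 * se + gap             # distance between means: bar + bar + gap

threshold_5pct = 1.96 * se * math.sqrt(2)   # approx. 2.77 SEs
print(d, ">", threshold_5pct, "->", d > threshold_5pct)

# Doubling the SE bars gives roughly a 95% CI (2 vs. 1.96),
# after which the quarter-overlap rule for CIs applies.
print(2 * se, "vs.", 1.96 * se)
```

The doubling shortcut works because the 95% critical value is close to 2, so a doubled standard error bar is, to the eye, a 95% confidence interval.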
It’s suggested that some researchers may prefer to use standard error bars because they are shorter, and that those researchers are therefore “capitalizing on their readers’ presumed lack of understanding” of error bars. And recall that there is no standard for error bars (even the confidence level can vary). So the responsibility is yours, as the reader, to be vigilant and check the details. Of course, if you’re the one reporting the data, you should be clear and honest about your results and their implications (directly in the figure captions).
A final note about the other information you can get from error bars. The JMP blog posted an article about what you can use error bars for (where I first learned of the discussion, actually), using different types of error bars depending on the purpose (namely, representing variation in a sample of data, uncertainty in a sample statistic, and uncertainty in several sample statistics). It’s a topic unto itself, but it’s interesting to see the different ways you can display (more or less) the same information to get specific ideas across. And that’s the point: error bars are useful when they convey useful information (in a relatively simple way).