# Luk Arbuckle

## That confidence interval is a random variable

In estimation on 29 September 2008 at 7:58 pm

People often confuse the meaning of the probability (or confidence) associated with a confidence interval—the probability is not that the parameter is in a particular interval, but that the intervals in repeated experiments will contain the parameter.  No wonder people get confused, as it sounds like the same thing if you’re not paying close attention to the wording.  Even then I’m not sure that it’s clear.

Take polling data for elections. When it’s reported that a political party is currently getting a specified level of support (say 37%), with an accuracy of plus or minus some amount (say 2%), pollsters normally state that the results are true 19 times out of 20 (that’s a 95% confidence level). This means that if they were to repeat the polling 20 times, each poll producing its own interval, the true level of support for that political party would on average fall within 19 of those 20 intervals. It does not mean that there’s a 95% chance that the true level of support for that political party is within the range of support being quoted (35 to 39%) in that specific poll.
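
The 19-times-out-of-20 reading can be checked with a small simulation. This is only a sketch under assumptions of my own: a true support level of 37%, a simple random sample of 2,239 voters (roughly what a margin of plus or minus 2% implies), and the usual normal-approximation interval for a proportion. The function name and its parameters are made up for illustration.

```python
import math
import random

def simulate_poll_coverage(true_support=0.37, sample_size=2239,
                           polls=1000, z=1.96, seed=1):
    """Fraction of simulated polls whose 95% interval contains the true support."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(polls):
        # Each respondent supports the party with probability true_support.
        supporters = sum(rng.random() < true_support for _ in range(sample_size))
        estimate = supporters / sample_size
        # Normal-approximation (Wald) margin of error for a proportion.
        margin = z * math.sqrt(estimate * (1 - estimate) / sample_size)
        if estimate - margin <= true_support <= estimate + margin:
            hits += 1
    return hits / polls

coverage = simulate_poll_coverage()
print(f"{coverage:.1%} of the simulated intervals contain the true value")
```

Every pass through the loop produces a different interval, while the true value never moves. Roughly 95% of the intervals should capture it, but any single interval either does or doesn’t.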

### The intervals, they are a changin’

The point is that the probability statement is about the interval, not the parameter. Let’s say you’re building a confidence interval of the mean. The population mean is an unknown constant, not a random variable. The random variables are the sample mean and sample variance used to build the interval, which vary between experiments. In other words it is the interval that varies and which can be considered a “random variable” of sorts. Once values for the sample mean and sample variance have been calculated for an interval, it’s not correct to make probability statements about the population mean, since that would imply that it’s a random variable. The population mean is a constant that either is or isn’t in the interval.

Another point to keep in mind is that all values in a confidence interval are plausible. So although support for a political party may be reported at 37% (as in the previous example), with an accuracy of plus or minus 2% the true level of support at the time of polling could be anything from 35 to 39% (with a confidence of 95%). And if you want to compare the level of support between two parties, in general you don’t want much overlap between their intervals (for statistical significance at the 5% level the overlap should be no more than 1 percentage point in our example, which is a quarter of each interval’s 4-point length).

## No one understands error bars

In estimation on 26 September 2008 at 12:04 pm

There’s a common misconception regarding error bars: overlap means no statistical significance.  Checking statistical significance is not the only relevant piece of information that you can get from error bars (otherwise what would be the point) but it’s the first thing people look for when they see them in a graph.  Another common misconception is that error bars are always relevant, and should therefore always be present in a graph of experimental results.  If only it were that simple.

### Who’s laughing now

A professor of psychology was criticized recently when he posted an article online with a graph that did not include error bars. He followed up with a poll to see if readers understood error bars (most didn’t), and then posted an article about how most researchers don’t understand error bars. He based his post on a relatively large study (of almost 500 participants) that tested researchers who had published in psychology, neuroscience, and medical journals.

One of the articles cited in the study is Inference by Eye: Confidence Intervals and How to Read Pictures of Data by Cumming and Finch. In it the authors describe some pitfalls relating to making inferences from error bars (for both confidence intervals and standard errors). And they describe rules of thumb (what the authors call rules of eye, since they are rules for making visual inferences). But note the fine print: the rules are for two-sided confidence intervals on the mean, with a normally distributed population, used for making single inferences.

### Pitfalls

Before you can judge error bars, you need to know what they represent: a percent confidence interval, standard error, or standard deviation. Then you need to worry about whether the data is independent (for between-subject comparisons), or paired (such as repeated tests, for within-subject comparisons), and the reason error bars are being reported (for between-subject comparisons, a meta-analysis in which results are pooled, or just to confuse). And these points are not always made clear in figure captions.

For paired or repeated data, you probably don’t care about the error bars on the individual means. Confidence intervals on those means are of little value for visual inspection: you want to look at the confidence interval on the mean of the differences (which depends on the correlation between the paired measurements, and that can’t be determined visually from the individual intervals). In other words, error bars on the individual measurements probably shouldn’t be there since they’re misleading.
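
To see why the individual intervals can’t tell you this, it helps to write out the standard error of the mean difference for $n$ pairs. The notation is mine (assumed, not taken from the paper): $s_1$ and $s_2$ are the sample standard deviations of the two sets of measurements, and $r$ is the sample correlation between them.

$$
\mathrm{SE}(\bar{d}) = \sqrt{\frac{s_1^2 + s_2^2 - 2\,r\,s_1 s_2}{n}}
$$

The individual error bars reflect $s_1$ and $s_2$ but say nothing about $r$. With a strong positive correlation, the interval on the mean difference can be much shorter than either individual interval would suggest.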

### Rules of thumb

For independent means, error bars representing 95% confidence intervals can overlap and still be statistically significant at the 5% level. Assuming normality, the overlap can be as much as one quarter of the average length of the two intervals. For statistical significance at the 1% level the intervals should not overlap. However, these general rules only apply to sample sizes greater than 10, and the confidence intervals can’t differ in length by more than a factor of two.
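
As a rough sketch, the overlap rule for 95% confidence intervals could be coded along these lines. The function name and return convention are my own invention, and the sample-size condition can’t be checked from the intervals alone, so it’s left to the caller.

```python
def overlap_rule_of_eye(ci_a, ci_b):
    """Apply the rough 95%-CI overlap rule to two independent means.

    ci_a and ci_b are (lower, upper) tuples. Returns the approximate
    significance level suggested by the rule (0.05 or 0.01), or None
    when the rule suggests no significance or does not apply.
    """
    # Order the intervals so that the first has the smaller lower bound.
    (low_a, high_a), (low_b, high_b) = sorted([ci_a, ci_b])
    length_a, length_b = high_a - low_a, high_b - low_b
    # The rule assumes the lengths differ by no more than a factor of two.
    if max(length_a, length_b) > 2 * min(length_a, length_b):
        return None
    average_length = (length_a + length_b) / 2
    # Negative overlap means there is a gap between the intervals.
    overlap = min(high_a, high_b) - low_b
    if overlap <= 0:
        return 0.01
    if overlap <= average_length / 4:
        return 0.05
    return None

# Two parties polled at 37% and 40%, each plus or minus 2 points.
print(overlap_rule_of_eye((35, 39), (38, 42)))
```

The two intervals in the example overlap by one point, which is exactly a quarter of their four-point length, so the rule puts the comparison right at the 5% boundary.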

For independent means and error bars representing standard errors, there should be a gap between the error bars that is at least equal to the average of the two standard errors for statistical significance at the 5% level. This gap has to be at least double that for statistical significance at the 1% level. But it’s probably easier to remember that doubling the length of the standard error bars will give you about a 95% confidence interval (from which you can then apply the rules from the previous paragraph). Again, these rules only apply for sample sizes greater than 10.
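
A quick back-of-the-envelope check of the gap rule, under simplifying assumptions that are mine rather than the paper’s: both means have the same standard error, and the samples are large enough that a normal approximation will do. The helper name is hypothetical.

```python
from statistics import NormalDist

def p_value_for_gap(gap_in_se):
    """Two-sided p-value when two independent means, each with a standard
    error of 1, have standard-error bars separated by gap_in_se."""
    # Bars are mean +/- 1 SE, so the means are (gap + 2) standard errors apart.
    difference = gap_in_se + 2
    # Standard error of the difference of two independent means with equal SEs.
    se_difference = 2 ** 0.5
    z = difference / se_difference
    return 2 * (1 - NormalDist().cdf(z))

for gap in (0, 1, 2):
    print(f"gap of {gap} SE: p is about {p_value_for_gap(gap):.3f}")
```

Under these assumptions, bars that just touch give a p-value of about 0.16, a gap of one standard error gives about 0.03, and a gap of two gives about 0.005, which lines up with the 5% and 1% rules.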

### Constant vigilance

It’s suggested that some researchers may prefer to use standard error bars because they are shorter, and that the researchers are therefore “capitalizing on their readers’ presumed lack of understanding” of error bars. And recall that there is no standard for error bars (even the percent confidence interval can vary). So the responsibility is yours, as the reader, to be vigilant and check the details. Of course, if you’re the one reporting on the data, you should be clear and honest about your results and their implications (directly in the figure captions).

A final note about other information you can get from error bars. The JMP blog posted an article about what you can use error bars for (where I first learned of the discussion, actually), using different types of error bars depending on the purpose (namely, representing variation in a sample of data, uncertainty in a sample statistic, and uncertainty in several sample statistics). It’s a topic unto itself but it’s interesting to see the different ways you can display the (more or less) same information to get specific ideas across. And that’s the point: error bars are useful when they convey useful information (in a relatively simple way).

## Centroid estimation in discrete high-dimensional spaces

In estimation on 1 June 2008 at 11:15 pm

A point estimate of a parameter is a single number intended to be as close as possible to the true value of the parameter. It’s unlikely to be exactly equal to the parameter it’s trying to estimate—although it is the single most probable solution—but it’s an important starting point for constructing a confidence interval.

The method of maximum likelihood is a general method of obtaining single point estimators with three desirable properties (at least in low-dimensional continuous spaces):

• consistency (convergence in probability)
• normality (normally distributed about the true value)
• efficiency (minimum variance)

But these properties only hold asymptotically (i.e., they’re properties that exist in the limit), and only properly for continuous variables—they are not achieved for high-dimensional discrete unknowns.
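
For reference, the three properties are often summarized in a single limit statement. The notation is assumed here rather than drawn from the article: $\hat{\theta}_n$ is the maximum likelihood estimate from $n$ independent observations, $\theta_0$ is the true value, and $I(\theta_0)$ is the Fisher information of a single observation, all under the usual regularity conditions.

$$
\sqrt{n}\,\bigl(\hat{\theta}_n - \theta_0\bigr) \;\xrightarrow{d}\; \mathcal{N}\bigl(0,\; I(\theta_0)^{-1}\bigr) \quad \text{as } n \to \infty
$$

The estimate converges on the true value (consistency), its sampling distribution becomes normal (normality), and the limiting variance matches the Cramér-Rao lower bound (efficiency).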

A discrete high-dimensional sample space is partitioned into so many parts that a single point estimator will likely have very low probability. In previous work it was hoped that the single point estimator would be surrounded by similar solutions that would together form a greater “probability mass”. Examples exist, however, that demonstrate that this is not always the case.

### Where’s the point?

In a publicly available article published in the Proceedings of the National Academy of Sciences, researchers Luis Carvalho and Charles Lawrence at Brown University discuss a class of “centroid” estimators they developed to be more representative of the information contained in discrete high-dimensional sample spaces. The centroid estimator is proven to minimize differences between the parameter and the estimate (for important loss functions), and to be the closest point to the mean.
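
A toy example may help show how the most probable point and a centroid can disagree. Everything here is made up for illustration: the distribution over 3-bit vectors is hypothetical, Hamming distance is assumed as the loss, and the brute-force search is just the simplest way to find the minimizer in a tiny space, not the authors’ method.

```python
from itertools import product

# Hypothetical posterior over 3-bit vectors (invented for illustration).
posterior = {
    (0, 0, 0): 0.26,
    (1, 1, 0): 0.22,
    (1, 0, 1): 0.21,
    (0, 1, 1): 0.21,
    (1, 1, 1): 0.10,
}

def hamming(a, b):
    """Number of positions at which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def expected_distance(candidate):
    """Posterior expected Hamming distance from candidate to the unknown."""
    return sum(p * hamming(candidate, point) for point, p in posterior.items())

# The single most probable point.
most_probable = max(posterior, key=posterior.get)

# The centroid: the vector minimizing expected Hamming distance,
# found by brute force over all 2^3 candidates.
centroid = min(product((0, 1), repeat=3), key=expected_distance)

print("most probable:", most_probable, round(expected_distance(most_probable), 2))
print("centroid:     ", centroid, round(expected_distance(centroid), 2))
```

In this made-up case the most probable point, (0, 0, 0), sits alone, while most of the probability mass lies near (1, 1, 1). Each bit is 1 with probability just over one half, so (1, 1, 1) is the point closest to the mean and has the smaller expected distance, even though its own probability is only 0.10.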

The authors highlight published results that suggest these alternative estimators offer improved representation of data in practice, and provide some interesting examples from computational biology. They warn the reader, however, that only a few applications have been studied so far, and feasibility has not been shown for all cases.

In the concluding remarks the authors share some insight into future challenges:

> Rapid improvements in data acquisition technologies promise to continue to dramatically increase the pool of data in many fields. Although these data will be of great benefit, they also have opened a new universe of high-dimensional inference and prediction problems that will likely provide major data analytic challenges in the coming decades. Among these is the development of point estimators in discrete spaces that are the focus of the centroid estimators developed here.
>
> But the more general point estimation challenge is to find one or a small number of feasible solutions among the many in the ensemble that is by some appropriate measure representative of the full ensemble and suitable for the data structural features of the solution space. These new high-dimensional data and unknowns will also almost certainly force a reexamination of extant approaches to interval estimation, hypothesis tests, and predictive inference.

## Confidence, prediction, and tolerance intervals explained

In estimation on 25 May 2008 at 10:00 am

JMP, a business division of SAS, has a short seven-page white paper that describes the differences between confidence, prediction, and tolerance intervals using a simple manufacturing example. Formulas are provided along with instructions for using JMP menus to calculate the interval types from a data set.

Statistical intervals help us to quantify the uncertainty surrounding the estimates that we calculate from our data, such as the mean and standard deviation. The three types of intervals presented here—confidence, prediction and tolerance—are particularly relevant for applications found in science and engineering because they allow us to make very practical claims about our sampled data.

It’s not an eye-opening read per se, but it’s nonetheless important to understand the nuances between the different interval types. The table provided at the end, with an interpretation of each interval type for the example provided, is a good summary of the ideas presented.
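
For orientation, these are the textbook normal-theory forms of the three intervals for a sample of size $n$ with mean $\bar{x}$ and standard deviation $s$. I’m assuming the white paper uses equivalent formulas; the notation below is mine.

$$
\begin{aligned}
\text{confidence interval for the mean:} \quad & \bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \\
\text{prediction interval for one future observation:} \quad & \bar{x} \pm t_{1-\alpha/2,\,n-1}\, s\,\sqrt{1 + \frac{1}{n}} \\
\text{tolerance interval for a proportion } p \text{ of the population:} \quad & \bar{x} \pm k\,s
\end{aligned}
$$

Here $k$ is a tolerance factor that depends on $n$, $p$, and the chosen confidence level, and is usually read from tables or computed by software. The confidence interval shrinks toward zero length as $n$ grows, while the other two don’t, because individual observations keep varying no matter how well the mean is known.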