Luk Arbuckle


Irrational fear of non-normality

In models on 6 July 2008 at 9:42 pm

What do you do if your model errors are not normally distributed?  If you intend to use statistical procedures that assume normally distributed residuals, you may find yourself “agonizing over normal probability plots and tests of residuals”.  Some leaders at the JMP division of SAS, however, think that might be a waste of time.

The central limit theorem assures us that even if the data are not normal, mean-like statistics still approach normal distributions as the sample size increases. With small samples, these statistics may not be nearly normal, but we don’t have a big enough sample to tell. 
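To see the central limit theorem claim in action, here is a minimal sketch of my own (not the simulation from the JMP article). It assumes exponentially distributed data, which is strongly right-skewed, and uses skewness as a rough yardstick for non-normality: a normal distribution has skewness 0. The function names, sample sizes, and number of replications are arbitrary choices for illustration.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment; roughly 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate_mean_skewness(sample_size, n_reps=5000, seed=0):
    """Skewness of the simulated sampling distribution of the mean.

    Each replication draws `sample_size` values from an exponential
    distribution (an assumed example of non-normal data) and records
    their mean.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_reps):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return skewness(means)


# Skewness of the sample means for increasing sample sizes.
skew_by_n = {n: simulate_mean_skewness(n) for n in (2, 10, 50, 200)}
for n, skew in skew_by_n.items():
    print(f"n={n:>3}  skewness of sample means = {skew:.3f}")
```

The skewness of the means should shrink toward zero as the sample size grows, even though the raw data stay just as skewed as ever.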

They don’t say to drop the use of tests of normality and normal probability plots of residuals, as they have their place.  But their simulations suggest that these tests are unnecessary in most cases (see their article on page 9 of the SPES/Q&P Newsletter for the details).  In general, they

recommend plotting residual values versus predicted values, by case order, or versus other variables.  Rather than distributional testing, look for graphical anomalies, especially outliers or patterns that might be a clue to some hidden structure. 
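As a rough illustration of that kind of residual check, here is a sketch on made-up data: a straight-line fit with one planted outlier. The data, the planted case, and the three-standard-deviation cutoff are all my own assumptions, and flagging large residuals numerically is only a stand-in for looking at the plot itself. The `predicted` and `residuals` lists are what you would hand to whatever plotting tool you use.

```python
import random
import statistics

# Made-up data: a linear trend with unit-variance noise (illustration only).
rng = random.Random(1)
x = [i / 2 for i in range(40)]
y = [3.0 + 1.5 * xi + rng.gauss(0, 1) for xi in x]
y[25] += 12.0  # plant one outlier at case 25

# Ordinary least squares fit of y on x, computed by hand.
mean_x = statistics.fmean(x)
mean_y = statistics.fmean(y)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# These two lists are what a residual-versus-predicted plot would show.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Crude numeric stand-in for spotting outliers by eye (assumed cutoff).
scale = statistics.stdev(residuals)
flagged = [i for i, r in enumerate(residuals) if abs(r) > 3 * scale]

for i in flagged:
    print(f"case {i}: predicted={predicted[i]:.2f}, residual={residuals[i]:.2f}")
```

Plotting `residuals` against `predicted`, against case order, or against another variable is where the real work happens. The cutoff catches gross outliers, but it will not reveal curvature or changing spread the way a plot does.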

Although this is good advice, you may not get buy-in from everyone you work with.  No analyst wants to have their work questioned because an assumption was found to have been violated.  Sometimes it’s easier to just do what is expected, or demanded, although that’s never been my style. Better to keep this one in my back pocket, just in case.