# Luk Arbuckle

## Least squares explained simply, and in video

In regression on 13 November 2008 at 5:16 pm

A short five-minute video explains least squares with JMP.  The author, Lee Creighton, uses a very simple example of fitting a line to data, and considers different measures of a “best” fit (not to ruin the punch line, but least squares has a global minimum).  On the right-hand side of the applet is an error bar (for the particular measure being considered) that changes as the line is adjusted for the set of data points.

Least squares considered with, well, squares!

What I like about the visualization is that the squared errors are displayed as actual squares.  It seems silly, but I had never thought of squared errors in a literal, geometric way.  I always thought of them as scalars, and nothing more.  But showing actual squares makes for a much better visual representation.
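To make the “squares” picture concrete, here is a minimal Python sketch (not a reproduction of the JMP applet). It computes the total area of the squares for a fitted line and for a few nudged lines. The data points and function names are made up for illustration.

```python
# Hypothetical data points, invented for this sketch (not taken from the video).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_squared_errors(slope, intercept, xs, ys):
    """Total area of the squares drawn on each vertical error."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form least-squares slope and intercept for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


best_slope, best_intercept = least_squares_line(xs, ys)
best_sse = sum_squared_errors(best_slope, best_intercept, xs, ys)
print(f"least-squares line: y = {best_slope:.3f} x + {best_intercept:.3f}")
print(f"total area of squares: {best_sse:.4f}")

# Nudging the line in any direction makes the total area larger,
# which is the single (global) minimum the video builds up to.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]
nudged_sse = [
    sum_squared_errors(best_slope + ds, best_intercept + di, xs, ys)
    for ds, di in nudges
]
for (ds, di), sse in zip(nudges, nudged_sse):
    print(f"slope {ds:+.1f}, intercept {di:+.1f}: area = {sse:.4f}")
```

Each term in `sum_squared_errors` is the area of one of the squares in the visualization, so the printed totals are what the applet's error bar would track for the squared-error measure.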

It’s likely we’ll see more video tutorials from Lee Creighton on Vimeo, as he’s added a few new items already (although there’s been no mention of these at the JMP blog, at least not yet).  Of course, this assumes people find them useful and he receives some positive feedback (and I hope he does, as video tutorials in statistics are rare).

Note that I chose not to embed the video into my blog post because I have no control over the formatting, and to ensure Creighton gets clicks to his blog post (give credit where credit is due).  Click on the image to get to the video.

## Irrational fear of non-normality

In models on 6 July 2008 at 9:42 pm

What do you do if your model errors are not normally distributed?  If you intend to use statistical procedures that assume normally distributed residuals, you may think of “agonizing over normal probability plots and tests of residuals”.  Some leaders at the JMP division of SAS, however, think it might be a waste of time.

The central limit theorem assures us that even if the data are not normal, mean-like statistics still approach normal distributions as the sample size increases. With small samples, these statistics may not be nearly normal, but we don’t have a big enough sample to tell.
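The first claim in that quote is easy to check by simulation. The sketch below is my own illustration, not the simulation from the newsletter article. It draws samples from a strongly skewed (exponential) distribution and measures how skewed the sample means are as the sample size grows. The seed, sample sizes, and helper names are arbitrary choices.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness; 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate_means(sample_size, replications, rng):
    """Means of repeated samples drawn from an exponential distribution."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(replications)
    ]


rng = random.Random(2008)  # fixed seed so the run is repeatable
skewness_by_size = {}
for sample_size in (2, 10, 50):
    means = simulate_means(sample_size, replications=5000, rng=rng)
    skewness_by_size[sample_size] = sample_skewness(means)
    print(f"n = {sample_size:2d}: skewness of sample means = "
          f"{skewness_by_size[sample_size]:.2f}")
```

The skewness of the means shrinks toward zero as the sample size increases, even though the underlying data stay far from normal.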

They don’t say to drop the use of tests of normality and normal probability plots of residuals, as they have their place.  But their simulations suggest that these tests are unnecessary in most cases (see their article on page 9 of the SPES/Q&P Newsletter for the details).  In general, they

recommend plotting residual values versus predicted values, by case order, or versus other variables.  Rather than distributional testing, look for graphical anomalies, especially outliers or patterns that might be a clue to some hidden structure.

Although this is good advice, you may not get buy-in from everyone you work with.  No analyst wants to have their work questioned when assumptions are found to have been violated.  Sometimes it’s easier to just do what is expected, or demanded, although that’s never been my style. Better to keep this one in my back pocket, just in case.
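As a rough illustration of the recommended check, residuals against predicted values, here is a small Python sketch. The data are hypothetical and deliberately curved, so a straight-line fit leaves a visible pattern. The printed table stands in for the plot you would normally draw.

```python
# Hypothetical data with hidden curvature (y = x squared), invented for this sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [x * x for x in xs]

# Fit a straight line by least squares.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

predicted = [slope * x + intercept for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

# Residuals listed against predicted values; look for a pattern, not a p-value.
print("predicted  residual")
for p, r in zip(predicted, residuals):
    print(f"{p:9.1f}  {r:+8.1f}")
```

The residuals run positive, then negative, then positive again as the predicted value increases. That U shape is the kind of clue to hidden structure that a test of normality on the residuals is not designed to reveal.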

## Confidence, prediction, and tolerance intervals explained

In estimation on 25 May 2008 at 10:00 am

JMP, a business division of SAS, has a short seven-page white paper that describes the differences between confidence, prediction, and tolerance intervals using a simple manufacturing example. Formulas are provided along with instructions for using JMP menus to calculate the interval types from a data set.

Statistical intervals help us to quantify the uncertainty surrounding the estimates that we calculate from our data, such as the mean and standard deviation. The three types of intervals presented here—confidence, prediction and tolerance—are particularly relevant for applications found in science and engineering because they allow us to make very practical claims about our sampled data.

Related posts: That confidence interval is a random variable; No one understands error bars

It’s not an eye-opening read per se, but it’s nonetheless important to understand the nuances between the different interval types. The table provided at the end, with an interpretation of each interval type for the example provided, is a good summary of the ideas presented.
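For reference, the usual textbook forms of the three intervals for a normally distributed sample are sketched below. These are standard formulas, not necessarily in the notation the white paper uses. I'm assuming a sample of size $n$ with mean $\bar{x}$ and standard deviation $s$, a confidence level of $1-\alpha$, and two-sided intervals.

```latex
% Confidence interval for the population mean
\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}

% Prediction interval for a single future observation
\bar{x} \pm t_{1-\alpha/2,\,n-1}\,s\,\sqrt{1+\frac{1}{n}}

% Tolerance interval covering a proportion p of the population
\bar{x} \pm k\,s
```

Here $t_{1-\alpha/2,\,n-1}$ is a quantile of Student's t distribution with $n-1$ degrees of freedom, and $k$ is a tolerance factor that depends on $n$, the proportion $p$, and the confidence level; it is typically taken from tables or computed by software. The three intervals share a center and differ only in width: the confidence interval shrinks toward zero width as $n$ grows, while the prediction and tolerance intervals do not, because they describe individual values rather than the mean.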