Luk Arbuckle

Posts Tagged ‘white paper’

Statistical concepts in presenting data

In data display on 18 February 2009 at 8:29 pm

Finally, someone has written a text along the lines of Tufte’s Visual Display of Quantitative Information, but specifically for statistics. Rafe M. J. Donahue, of Biomimetic Therapeutics and Vanderbilt University Medical Center, gave a seminar course on presenting statistical data at a meeting of the American Statistical Association (ASA) in June 2008, and will be giving a similar course in April 2009 (as part of a continuing education program of the ASA). I learned of his course in a recent blog post at Statistical Modeling, Causal Inference, and Social Science.

The current version of Donahue’s text is 100 pages [PDF], but well worth a casual read (it’s not as bad as it sounds, as a lot of those pages are dedicated to visual displays of the ideas he is describing). If you enjoy reading Tufte’s opinions on the topic of displaying data, and you have to create charts and diagrams of statistical data, then you should enjoy Donahue’s writing as well. Reading Tufte a couple of years ago had a tremendous impact on my view of visual displays. But the focus here is on statistical data.

The two fundamental acts of science, description and comparison, are facilitated via models. By models, we refer to ideas and explanations that do two things: describe past observations and predict future outcomes. […] Statistical models, then, allow us to describe past observation and predict future within the confines of our understanding of probability and randomness. Statistical models become tools for understanding sources of variation.

Show the atoms; show the data.

A summary of some principles presented by Donahue:

  • The exposition of the distribution is paramount.
  • Show the atoms; show the data.
  • Each datum gets one glob of ink.
  • Erase non-data ink; eliminate redundant ink.
  • Take time to document and explain.
  • The data display is the model.
  • Avoid arbitrary summarization, particularly across sources of variation.
  • Reward the viewer’s investment in the data display.
  • In viewing CDFs, steepness equals dataness.
  • Plot cause versus effect.
  • Typically, color ought be used for response variables, not design variables—but not always.
  • We understand the individual responses by comparing them to a distribution of like individuals.
  • Data presentation layouts and designs should be driven by intended use.
  • Time series make fine accounting but poor scientific models.
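As a small illustration of the CDF principle in the list above, here is a minimal sketch of an empirical CDF in Python. It reflects my reading of "steepness equals dataness" (the curve climbs fastest where the data are densest), not code from Donahue's text; the function names `ecdf` and `ecdf_rise` and the sample values are made up for the example.

```python
def ecdf(values):
    """Return (sorted values, ECDF heights). Each datum adds one step of 1/n."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def ecdf_rise(values, lo, hi):
    """How much the ECDF climbs over (lo, hi]: the fraction of data in that interval."""
    return sum(1 for x in values if lo < x <= hi) / len(values)


# Hypothetical data: five points clustered near 5, one point far out on each side.
data = [1.0, 4.8, 4.9, 5.0, 5.1, 5.2, 9.0]

xs, heights = ecdf(data)
for x, h in zip(xs, heights):
    print(f"{x:4.1f}  {h:.3f}")

# Two intervals of equal width: the curve is steep over the cluster, flat elsewhere.
print("rise over (4, 6]:", ecdf_rise(data, 4, 6))
print("rise over (6, 8]:", ecdf_rise(data, 6, 8))
```

Because every observation contributes exactly one step, the empirical CDF also fits the "each datum gets one glob of ink" idea: nothing is binned or summarized away.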

Each datum gets one glob of ink.

Naturally Donahue was also influenced by Tufte. As he says, “the idea of analysis is to understand the whole by decomposing into component parts.” And he therefore reminds the reader of Tufte’s principles of analytical design:

  • Show comparisons, contrasts, differences.
  • Show causality, mechanism, structure, explanation.
  • Show multivariate data; that is, show more than 1 or 2 variables.
  • Completely integrate words, numbers, images, diagrams.
  • Thoroughly describe the evidence. 
  • Analytical presentations ultimately stand or fall depending on the quality, relevance, and integrity of their content.

Take time to document and explain.

Confidence, prediction, and tolerance intervals explained

In estimation on 25 May 2008 at 10:00 am

JMP, a business division of SAS, has a short, seven-page white paper that describes the differences between confidence, prediction, and tolerance intervals using a simple manufacturing example. Formulas are provided, along with instructions for using JMP menus to calculate each interval type from a data set.

Statistical intervals help us to quantify the uncertainty surrounding the estimates that we calculate from our data, such as the mean and standard deviation. The three types of intervals presented here—confidence, prediction and tolerance—are particularly relevant for applications found in science and engineering because they allow us to make very practical claims about our sampled data.
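To make the distinction concrete, here is a rough sketch in Python of the three intervals for a sample assumed to be normally distributed. These are the usual textbook formulas, not necessarily the ones in the white paper, and the measurements are invented. To keep the sketch dependency-free, the t and chi-square quantiles use standard approximations (a Cornish-Fisher expansion and Wilson-Hilferty), and the tolerance factor uses Howe's approximation; a statistics package would give exact quantiles.

```python
import math
from statistics import NormalDist, mean, stdev

_Z = NormalDist()  # standard normal, used for every quantile below


def t_quantile(p, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion around the normal)."""
    z = _Z.inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def chi2_quantile(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty)."""
    z = _Z.inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3


def intervals(sample, conf=0.95, coverage=0.95):
    """Two-sided confidence, prediction, and tolerance intervals (normal data assumed)."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    t = t_quantile(1 - (1 - conf) / 2, n - 1)

    ci_half = t * s / math.sqrt(n)          # where is the population mean?
    pi_half = t * s * math.sqrt(1 + 1 / n)  # where will one future observation fall?

    # Howe's approximation to the tolerance factor k:
    # where does a `coverage` share of the whole population lie?
    z = _Z.inv_cdf((1 + coverage) / 2)
    k = z * math.sqrt((n - 1) * (1 + 1 / n) / chi2_quantile(1 - conf, n - 1))
    ti_half = k * s

    return {
        "confidence": (m - ci_half, m + ci_half),
        "prediction": (m - pi_half, m + pi_half),
        "tolerance": (m - ti_half, m + ti_half),
    }


# Hypothetical part measurements, for illustration only.
parts = [9.9, 10.1, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.3, 9.7]
for name, (lo, hi) in intervals(parts).items():
    print(f"{name:>10}: {lo:.3f} to {hi:.3f}")
```

For the same data and confidence level, the confidence interval is the narrowest, the prediction interval is wider, and the tolerance interval is wider still, because each makes a claim about something more variable: the mean, a single future value, and most of the population, respectively.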

The paper isn’t an eye-opening read per se, but it’s nonetheless important to understand the nuances between the different interval types. The table at the end, which interprets each interval type for the example, is a good summary of the ideas presented.

Related posts:
That confidence interval is a random variable
No one understands error bars