Luk Arbuckle


Centroid estimation in discrete high-dimensional spaces

In estimation on 1 June 2008 at 11:15 pm

A point estimate of a parameter is a single number intended to be as close as possible to the true value of the parameter. It's unlikely to be exactly equal to the parameter it's trying to estimate (even the maximum likelihood estimate, the single most probable solution, will rarely hit the true value exactly), but it's an important starting point for constructing a confidence interval.

The method of maximum likelihood is a general method of obtaining single point estimators with three desirable properties (at least in low-dimensional continuous spaces):

  • consistency (convergence in probability to the true value)
  • asymptotic normality (approximately normally distributed about the true parameter value)
  • efficiency (minimum variance)

But these properties only hold asymptotically (i.e., they're properties that exist in the limit of large samples), and strictly speaking only for continuous variables—they are not achieved for high-dimensional discrete unknowns.

A discrete high-dimensional sample space is partitioned into so many parts that a single point estimator will likely have very low probability. In previous work it was hoped that the single point estimator would be surrounded by similar solutions that would together form a greater “probability mass”. Examples exist, however, that demonstrate that this is not always the case.
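A quick toy calculation (my own, not from the work discussed here) makes the first point concrete: with many independent binary unknowns, even the single most probable configuration carries almost no probability mass on its own.

```python
# Toy example: 20 independent binary unknowns, each equal to 1 with
# probability p = 0.6. The most probable single configuration (the
# maximum a posteriori point) is all ones.
p, n = 0.6, 20

map_prob = p ** n
print(f"probability of the most probable point: {map_prob:.2e}")  # ~3.66e-05

# More than 99.99% of the probability mass lies elsewhere, spread
# across the other 2**20 - 1 configurations.
print(f"mass on everything else: {1 - map_prob:.4f}")
```

So unless the mass around that single point happens to pile up on similar solutions, reporting it alone says very little about the ensemble—which is exactly the failure mode the examples mentioned above exhibit.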

Where’s the point?
In a publicly available article published in the Proceedings of the National Academy of Sciences, researchers Luis Carvalho and Charles Lawrence at Brown University discuss a class of “centroid” estimators they developed to be more representative of the information contained in discrete high-dimensional sample spaces. The centroid estimator is proven to minimize differences between the parameter and the estimate (for important loss functions), and to be the closest point to the mean.
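To get a feel for what "closest point to the mean" means in a discrete space, here is a minimal sketch (my own toy code, not the authors' implementation): for binary unknowns under Hamming loss, a centroid estimate amounts to taking the coordinate-wise majority over an ensemble of samples—equivalently, rounding the coordinate-wise mean to the nearest point of the discrete space.

```python
# Hypothetical posterior samples of a 5-dimensional binary unknown.
samples = [
    (1, 0, 1, 1, 0),
    (1, 1, 1, 0, 0),
    (1, 0, 0, 1, 0),
    (1, 0, 1, 1, 1),
    (0, 0, 1, 1, 0),
]

n = len(samples)
dim = len(samples[0])

# Coordinate-wise mean: the marginal frequency of a 1 in each coordinate.
marginals = [sum(s[i] for s in samples) / n for i in range(dim)]

# Centroid under Hamming loss: round each marginal to the nearest
# binary value, i.e. take the majority vote in each coordinate.
centroid = tuple(int(m > 0.5) for m in marginals)
print(centroid)  # (1, 0, 1, 1, 0)
```

Note that the centroid is built from the marginals of the whole ensemble, so unlike a maximum-probability point it summarizes where the mass actually sits (and in general it need not coincide with any single sampled solution).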

The authors highlight published results that suggest these alternative estimators offer improved representation of data in practice, and provide some interesting examples from computational biology. They warn the reader, however, that only a few applications have been studied so far, and feasibility has not been shown for all cases.

In the concluding remarks the authors share some insight into future challenges:

Rapid improvements in data acquisition technologies promise to continue to dramatically increase the pool of data in many fields. Although these data will be of great benefit, they also have opened a new universe of high-dimensional inference and prediction problems that will likely provide major data analytic challenges in the coming decades. Among these is the development of point estimators in discrete spaces that are the focus of the centroid estimators developed here.

But the more general point estimation challenge is to find one or a small number of feasible solutions among the many in the ensemble that is by some appropriate measure representative of the full ensemble and suitable for the data structural features of the solution space. These new high-dimensional data and unknowns will also almost certainly force a reexamination of extant approaches to interval estimation, hypothesis tests, and predictive inference.


Econometrics lit review in video

In mixed on 27 May 2008 at 12:45 am

The National Bureau of Economic Research—a private, nonprofit, nonpartisan research organization—has made public an eighteen-hour workshop from its Summer Institute 2007: What's New in Econometrics? Included are lecture videos, notes, and slides from the series.

The lectures cover recent advances in econometrics and statistics. The topics include (in the order presented):

  • Estimation of Average Treatment Effects Under Unconfoundedness 
  • Linear Panel Data Models
  • Regression Discontinuity Designs
  • Nonlinear Panel Data Models
  • Instrumental Variables with Treatment Effect Heterogeneity: Local Average Treatment Effects
  • Control Function and Related Methods
  • Bayesian Inference
  • Cluster and Stratified Sampling
  • Partial Identification
  • Difference-in-Differences Estimation
  • Discrete Choice Models
  • Missing Data
  • Weak Instruments and Many Instruments
  • Quantile Methods
  • Generalized Method of Moments and Empirical Likelihood

The speakers explain the material well, including some practical pros and cons of the methods presented. The slides are, however, typically academic: packed with content and equations, with little to support the speaker. In a way that's expected, but it's surprising given that separate lecture notes are provided.

It takes a bit of time to get into the talks, but once you do there’s lots to learn.  I suggest two open browser windows: one for the videos, one for the slides.  But avoid the temptation to read the slides—the speakers explain the material well and you’ll pick up quite a bit if you can focus on what they’re saying while you stare lovingly at the equations.

Special thanks to John Graves at the Social Science Statistics Blog for posting a notice about the series.