# Luk Arbuckle

## On time series and stochastics

In time series on 3 December 2008 at 5:33 pm

In reading the paper On Time Series Analysis of Public Health and Biomedical Data (subscription required for the PDF), described in the last post, I was introduced to an interpretation of a time series that I was not familiar with: one in terms of a stochastic process.  I remember being told in my course on time series analysis that a stochastic process and a time series were synonymous for our purposes (more may have been said at the time, which I’ve since forgotten), but there’s obviously more to it than that.

A time series is a single observation drawn from a possibly infinite collection of time series that could have occurred.  In other words, the time series as a whole can be viewed as one draw of a random object, and that draw is a realization of a collection of random variables ordered in time.  The possibly infinite sequence of random variables ordered in time is a stochastic process.  When we consider a stochastic process we are concerned with the probability model for the individual random variables, and also for combinations of them (their joint distributions).
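To make the distinction concrete, here is a minimal sketch in Python. It assumes a simple Gaussian random walk as the stochastic process (my choice for illustration, not something taken from the paper), and the names `realization`, `ensemble`, and `observed` are hypothetical.

```python
import random

def realization(n, seed):
    """Draw one realization (one possible time series) of a Gaussian random walk."""
    rng = random.Random(seed)
    level = 0.0
    path = []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)  # each step adds independent standard normal noise
        path.append(level)
    return path

# The "collection of time series": several realizations of the same process.
ensemble = [realization(n=10, seed=s) for s in range(5)]

# In practice we only ever get to see one of them.
observed = ensemble[0]

# Fixing a time index and looking across realizations gives draws of
# the single random variable at that time.
values_at_t3 = [path[3] for path in ensemble]

print("observed series:", [round(v, 2) for v in observed])
print("draws of X_3 across realizations:", [round(v, 2) for v in values_at_t3])
```

Reading along a row gives a time series; reading down a column gives repeated draws of one random variable, which is exactly what a single observed series never provides.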

### A will to be independent
Making inferences from a time series is making inferences from a single realization of a stochastic process, that is, a single observation at each time.  The idea of stationarity (that the statistical properties of the time series do not depend on time) is used to develop probability-based theory specific to time series analysis.  Basically, in a stationary time series the joint probability distribution of a set of variables depends on the relative time differences between them, whereas shifting all of them in time by the same amount leaves it unchanged.  Stationarity is usually paired with the assumption that the dependence between variables decreases with increasing time separation (stationarity alone does not guarantee this; it leads to a discussion of “auto”-correlation, which I won’t describe here).  Therefore, more (nearly independent) information will be accumulated the longer a series is followed.
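As a rough illustration of dependence fading with time separation, the sketch below simulates a stationary first-order autoregressive series and computes its sample autocorrelation at a few lags. The AR(1) model, the coefficient 0.7, and the helper names `simulate_ar1` and `sample_autocorr` are my own assumptions for the example, not something from the paper.

```python
import random

def simulate_ar1(n, phi, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal noise e_t."""
    rng = random.Random(seed)
    # Start from the stationary distribution so the whole series is stationary.
    x = rng.gauss(0.0, 1.0) / (1.0 - phi ** 2) ** 0.5
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series

def sample_autocorr(series, lag):
    """Sample autocorrelation of the series at a non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var

series = simulate_ar1(n=5000, phi=0.7)
acf = [sample_autocorr(series, lag) for lag in range(6)]
for lag, r in enumerate(acf):
    print(f"lag {lag}: {r:.3f}")
```

For this model the theoretical autocorrelation at lag k is 0.7 to the power k, so the printed values should shrink toward zero as the lag grows.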

Longitudinal data are repeated measures taken over short periods, resulting in many realizations of short time series.  We assume that the time series are independent of one another, and that within a series the correlation between observations goes to zero with increasing time separation (which is stronger than stationarity, especially for short time series).  In this case we look to increase the number of time series instead of the number of observations for a single time series.  This is a topic unto itself (one I’m not currently familiar with), and is only mentioned briefly in the paper.  What I found particularly interesting was the idea of bootstrapping a time series (based on splitting a time series into several shorter pieces), something I’ll need to look into further.
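One common way to bootstrap by splitting a series into pieces is a block bootstrap. The sketch below is my own minimal version using non-overlapping blocks; the paper's exact scheme may differ, and the toy series, the block length of 20, and the name `block_bootstrap` are all assumptions made for illustration.

```python
import random

def block_bootstrap(series, block_len, rng):
    """Resample a series by drawing non-overlapping blocks with replacement."""
    n_blocks = len(series) // block_len
    blocks = [series[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]
    resampled = []
    for _ in range(n_blocks):
        # Whole blocks are drawn, so dependence inside each block is preserved.
        resampled.extend(rng.choice(blocks))
    return resampled

rng = random.Random(1)

# Toy autocorrelated series (AR(1) with coefficient 0.6), standing in for real data.
x = 0.0
series = []
for _ in range(400):
    x = 0.6 * x + rng.gauss(0.0, 1.0)
    series.append(x)

# Bootstrap the sample mean to get a standard error that respects the dependence.
means = []
for _ in range(500):
    sample = block_bootstrap(series, block_len=20, rng=rng)
    means.append(sum(sample) / len(sample))

mean_of_means = sum(means) / len(means)
boot_se = (sum((m - mean_of_means) ** 2 for m in means) / (len(means) - 1)) ** 0.5
print(f"block-bootstrap standard error of the mean: {boot_se:.3f}")
```

Resampling individual observations would destroy the ordering in time; resampling blocks keeps short-range dependence intact, at the cost of breaking it at the block boundaries.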

## Time series analysis of public health data

In time series on 28 November 2008 at 3:45 pm

Since I’m finishing a course in time series analysis I decided to look for applications in biostatistics (an area I’m interested in).  In my search I found a paper On Time Series Analysis of Public Health and Biomedical Data (subscription required for the PDF).  When I downloaded the paper I thought it was a literature review, but it’s really a gentle introduction to time series analysis for health professionals (although at times the authors use terminology that I think will confuse more than enlighten).

### On independence and applications
An important point made in the article is that time series analysis should be used instead of standard regression analysis when the observations (or outcome measures) are not independent.  Otherwise inferences will not be valid (since independence is a key assumption in standard regression).  Time series models, on the other hand, take correlation between observations into account (resulting in valid and more efficient inferences).  An example is given wherein standard regression would imply a downward trend in birth rate (for their particular data, recording births in an area for about three years), whereas time series methods do not allow for such a conclusion.
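The paper's birth rate data are not reproduced here, but the general point can be sketched with a small simulation. Everything in it is an assumption for illustration: 36 observations (as if monthly over three years), AR(1) noise with coefficient 0.8, no true trend at all, and a naive cutoff of 1.96 for the regression t-statistic.

```python
import random

def ols_slope_t(y):
    """t-statistic for the slope in an ordinary regression of y on time 0..n-1."""
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    se = (rss / (n - 2) / sxx) ** 0.5  # standard error that assumes independent errors
    return slope / se

def ar1_noise(n, phi, rng):
    """Stationary AR(1) noise with no trend; phi = 0 gives independent noise."""
    x = rng.gauss(0.0, 1.0) / (1.0 - phi ** 2) ** 0.5
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def false_trend_rate(phi, n=36, n_sims=2000, seed=0):
    """Fraction of trend-free series in which the naive test declares a trend."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if abs(ols_slope_t(ar1_noise(n, phi, rng))) > 1.96:
            hits += 1
    return hits / n_sims

rate_iid = false_trend_rate(phi=0.0)
rate_ar = false_trend_rate(phi=0.8)
print(f"independent errors: trend declared in {rate_iid:.1%} of series")
print(f"correlated errors:  trend declared in {rate_ar:.1%} of series")
```

With independent errors the false alarm rate should sit near the nominal five percent, while with correlated errors the same test should flag a trend far more often, which is the kind of spurious conclusion the authors warn about.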

The authors point out the increasing use of time series analysis in health research, as evidenced by a search on PubMed.  Some application areas mentioned in the paper:

• gene expressions to describe molecular and cellular processes
• physiologic studies, in general (including image analysis for PET or fMRI, as well as some areas of critical care medicine)
• basic epidemiologic studies of infectious and chronic diseases
• environmental epidemiology
• health services research (to evaluate interventions)
• demographic analyses of population health

Although these examples of application areas for time series analysis are interesting, the paper doesn’t go nearly far enough in the details.  I would like to know more about how, specifically, time series analysis is being used to advance health research.  This is one reason I was originally looking for a literature review.  Another reason is that I would like to figure out the areas in which improvements in the theory and methods are still needed (i.e., brainstorming for a research topic).  Maybe that’s asking too much of a single paper, but I’ve read literature reviews in the past and they usually cover such ground.  I’ll have to try to find something in a biostats journal.