Regression analysis of biomedical research data based on a repeated measure or cluster sample

Ann Acad Med Singap. 1996 Jan;25(1):129-33.

Abstract

Research in biology and medicine often entails estimating the effect of an exposure variable X on a response variable Y from a cluster sample, that is, a sample in which X and Y are measured repeatedly on the same subject (the cluster), or on two or more subjects who share a grouping such as a litter or household (the cluster). Typically, regression analysis of such data is performed ignoring the subject or grouping (cluster) identity. This analytical approach has two drawbacks. First, statistical inference on the regression coefficient of Y on X (beta) estimated while ignoring cluster identity will likely be biased. Second, and more seriously, the beta estimated while ignoring cluster identity will likely differ considerably from the average within-cluster beta. It is the latter that is relevant to the research question; indeed, it is not clear what the beta estimated while ignoring cluster identity really conveys. We describe a multiple regression model for the analysis of data from a cluster sample. The model treats the cluster as a nominal confounding variable to be adjusted for: the cluster is represented by a set of dummy variables that are included as explanatory variables in the regression model. This model gives the average within-cluster beta with valid statistical inference. Numeric examples illustrate the application of the dummy variable multiple regression model. The method can be implemented in any software package that includes multiple regression, and virtually all commercial packages include this procedure.
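The contrast between the beta estimated while ignoring cluster identity and the dummy variable beta can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the numeric examples from the paper: the data are hypothetical and noise-free (three subjects, four measurements each, a true within-subject slope of 2), and the function names pooled_slope and dummy_variable_slope are invented for this sketch. Only point estimates are computed; standard errors and tests are left to a regression package.

```python
import numpy as np


def pooled_slope(x, y):
    """Least-squares slope of Y on X, ignoring cluster identity."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def dummy_variable_slope(x, y, cluster):
    """Least-squares slope of Y on X, adjusted for cluster by dummy variables."""
    labels = np.unique(cluster)
    # One 0/1 indicator column per cluster; the first cluster is the
    # reference level and is absorbed by the intercept.
    dummies = (cluster[:, None] == labels[None, 1:]).astype(float)
    design = np.column_stack([np.ones_like(x), x, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


# Hypothetical data (an assumption of this sketch): subjects with higher X
# values have lower baseline Y, while Y rises by 2 per unit of X within
# every subject.
cluster = np.repeat([0, 1, 2], 4)
x = np.arange(1.0, 13.0)
baseline = np.array([30.0, 15.0, 0.0])
y = baseline[cluster] + 2.0 * x

print("beta ignoring cluster identity:", pooled_slope(x, y))
print("beta adjusted for cluster:     ", dummy_variable_slope(x, y, cluster))
```

In this constructed data set the slope that ignores cluster identity is negative even though the slope within every subject is 2, whereas the dummy variable model recovers the within-subject slope. Real data would include noise, so the adjusted estimate would approximate the average within-cluster slope rather than match it exactly.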

MeSH terms

  • Cluster Analysis*
  • Data Interpretation, Statistical*
  • Humans
  • Regression Analysis*
  • Research / statistics & numerical data*