A latent class distance association model for cross-classified data with a categorical response variable

Br J Math Stat Psychol. 2014 Nov;67(3):514-40. doi: 10.1111/bmsp.12038. Epub 2014 Mar 24.

Abstract

In this paper we propose a latent class distance association model for clustering in the predictor space of large contingency tables with a categorical response variable. The rows of such a table are characterized as profiles of a set of explanatory variables, while the columns represent a single outcome variable. In many cases such tables are sparse, with many zero entries, which makes traditional models problematic. By clustering the row profiles into a few latent classes and representing these classes, together with the categories of the response variable, in a low-dimensional Euclidean space using a distance association model, a parsimonious prediction model can be obtained. A generalized EM algorithm is proposed to estimate the model parameters, and the adjusted Bayesian information criterion statistic is used to select the number of mixture components and the dimensionality of the representation. An empirical example highlighting the advantages of the new approach and comparing it with traditional approaches is presented.
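
The estimation idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the parameterization is an assumption: each latent class k and each response category c gets a point in a low-dimensional space, and the probability of category c within class k is taken to be proportional to exp(-squared distance). All function names (log_response_probs, posterior, log_likelihood, fit) are hypothetical. The "generalized" part of the EM algorithm is represented by a backtracking gradient step that only increases the expected complete-data log-likelihood in the M-step instead of maximizing it.

```python
import numpy as np

def log_response_probs(X, Y):
    """Log P(category c | class k), assumed proportional to exp(-||x_k - y_c||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # classes x categories
    logits = -d2
    m = logits.max(axis=1, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))

def posterior(F, weights, logP):
    """E-step: posterior class membership of every row profile of the table F."""
    log_joint = np.log(weights)[None, :] + F @ logP.T  # rows x classes
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def log_likelihood(F, weights, logP):
    """Mixture log-likelihood, dropping the constant multinomial coefficients."""
    log_joint = np.log(weights)[None, :] + F @ logP.T
    m = log_joint.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))).sum())

def fit(F, n_classes, n_dims, n_iter=200, seed=0):
    """Generalized EM for the assumed model; returns weights, coordinates, history."""
    rng = np.random.default_rng(seed)
    F = np.asarray(F, dtype=float)
    X = rng.normal(scale=0.5, size=(n_classes, n_dims))   # class points
    Y = rng.normal(scale=0.5, size=(F.shape[1], n_dims))  # response category points
    weights = np.full(n_classes, 1.0 / n_classes)
    history = []
    for _ in range(n_iter):
        logP = log_response_probs(X, Y)
        history.append(log_likelihood(F, weights, logP))
        post = posterior(F, weights, logP)

        # M-step, part 1: closed-form update of the class sizes.
        weights = post.mean(axis=0)

        # M-step, part 2: one backtracking gradient step on the coordinates.
        N = post.T @ F                                      # expected counts per class
        G = N - N.sum(axis=1, keepdims=True) * np.exp(logP)
        grad_X = 2.0 * (G @ Y)                              # rows of G sum to zero
        grad_Y = 2.0 * (G.T @ X - G.sum(axis=0)[:, None] * Y)
        q_old = (N * logP).sum()
        step = 1.0 / max(F.sum(), 1.0)
        while step > 1e-12:
            X_new, Y_new = X + step * grad_X, Y + step * grad_Y
            if (N * log_response_probs(X_new, Y_new)).sum() >= q_old:
                X, Y = X_new, Y_new                         # accept: Q did not decrease
                break
            step *= 0.5                                     # otherwise shrink the step
    return weights, X, Y, history
```

Calling fit(F, n_classes=2, n_dims=2) on a small count table F (rows are predictor profiles, columns are response categories) returns the estimated class sizes, the class and category coordinates, and the log-likelihood trace. Because the coordinate step is accepted only when it does not decrease the M-step objective, the trace should be non-decreasing up to rounding. Choosing the number of classes and dimensions would then compare an information criterion across several such fits.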

Keywords: BIC; Distance association model; EM algorithm; clustering; latent class analysis; mixture distribution.

MeSH terms

  • Algorithms*
  • Analysis of Variance*
  • Bayes Theorem
  • Cluster Analysis*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Humans
  • Models, Statistical
  • Netherlands
  • Politics
  • Statistics as Topic