Nonparametric Bayes Modeling of Multivariate Categorical Data

J Am Stat Assoc. 2012 Jan 1;104(487):1042-1051. doi: 10.1198/jasa.2009.tm08439.

Abstract

Modeling of multivariate unordered categorical (nominal) data is a challenging problem, particularly in high dimensions and cases in which one wishes to avoid strong assumptions about the dependence structure. Commonly used approaches rely on the incorporation of latent Gaussian random variables or parametric latent class models. The goal of this article is to develop a nonparametric Bayes approach, which defines a prior with full support on the space of distributions for multiple unordered categorical variables. This support condition ensures that we are not restricting the dependence structure a priori. We show this can be accomplished through a Dirichlet process mixture of product multinomial distributions, which is also a convenient form for posterior computation. Methods for nonparametric testing of violations of independence are proposed, and the methods are applied to model positional dependence within transcription factor binding motifs.
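
To make the phrase "Dirichlet process mixture of product multinomial distributions" concrete, the following is a minimal generative sketch in Python with NumPy. It is not the authors' implementation, and it does not perform posterior computation or the proposed independence tests. It assumes a truncated stick-breaking representation of the Dirichlet process and uniform Dirichlet priors on the category probabilities. All names here (`stick_breaking_weights`, `sample_dp_product_multinomial`, `joint_probability`, `alpha`, `k`, `levels`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha, k, rng):
    """Truncated stick-breaking weights for a Dirichlet process (k components)."""
    v = rng.beta(1.0, alpha, size=k)
    v[-1] = 1.0  # assumption: close the stick at k so the weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_dp_product_multinomial(n, levels, alpha=1.0, k=20, seed=0):
    """Draw n observations of len(levels) unordered categorical variables.

    levels[j] is the number of categories of variable j. Within a mixture
    component the variables are independent (product multinomial); dependence
    between variables arises only from mixing over components.
    """
    rng = np.random.default_rng(seed)
    weights = stick_breaking_weights(alpha, k, rng)
    # psi[j][h] is the probability vector of variable j in component h.
    # Assumption: uniform Dirichlet(1, ..., 1) priors on each vector.
    psi = [rng.dirichlet(np.ones(d), size=k) for d in levels]
    z = rng.choice(k, size=n, p=weights)  # latent component labels
    x = np.empty((n, len(levels)), dtype=int)
    for i in range(n):
        for j, d in enumerate(levels):
            x[i, j] = rng.choice(d, p=psi[j][z[i]])
    return x, z, weights, psi


def joint_probability(cell, weights, psi):
    """Probability of one cell: sum over h of weights[h] * prod_j psi[j][h, cell[j]]."""
    per_component = weights.copy()
    for j, c in enumerate(cell):
        per_component = per_component * psi[j][:, c]
    return per_component.sum()


# Example: 50 sequences at 3 positions with 4 categories each
# (for instance, nucleotides at positions of a motif).
x, z, weights, psi = sample_dp_product_multinomial(50, [4, 4, 4])
```

Because each component factorizes across variables, the joint cell probabilities are a weighted sum of rank-one tables. The truncation level `k` is a computational convenience in this sketch and is not part of the model as stated in the abstract.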

Keywords: Bayes factor; Dirichlet process; Goodness-of-fit test; Latent class; Mixture model; Motif data; Product multinomial; Unordered categorical.

Publication types

  • Research Support, N.I.H., Extramural