Decomposition of variation of mixed variables by a latent mixed Gaussian copula model

Biometrics. 2023 Jun;79(2):1187-1200. doi: 10.1111/biom.13660. Epub 2022 Mar 30.

Abstract

Many biomedical studies collect variables of mixed types from multiple groups of subjects, and some aim to separate the variation common to all groups from the variation specific to each group. Previous work on similar problems relies mainly on the Pearson correlation, which cannot handle mixed data. To address this issue, we propose a latent mixed Gaussian copula (LMGC) model that quantifies the correlations among binary, ordinal, continuous, and truncated variables in a unified framework. We also provide a tool that decomposes the variation into group-specific and common components across multiple groups by solving a regularized M-estimation problem. Extensive simulation studies show the advantage of the proposed method over Pearson correlation-based methods, and demonstrate that jointly solving the M-estimation problem over multiple groups performs better than decomposing the variation group by group. Finally, we apply our method to a Chlamydia trachomatis genital tract infection study to demonstrate how it can be used to discover informative biomarkers that differentiate patients.

Keywords: Kendall's τ; high-dimensional matrix estimation; latent Gaussian copula model; variation decomposition.
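The keywords list Kendall's τ alongside the latent Gaussian copula model, which suggests a rank-based route to the latent correlations. As a minimal sketch of that idea, and not the paper's estimator, the Python below covers only the simplest case: two continuous variables that are unknown monotone transforms of correlated Gaussians. For such a pair the latent correlation can be recovered from Kendall's τ through the well-known relation r = sin(πτ/2). The binary, ordinal, and truncated cases handled by the LMGC model need different bridge functions that are not shown here, and the regularized M-estimation step is not attempted. All function names (`kendall_tau`, `latent_corr_continuous`, `simulate_pair`) and the simulation settings are hypothetical and chosen for illustration.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return 2.0 * s / (n * (n - 1))


def latent_corr_continuous(x, y):
    """Bridge for a continuous-continuous pair: r = sin(pi/2 * tau).

    Assumption: both variables are strictly monotone transforms of
    jointly Gaussian latent variables (no ties, no truncation).
    """
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


def simulate_pair(rho, n, seed=0):
    """Draw latent Gaussians with correlation rho, then distort them
    with monotone transforms so that Pearson correlation is misleading."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(math.exp(z1))  # increasing transform of z1
        ys.append(z2 ** 3)       # increasing transform of z2
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate_pair(rho=0.6, n=400)
    print("rank-based latent correlation estimate:",
          round(latent_corr_continuous(xs, ys), 3))
```

Because Kendall's τ depends only on the ordering of the observations, the estimate is unaffected by the monotone distortions applied in `simulate_pair`, which is the property that makes a rank-based estimate attractive where the Pearson correlation is not.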

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Biological Variation, Individual*
  • Biomedical Research* / statistics & numerical data
  • Chlamydia Infections
  • Chlamydia trachomatis
  • Computer Simulation
  • Humans
  • Normal Distribution
  • Reproductive Tract Infections