A transformation-based approach to Gaussian mixture density estimation for bounded data

Biom J. 2019 Jul;61(4):873-888. doi: 10.1002/bimj.201800174. Epub 2019 Apr 14.

Abstract

Finite mixtures of Gaussian distributions provide a flexible semiparametric methodology for density estimation when the continuous variables under investigation have no boundaries. However, in practical applications, variables may be partially bounded (e.g., taking nonnegative values) or completely bounded (e.g., taking values in the unit interval). In such cases, the standard Gaussian finite mixture model assigns nonzero density to every real value, including values outside the range on which the variables are defined, which can lead to severe bias. In this paper, we propose a transformation-based approach to Gaussian mixture modeling for bounded variables. The basic idea is to carry out density estimation not on the original data but on appropriately transformed data. The density of the original data is then obtained by a change of variables. The transformation parameters and the parameters of the Gaussian mixture are estimated jointly by the expectation-maximization (EM) algorithm. The methodology for partially and completely bounded data is illustrated on simulated data and real data applications.
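The transform, fit, and back-transform idea above can be illustrated with a minimal one-variable Python sketch. It assumes that the range-power transformation named in the keywords consists of a range step (x - lower, or (x - lower)/(upper - x) when both bounds are known) followed by a Box-Cox power step with parameter lam. This form is an assumption and is not quoted from the paper. The authors estimate the transformation and mixture parameters jointly by EM. The sketch instead uses a simpler grid search over lam: for each candidate it runs plain EM on the transformed data, then scores the fit by the log-likelihood on the original scale, with the Jacobian included so that different lam values are comparable. All function names (range_power, em_gmm, fit_bounded_gmm, bounded_density) are hypothetical.

```python
import math
import random


def range_power(x, lam, lower, upper=None):
    """Hypothetical range-power transform: a range step followed by a Box-Cox step.

    Returns (t, log|dt/dx|). Assumes lower < x (and x < upper when upper is given).
    """
    if upper is None:
        y = x - lower                      # partially bounded support (lower, inf)
        log_dy = 0.0                       # dy/dx = 1
    else:
        y = (x - lower) / (upper - x)      # completely bounded support (lower, upper)
        log_dy = math.log(upper - lower) - 2.0 * math.log(upper - x)
    t = math.log(y) if lam == 0.0 else (y ** lam - 1.0) / lam
    # The Box-Cox derivative is y**(lam - 1) for every lam, including the log case.
    return t, (lam - 1.0) * math.log(y) + log_dy


def log_normal_pdf(t, mu, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (t - mu) ** 2 / var


def log_mixture_pdf(t, weights, means, variances):
    """log of sum_j w_j N(t; mu_j, var_j), computed with log-sum-exp."""
    terms = [math.log(w) + log_normal_pdf(t, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(a - top) for a in terms))


def em_gmm(ts, k, iters=100):
    """Plain EM for a univariate k-component Gaussian mixture on transformed data."""
    n = len(ts)
    ordered = sorted(ts)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # quantile start
    centre = sum(ts) / n
    variances = [max(sum((t - centre) ** 2 for t in ts) / n, 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior component probabilities (responsibilities).
        resp = []
        for t in ts:
            terms = [math.log(w) + log_normal_pdf(t, m, v)
                     for w, m, v in zip(weights, means, variances)]
            top = max(terms)
            p = [math.exp(a - top) for a in terms]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: weighted proportions, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)   # guard against an empty component
            weights[j] = nj / n
            means[j] = sum(r[j] * t for r, t in zip(resp, ts)) / nj
            var_j = sum(r[j] * (t - means[j]) ** 2 for r, t in zip(resp, ts)) / nj
            variances[j] = max(var_j, 1e-6)             # avoid degenerate components
    return weights, means, variances


def fit_bounded_gmm(xs, k, lower, upper=None,
                    lambdas=(-0.5, -0.25, 0.0, 0.25, 0.5, 1.0)):
    """Grid search over lam (a stand-in for the paper's joint EM): fit the mixture on
    t(x), then score it by the log-likelihood on the ORIGINAL scale, Jacobian included."""
    best = None
    for lam in lambdas:
        ts = [range_power(x, lam, lower, upper)[0] for x in xs]
        params = em_gmm(ts, k)
        loglik = sum(log_mixture_pdf(t, *params) + range_power(x, lam, lower, upper)[1]
                     for x, t in zip(xs, ts))
        if best is None or loglik > best[0]:
            best = (loglik, lam, params)
    return best


def bounded_density(x, lam, lower, upper, params):
    """Change of variables: f_X(x) = f_T(t(x)) * |dt/dx|, and zero outside the support."""
    if x <= lower or (upper is not None and x >= upper):
        return 0.0
    t, log_jac = range_power(x, lam, lower, upper)
    return math.exp(log_mixture_pdf(t, *params) + log_jac)


# Usage on simulated data in the unit interval (completely bounded case).
random.seed(1)
xs = [random.betavariate(2, 5) for _ in range(300)]
loglik, lam, params = fit_bounded_gmm(xs, k=2, lower=0.0, upper=1.0)
outside = bounded_density(-0.1, lam, 0.0, 1.0, params)  # 0.0: outside the support
inside = bounded_density(0.2, lam, 0.0, 1.0, params)
```

When lam = 0, the Box-Cox step is a log (or, with both bounds, a logit) transform, and the change of variables is exact. When lam is not 0, the Box-Cox step maps onto a half-line rather than the whole real line. In that case the back-transformed density integrates to slightly less than one unless the fitted mixture places negligible mass beyond that half-line.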

Keywords: EM algorithm; Gaussian mixture models; bounded support; density estimation; range-power transformation.

MeSH terms

  • Biometry / methods*
  • Data Analysis
  • Environmental Monitoring
  • Humans
  • Models, Statistical*
  • Neoplasms / blood
  • Neoplasms / epidemiology
  • Normal Distribution
  • Racial Groups / statistics & numerical data
  • Risk
  • Schools
  • Vitamin A / blood
  • beta Carotene / blood

Substances

  • beta Carotene
  • Vitamin A