Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations

BMC Bioinformatics. 2017 Sep 13;18(Suppl 10):394. doi: 10.1186/s12859-017-1790-x.

Abstract

Background: Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations may be generated by an underlying continuous process that is often unknown. Estimating the 'natural ordering' of data points, together with the corresponding uncertainties, can help researchers draw insights about the mechanisms involved.

Results: We introduce a Bayesian Unidimensional Scaling (BUDS) technique that extracts dominant sources of variation in high-dimensional datasets and produces visual summaries of the data, facilitating the exploration of a hidden continuum. The method maps multivariate data points to latent one-dimensional coordinates along their underlying trajectory and provides estimated uncertainty bounds. By statistically modeling dissimilarities and applying the DiSTATIS registration method to their posterior samples, we visualize uncertainties in the estimated data trajectory across different regions using confidence contours for individual data points. We also illustrate the estimated overall data density across different areas with density clouds. One-dimensional coordinates recovered by BUDS help researchers discover sample attributes or covariates that drive the main variability in a dataset. We demonstrate the usefulness and accuracy of BUDS on published microbiome 16S rRNA, RNA-seq, and roll call datasets.
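The abstract does not spell out the registration step. As a hedged illustration only, the sketch below implements a basic DiSTATIS compromise in NumPy (double-centering each distance matrix, scaling it to a unit first eigenvalue, weighting tables by the first eigenvector of their RV-coefficient matrix, eigendecomposing the weighted compromise, and projecting every table onto it). The inputs are synthetic Gaussian perturbations of a known 1-D ordering that merely stand in for posterior samples; the function names `double_center` and `distatis`, the noise model, and all parameter values are assumptions, and this is not the BUDS sampler or the authors' code.

```python
import numpy as np


def double_center(D):
    """Cross-product (Gram) matrix from a distance matrix: S = -1/2 * J D^2 J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return -0.5 * J @ (D ** 2) @ J


def distatis(distance_mats, n_dims=2):
    """Minimal DiSTATIS sketch for a list of K (n x n) distance matrices.

    Returns compromise coordinates F (n x n_dims), per-table projections
    (K x n x n_dims) and the table weights alpha (length K, summing to 1).
    """
    S = []
    for D in distance_mats:
        Sk = double_center(D)
        S.append(Sk / np.linalg.eigvalsh(Sk)[-1])  # scale so the largest eigenvalue is 1
    flat = np.array([Sk.ravel() for Sk in S])
    inner = flat @ flat.T
    norms = np.sqrt(np.diag(inner))
    rv = inner / np.outer(norms, norms)        # RV coefficients between tables
    _, vecs = np.linalg.eigh(rv)
    w = np.abs(vecs[:, -1])                    # first eigenvector of the RV matrix
    alpha = w / w.sum()
    compromise = np.tensordot(alpha, np.array(S), axes=1)
    lam, U = np.linalg.eigh(compromise)
    top = np.argsort(lam)[::-1][:n_dims]       # keep the leading dimensions
    lam, U = lam[top], U[:, top]
    F = U * np.sqrt(lam)                       # compromise coordinates
    proj = U / np.sqrt(lam)                    # maps any S_k into the compromise space
    partial = np.array([Sk @ proj for Sk in S])
    return F, partial, alpha


# Synthetic stand-in for posterior draws of 1-D coordinates (hypothetical, NOT BUDS output).
rng = np.random.default_rng(0)
n, K = 20, 50
tau_true = np.linspace(0.0, 1.0, n)
draws = tau_true + rng.normal(scale=0.05, size=(K, n))
dists = [np.abs(t[:, None] - t[None, :]) for t in draws]

F, partial, alpha = distatis(dists)
r = np.corrcoef(F[:, 0], tau_true)[0, 1]
print(f"|corr(first axis, true ordering)| = {abs(r):.3f}")
# partial[:, i, :] is a cloud of K points around F[i]; summarizing its spread
# with a contour (e.g. a 95% ellipse) is one way to show per-point uncertainty.
```

In this simplified form the alpha-weighted average of the projected tables equals the compromise coordinates exactly, so each point's cloud of projections is centered on its compromise position. Some DiSTATIS formulations rescale the partial projections (for example by K times each table's weight); the choice here is kept minimal for readability.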

Conclusions: Our method effectively recovers and visualizes natural orderings present in datasets. Automatic visualization tools for data exploration and analysis are available at: https://nlhuong.shinyapps.io/visTrajectory/ .

Keywords: Bayesian model; Dimensionality reduction; Microbiome; Ordering; Pseudotime; Single cell; Uncertainty.

MeSH terms

  • Animals
  • Anura / embryology
  • Anura / genetics
  • Bayes Theorem
  • Cluster Analysis
  • Databases as Topic*
  • Gastrointestinal Microbiome
  • Gene Expression Regulation, Developmental
  • Humans
  • Infant
  • Models, Statistical
  • Oceans and Seas
  • Uncertainty*
  • User-Computer Interface