Classification of Single-Cell Gene Expression Trajectories from Incomplete and Noisy Data

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):193-207. doi: 10.1109/TCBB.2017.2763946. Epub 2017 Oct 17.

Abstract

This paper studies classification of gene-expression trajectories coming from two classes, healthy and mutated (cancerous), using Boolean networks with perturbation (BNps) to model the dynamics of each class at the state level. Each class has its own BNp, which is partially known based on gene pathways. We employ a Gaussian model at the observation level to model the expression values of the genes given the hidden binary states at each time point. We use expectation maximization (EM) to learn the BNps and the unknown model parameters, derive closed-form updates for the parameters, and propose a learning algorithm. After learning, a plug-in Bayes classifier is used to classify unlabeled trajectories, which can have missing data. Measuring gene expression at different times yields trajectories only when measurements come from a single cell. In multiple-cell scenarios, the expression values are averages over many cells with possibly different states. Via the central-limit theorem, we propose another model for expression data in multiple-cell scenarios. Simulations demonstrate that single-cell trajectory data can outperform multiple-cell average expression data in terms of classification error, especially in high-noise situations. We also consider data generated via a mammalian cell-cycle network, both the wild-type network and a variant with a common mutation affecting p27.
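The single-cell pipeline described above (hidden Boolean states, Gaussian observations, plug-in Bayes classification with missing data) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy three-gene networks `healthy` and `mutated`, the perturbation form x_{k+1} = f(x_k) XOR n_k with independent per-gene flip probability `p`, the observation model y = `mu0` + `delta`·x + Gaussian noise with standard deviation `sigma`, the uniform initial state distribution, and the use of `NaN` to mark missing measurements. The EM learning step is skipped; the classifier simply plugs in the parameters used for simulation.

```python
import itertools
import numpy as np

def bnp_transition_matrix(update, n_genes, p):
    """Transition matrix of a BNp, assuming x_{k+1} = f(x_k) XOR n_k with
    each gene flipped independently with probability p (an assumption)."""
    states = np.array(list(itertools.product([0, 1], repeat=n_genes)))
    P = np.zeros((len(states), len(states)))
    for i, x in enumerate(states):
        fx = np.array(update(tuple(x)))
        d = (states != fx).sum(axis=1)          # Hamming distance to f(x)
        P[i] = p ** d * (1 - p) ** (n_genes - d)
    return states, P

def simulate(update, n_genes, p, T, mu0, delta, sigma, rng, miss_prob=0.0):
    """Simulate one noisy single-cell trajectory; missing entries are NaN."""
    x = rng.integers(0, 2, size=n_genes)
    Y = np.empty((T, n_genes))
    for k in range(T):
        if k > 0:
            flips = rng.random(n_genes) < p      # random perturbation
            x = np.array(update(tuple(x))) ^ flips
        Y[k] = mu0 + delta * x + sigma * rng.normal(size=n_genes)
    Y[rng.random(Y.shape) < miss_prob] = np.nan
    return Y

def log_likelihood(Y, states, P, mu0, delta, sigma):
    """Log p(Y) via the scaled forward algorithm; NaNs are skipped."""
    means = mu0 + delta * states                 # (n_states, n_genes)
    log_alpha = np.full(len(states), -np.log(len(states)))  # uniform start
    total = 0.0
    for k, y in enumerate(Y):
        obs = ~np.isnan(y)
        # Gaussian log-density summed over the observed genes only
        ll = (-0.5 * ((y[obs] - means[:, obs]) / sigma) ** 2
              - np.log(sigma * np.sqrt(2 * np.pi))).sum(axis=1)
        if k > 0:
            log_alpha = np.log(np.exp(log_alpha) @ P)   # prediction step
        log_alpha = log_alpha + ll                      # update step
        c = np.logaddexp.reduce(log_alpha)              # log normalizer
        total += c
        log_alpha = log_alpha - c
    return total

def classify(Y, models, log_priors, mu0, delta, sigma):
    """Plug-in Bayes rule: pick the class with the largest log posterior."""
    scores = {c: log_priors[c] + log_likelihood(Y, S, P, mu0, delta, sigma)
              for c, (S, P) in models.items()}
    return max(scores, key=scores.get)

# Hypothetical toy networks (not the paper's cell-cycle model).
def healthy(x):
    a, b, c = x
    return (1 - c, a, a & b)     # negative feedback on gene 0

def mutated(x):
    a, b, c = x
    return (1, a, a & b)         # gene 0 stuck on

rng = np.random.default_rng(0)
n, p, T = 3, 0.05, 40
mu0, delta, sigma = 0.0, 1.0, 0.2
models = {"healthy": bnp_transition_matrix(healthy, n, p),
          "mutated": bnp_transition_matrix(mutated, n, p)}
log_priors = {"healthy": np.log(0.5), "mutated": np.log(0.5)}

Y_h = simulate(healthy, n, p, T, mu0, delta, sigma, rng, miss_prob=0.2)
Y_m = simulate(mutated, n, p, T, mu0, delta, sigma, rng, miss_prob=0.2)
label_h = classify(Y_h, models, log_priors, mu0, delta, sigma)
label_m = classify(Y_m, models, log_priors, mu0, delta, sigma)
print(label_h, label_m)
```

Enumerating all 2^n states keeps the forward recursion exact but only scales to small networks; the sketch is meant to show how missing measurements drop out of the likelihood, not to reproduce the paper's learning algorithm or its multiple-cell model.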

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Bayes Theorem
  • Gene Expression Profiling / methods*
  • Gene Regulatory Networks / genetics*
  • Humans
  • Models, Genetic
  • Models, Statistical
  • Neoplasms / genetics
  • Neoplasms / metabolism
  • Single-Cell Analysis / methods*