qtQDA: quantile transformed quadratic discriminant analysis for high-dimensional RNA-seq data

PeerJ. 2019 Dec 18;7:e8260. doi: 10.7717/peerj.8260. eCollection 2019.

Abstract

Classification on the basis of gene expression data derived from RNA-seq promises to become an important part of modern medicine. We propose a new classification method based on a model in which the data are marginally negative binomial but dependent. The model therefore incorporates the dependence known to exist between measurements from different genes. The method, called qtQDA, first performs a quantile transformation (qt) and then applies Gaussian quadratic discriminant analysis (QDA) using regularized covariance matrix estimates. We show that qtQDA performs very well on real data sets and has advantages over some existing approaches. An R package implementing the method is available at https://github.com/goknurginer/qtQDA.
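The two steps named in the abstract can be illustrated with a small Python sketch. It is not the qtQDA R package and makes several simplifying assumptions. Each gene's counts are mapped to normal scores through a negative binomial CDF. The per-gene mean and dispersion are treated as known, shared across classes, and given hypothetical values. Discreteness is handled with the mid-distribution value (F(y-1) + F(y))/2, which is an assumed choice. Regularization shrinks the off-diagonal covariance entries by a fixed factor `lam`, a stand-in for whatever estimator the package actually uses. All function names and the toy counts are invented for illustration.

```python
import math
from statistics import NormalDist

def nb_cdf(y, mu, phi):
    """P(Y <= y) for a negative binomial with mean mu > 0 and dispersion phi > 0
    (variance mu + phi * mu**2), obtained by summing the pmf from 0 to y."""
    if y < 0:
        return 0.0
    r = 1.0 / phi                      # "size" parameter
    p = r / (r + mu)                   # "prob" parameter, chosen so the mean is mu
    total = 0.0
    for k in range(int(y) + 1):
        log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                   + r * math.log(p) + k * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def quantile_transform(y, mu, phi):
    """Map a count to a standard-normal score via the mid-distribution value
    (F(y-1) + F(y)) / 2. This treatment of discreteness is an assumption."""
    u = 0.5 * (nb_cdf(y - 1, mu, phi) + nb_cdf(y, mu, phi))
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep inv_cdf finite
    return NormalDist().inv_cdf(u)

def transform(counts, mus, phis):
    return [quantile_transform(y, mu, ph) for y, mu, ph in zip(counts, mus, phis)]

def fit_class(Z, lam):
    """Class mean and sample covariance with off-diagonals shrunk by (1 - lam).
    This simple shrinkage is a hypothetical stand-in for the package's estimator."""
    n, p = len(Z), len(Z[0])
    m = [sum(row[j] for row in Z) / n for j in range(p)]
    S = [[sum((row[i] - m[i]) * (row[j] - m[j]) for row in Z) / (n - 1)
          for j in range(p)] for i in range(p)]
    return m, [[S[i][j] if i == j else (1 - lam) * S[i][j] for j in range(p)]
               for i in range(p)]

def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be positive definite)."""
    p = len(A)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def qda_score(z, m, cov, prior):
    """log prior + Gaussian log-density, dropping the constant shared by all classes."""
    L = cholesky(cov)
    w = []                             # forward-solve L w = z - m
    for i in range(len(z)):
        w.append((z[i] - m[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    log_det = 2 * sum(math.log(L[i][i]) for i in range(len(L)))
    return math.log(prior) - 0.5 * log_det - 0.5 * sum(x * x for x in w)

# Hypothetical per-gene NB parameters and toy training counts (two genes, two classes).
mus, phis = [10.0, 10.0], [0.1, 0.1]
train = {"A": [[3, 18], [5, 20], [4, 15], [6, 22], [2, 17]],
         "B": [[18, 4], [20, 6], [15, 3], [22, 5], [17, 2]]}
models = {k: fit_class([transform(y, mus, phis) for y in ys], lam=0.5)
          for k, ys in train.items()}

def classify(counts):
    """Assign a new sample to the class with the largest QDA score (equal priors)."""
    z = transform(counts, mus, phis)
    return max(models, key=lambda k: qda_score(z, *models[k], prior=0.5))

print(classify([4, 19]), classify([19, 4]))  # expected to print: A B
```

Using one shared transformation keeps the second step as ordinary Gaussian QDA on the normal scores. A model with class-specific marginals, which is closer to the dependent negative binomial model described above, would need a different density calculation.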

Keywords: Classification; Dependent count data; Gene expression; Negative binomial distribution; Quadratic discriminant analysis; RNA-seq.

Grants and funding

This work was supported by the Scientific and Technical Research Council of Turkey (TUBITAK 2214/A—1059B141601270) and by the Australian National Health and Medical Research Council (Program Grant 1054618 and Fellowship 1154970 to Gordon K. Smyth), the Cancer Therapeutics CRC, Victorian State Government Operational Infrastructure Support and Australian Government NHMRC IRIIS. Funding for the article processing fee was provided by Smyth Lab funds. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.