Identification of differentially expressed genes by means of outlier detection

Itziar Irigoien; Concepción Arenas

doi:10.1186/s12859-018-2318-8


BMC Bioinformatics. 2018 Sep 10;19(1):317. doi: 10.1186/s12859-018-2318-8.

Authors

Itziar Irigoien¹, Concepción Arenas²

Affiliations

¹ Department of Computation Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Donostia, Spain.
² Department of Genetics, Microbiology and Statistics, University of Barcelona, Barcelona, Spain. carenas@ub.edu.

Abstract

Background: An important issue in microarray data analysis is to select, from thousands of genes, a small number of informative differentially expressed (DE) genes which may be key elements for a disease. If each gene is analyzed individually, there is a large number of hypotheses to test and a multiple comparison correction method must be used. Consequently, the resulting cut-off value may be too small. Moreover, the replicability of the selected DE genes is an important concern. We present a new method, called ORdensity, to obtain a reproducible selection of DE genes. It takes into account the relation between all genes and is not a gene-by-gene approach, unlike the techniques usually applied to DE gene selection.
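To illustrate why a multiple comparison correction can push the cut-off value very low, the following minimal sketch computes a per-gene significance threshold. It assumes a Bonferroni correction, which the abstract does not name; the function name `bonferroni_cutoff` and the gene counts are hypothetical and chosen only for illustration.

```python
def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold under a Bonferroni correction.

    Assumption: Bonferroni is used here only as a familiar example of a
    multiple comparison correction; the source does not specify one.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    # Dividing alpha by the number of tests bounds the family-wise error rate.
    return alpha / n_tests


# Hypothetical numbers of genes tested, for illustration only.
for n_genes in (1, 100, 10_000):
    print(n_genes, bonferroni_cutoff(0.05, n_genes))
```

With thousands of genes tested one at a time, the per-gene threshold shrinks in proportion to the number of tests, which is the shortcoming of gene-by-gene analysis that the Background describes.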

Results: The proposed method returns three measures, related to the concepts of outlier and density of false positives in a neighbourhood, which allow us to identify the DE genes with high classification accuracy. To assess the performance of ORdensity, we used simulated microarray data and four real microarray cancer data sets. The results indicated that the method correctly detects the DE genes; that it is competitive with other well-accepted methods; that the list of DE genes it obtains is useful for the correct classification or diagnosis of new samples; and that, in general, it is more stable than other procedures.

Conclusions: ORdensity is a new method for identifying DE genes that avoids some of the shortcomings of gene-by-gene identification, and it remains stable when the original sample is replaced by subsamples.

Keywords: Differentially expressed gene; Multivariate statistics; Outlier; Quantile.

MeSH terms

Biomarkers / metabolism*
Gene Expression Profiling / methods*
Humans
Neoplasms / genetics*
Oligonucleotide Array Sequence Analysis / methods

Substances

Biomarkers
