DNA Copy Number Selection Using Robust Structured Sparsity-Inducing Norms

IEEE/ACM Trans Comput Biol Bioinform. 2014 Jan-Feb;11(1):168-81. doi: 10.1109/TCBB.2013.141.

Abstract

Array comparative genomic hybridization (aCGH) is a newly introduced method for the detection of copy number abnormalities associated with human diseases, with a special focus on cancer. Specific patterns in DNA copy number variations (CNVs) can be associated with certain disease types and can facilitate prognosis and the monitoring of disease progression. Machine learning techniques have been used to model tissue typing as a classification problem. Feature selection is an important part of the classification process because many biological features are unrelated to the disease and confound the classification task. Multiple feature selection methods have been proposed in the different domains where classification has been applied. In this work, we present a new feature selection method based on structured sparsity-inducing norms to identify the informative aCGH biomarkers that can help classify different disease subtypes. To validate the performance of the proposed method, we experimentally compare it with existing feature selection methods on four publicly available aCGH data sets. Across all experiments, the proposed sparse-learning-based feature selection method outperforms the other related approaches. More importantly, we carefully investigate the aCGH biomarkers selected by our method, and the biological evidence in the literature strongly supports our results.
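
The abstract does not state which structured sparsity-inducing norm the method uses or how it is optimized. As a rough illustration of the general idea only, the sketch below assumes one common choice, an l2,1-norm penalty on a least-squares classifier, solved by iteratively reweighted least squares. The function names, the `gamma` parameter, and the ranking rule are hypothetical and are not taken from the paper.

```python
import numpy as np


def l21_feature_scores(X, Y, gamma=1.0, n_iter=50, eps=1e-8):
    """Score features with an l2,1-regularized least-squares model.

    Illustrative assumption, not the paper's algorithm. Minimizes
    ||XW - Y||_F^2 + gamma * sum_i ||W[i, :]||_2 by iteratively
    reweighted least squares. X is (samples x features), Y is a
    (samples x classes) one-hot label matrix.
    """
    # Center both matrices so that no intercept term is needed.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    n_features = X.shape[1]
    d = np.ones(n_features)  # per-feature reweighting terms
    XtX = X.T @ X
    XtY = X.T @ Y

    for _ in range(n_iter):
        # Closed-form update of W for the current weights d.
        W = np.linalg.solve(XtX + gamma * np.diag(d), XtY)
        # The l2 norm of each row measures one feature's importance
        # jointly across all classes.
        row_norms = np.sqrt((W ** 2).sum(axis=1))
        # Rows with small norms receive a larger penalty next round,
        # which drives uninformative features toward zero.
        d = 1.0 / (2.0 * row_norms + eps)

    return row_norms


def top_features(X, Y, k, **kwargs):
    """Return the indices of the k highest-scoring features."""
    scores = l21_feature_scores(X, Y, **kwargs)
    return np.argsort(scores)[::-1][:k]
```

In this sketch, each row of `W` corresponds to one aCGH probe, so penalizing whole rows selects or discards a probe for all disease subtypes at once. Calling `top_features(X, Y, k)` would return the `k` candidate biomarker indices under these assumptions.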

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Biomarkers
  • Comparative Genomic Hybridization
  • DNA Copy Number Variations / genetics*
  • Genome, Human / genetics
  • Genomics / methods*
  • Humans
  • Male
  • Neoplasms / genetics
  • Reproducibility of Results

Substances

  • Biomarkers