On Efficient Feature Ranking Methods for High-Throughput Data Analysis

IEEE/ACM Trans Comput Biol Bioinform. 2015 Nov-Dec;12(6):1374-84. doi: 10.1109/TCBB.2015.2415790.

Abstract

Efficient mining of high-throughput data has become a popular theme in the big data era. Existing biology-related feature ranking methods focus mainly on statistical and annotation information. This study presents two efficient feature ranking methods. Multi-target regression and graph embedding are incorporated into an optimization framework, and feature ranking is achieved by introducing a structured sparsity norm. Compared with existing methods, the presented methods have two advantages: (1) The feature subset simultaneously accounts for global margin information and locality manifold information, so both global and local structure are considered. (2) Features are selected in batches rather than individually within the algorithm framework, so the interactions between features are considered and the optimal feature subset can be guaranteed. In addition, this study presents a theoretical justification. Empirical experiments on a set of real-world gene expression data sets demonstrate the effectiveness and efficiency of the two algorithms in comparison with some state-of-the-art feature ranking methods.
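The abstract does not give the objective function or the solver, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes that the structured sparsity norm is the row-wise L2,1 norm, that the graph embedding term is built from a k-nearest-neighbor graph Laplacian, and that the targets are one-hot class indicators. The objective ||XW - Y||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W||_{2,1}, the function names knn_laplacian and rank_features, and the parameters alpha, beta, and k are all illustrative assumptions. Features are ranked by the row norms of W, so all features are scored jointly rather than one at a time.

```python
import numpy as np


def knn_laplacian(X, k=5):
    """Unnormalized Laplacian of a symmetrized k-nearest-neighbor graph (assumed graph)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a sample is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest neighbors per sample
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                            # make the adjacency symmetric
    return np.diag(A.sum(axis=1)) - A


def rank_features(X, y, alpha=0.1, beta=1.0, k=5, n_iter=30, eps=1e-8):
    """Rank features by row norms of W under an assumed L2,1-regularized objective.

    Minimizes ||XW - Y||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W||_{2,1}
    with iteratively reweighted least squares. Returns (order, scores), where
    order lists feature indices from most to least important.
    """
    X = X - X.mean(axis=0)                            # center features
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot multi-target matrix
    Y = Y - Y.mean(axis=0)                            # center targets
    L = knn_laplacian(X, k)
    G = X.T @ X + alpha * (X.T @ L @ X)               # regression + graph term
    XtY = X.T @ Y
    d = np.ones(X.shape[1])                           # reweighting for the L2,1 term
    for _ in range(n_iter):
        W = np.linalg.solve(G + beta * np.diag(d), XtY)
        d = 1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps)
    scores = np.linalg.norm(W, axis=1)                # one score per feature (row of W)
    return np.argsort(-scores), scores
```

Calling rank_features(X, y) with a samples-by-genes expression matrix X and a vector of class labels y returns the feature indices in descending order of score. In this sketch alpha weights the local (graph) term against the global regression fit, and beta controls how strongly whole rows of W are pushed toward zero.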

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Data Mining / methods*
  • Database Management Systems*
  • Databases, Factual*
  • Gene Expression Profiling / methods*
  • High-Throughput Screening Assays / methods*
  • Pattern Recognition, Automated
  • Statistics as Topic / methods