A robust rerank approach for feature selection and its application to pooling-based GWA studies

Jia-Rou Liu; Po-Hsiu Kuo; Hung Hung

doi:10.1155/2013/860673

Comput Math Methods Med. 2013:2013:860673. doi: 10.1155/2013/860673. Epub 2013 Apr 4.

Authors

Jia-Rou Liu¹, Po-Hsiu Kuo, Hung Hung

Affiliation

¹ Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan.

Abstract

Large-p-small-n datasets are commonly encountered in modern biomedical studies. For detecting differences between two groups, conventional methods often perform poorly in this setting: the t-test suffers from unstable variance estimates, and AUC (area under the receiver operating characteristic curve) estimates contain a high proportion of tied values. The significance analysis of microarrays (SAM) may also be unsatisfactory, since its performance is sensitive to its tuning parameter, whose selection is not straightforward. In this work, we propose a robust rerank approach to overcome these difficulties. In particular, we obtain a rank-based statistic for each feature based on the concept of "rank-over-variable." Techniques of "random subset" and "rerank" are then applied iteratively to rank the features, and the leading features are selected for further study. The proposed rerank approach is especially suited to large-p-small-n datasets. Moreover, it is insensitive to the selection of tuning parameters, which is an appealing property for practical implementation. Simulation studies and a real-data analysis of pooling-based genome-wide association (GWA) studies demonstrate the usefulness of our method.
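The abstract does not spell out the procedure, so the following is only a rough sketch of how the three ingredients might fit together, not the authors' algorithm. It assumes that "rank-over-variable" means ranking each sample's values across features, that "random subset" means drawing a random set of features at each iteration, and that "rerank" means recomputing the ranks within that subset. The function names, the mean-rank-difference statistic, and the averaging across iterations are all hypothetical choices made for illustration.

```python
import numpy as np

def rank_over_variable(X):
    """Rank each sample's values across features (rows = samples, columns = features).

    Assumption: ties are broken by column order; the paper may handle ties differently.
    """
    return np.argsort(np.argsort(X, axis=1), axis=1) + 1

def rerank_scores(X, y, subset_size, n_iter, seed=0):
    """Hypothetical rerank score: average absolute difference in scaled mean rank
    between the two groups, over random feature subsets."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    total = np.zeros(n_features)
    counts = np.zeros(n_features)
    for _ in range(n_iter):
        # Random subset: draw features without replacement.
        idx = rng.choice(n_features, size=subset_size, replace=False)
        # Rerank: recompute within-sample ranks using only the drawn features,
        # scaled to (0, 1] so that scores are comparable across subset sizes.
        R = rank_over_variable(X[:, idx]) / subset_size
        stat = np.abs(R[y == 1].mean(axis=0) - R[y == 0].mean(axis=0))
        total[idx] += stat
        counts[idx] += 1
    # Features never drawn keep a score of 0 (illustrative convention).
    return total / np.maximum(counts, 1)
```

Under these assumptions, `np.argsort(-rerank_scores(X, y, subset_size, n_iter))` would list the features from highest to lowest score, and the leading ones would be carried forward for further study. Here `X` is a samples-by-features array and `y` is a 0/1 NumPy array of group labels.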

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Area Under Curve
Bipolar Disorder / genetics
Computational Biology
Databases, Genetic / statistics & numerical data
Depressive Disorder, Major / genetics
Genetic Association Studies / statistics & numerical data
Genetic Markers
Genome-Wide Association Study / statistics & numerical data*
Humans
Models, Statistical
Polymorphism, Single Nucleotide
Statistics, Nonparametric

Substances

Genetic Markers