Integrative analysis and variable selection with multiple high-dimensional data sets

Shuangge Ma; Jian Huang; Xiao Song

doi:10.1093/biostatistics/kxr004

Biostatistics. 2011 Oct;12(4):763-75. doi: 10.1093/biostatistics/kxr004. Epub 2011 Mar 16.

Authors

Shuangge Ma¹, Jian Huang, Xiao Song

Affiliation

¹ School of Public Health, Yale University, 60 College Street, New Haven, CT 06520, USA. shuangge.ma@yale.edu.

Abstract

In high-throughput -omics studies, markers identified from analysis of single data sets often suffer from a lack of reproducibility because of sample limitation. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple -omics data sets is challenging because of the high dimensionality of data and heterogeneity among studies. In this article, for marker selection in integrative analysis of data from multiple heterogeneous studies, we propose a 2-norm group bridge penalization approach. This approach can effectively identify markers with consistent effects across multiple studies and accommodate the heterogeneity among studies. We propose an efficient computational algorithm and establish the asymptotic consistency property. Simulations and applications in cancer profiling studies show satisfactory performance of the proposed approach.
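The abstract names the 2-norm group bridge penalty but does not state its form. As a rough illustration only, the sketch below assumes the usual group bridge structure: each marker's coefficients across the studies form one group, the group's Euclidean (2-) norm is raised to a power gamma between 0 and 1, and the results are summed and scaled by a tuning parameter. The function name `group_bridge_penalty`, the parameters `lam` and `gamma`, and the default `gamma=0.5` are assumptions made for this example, not details taken from the abstract.

```python
import math

def group_bridge_penalty(beta, lam=1.0, gamma=0.5):
    """Hypothetical sketch: lam * sum_j ||beta_j||_2 ** gamma.

    beta is a list of groups, one per marker; each group holds that
    marker's coefficients across the studies (assumed layout).
    """
    total = 0.0
    for coefs in beta:
        # Euclidean norm of one marker's coefficients across studies
        norm = math.sqrt(sum(b * b for b in coefs))
        # With 0 < gamma < 1 the penalty is concave in the group norm
        total += norm ** gamma
    return lam * total

# Three markers measured in two studies (made-up numbers)
beta = [[3.0, 4.0], [0.0, 0.0], [0.5, -0.5]]
penalty = group_bridge_penalty(beta)
```

Under this assumed form, a marker whose coefficients are zero in every study contributes nothing, while a retained marker may have different coefficient values in each study. That is one way to read the abstract's claim that the approach identifies markers with consistent effects while accommodating heterogeneity among studies.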

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Biomarkers, Tumor / genetics
Biostatistics
Carcinoma, Hepatocellular / genetics
Computer Simulation
Databases, Genetic / statistics & numerical data*
Gene Expression Profiling / statistics & numerical data*
Genetic Markers
Genetic Predisposition to Disease
Humans
Liver Neoplasms / genetics
Models, Genetic
Models, Statistical
Pancreatic Neoplasms / genetics

Substances

Biomarkers, Tumor
Genetic Markers