Robust gene-environment interaction analysis using penalized trimmed regression

J Stat Comput Simul. 2018;88(18):3502-3528. doi: 10.1080/00949655.2018.1523411. Epub 2018 Sep 19.

Abstract

In biomedical and epidemiological studies, gene-environment (G-E) interactions have been shown to contribute substantially to the etiology and progression of many complex diseases. Most existing approaches for identifying G-E interactions lack robustness to outliers and contamination in the response and predictor spaces. In this study, we develop a novel robust G-E identification approach using the trimmed regression technique under joint modeling. A robust data-driven criterion and stability selection are adopted to determine the trimmed subset, which is free of both vertical outliers (in the response) and leverage points (in the predictors). An effective penalization approach is developed to identify important G-E interactions while respecting the "main effects, interactions" hierarchy. Extensive simulations show that the proposed approach outperforms multiple alternatives. In the analysis of TCGA data on cutaneous melanoma and breast invasive carcinoma, the approach yields interesting findings with superior prediction accuracy and stability.

Keywords: G-E interaction; Penalized selection; Robustness; Trimmed regression.
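
Illustrative sketch (not from the paper): the trimmed regression idea named in the abstract can be shown with a plain least trimmed squares fit, which repeatedly refits ordinary least squares on the h observations with the smallest squared residuals. This is only a minimal sketch under stated assumptions. It uses a fixed, user-chosen subset size h rather than the data-driven criterion and stability selection described above, and it includes neither the penalty nor the hierarchy constraint. The function name trimmed_least_squares, its parameters, and the simulated data are all hypothetical.

```python
import numpy as np


def trimmed_least_squares(X, y, h, n_starts=20, n_steps=50, seed=0):
    """Least trimmed squares via concentration steps (hypothetical helper).

    Returns the coefficients and the sorted indices of the retained subset.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_obj, best_beta, best_subset = np.inf, None, None
    for _ in range(n_starts):
        # Start from a small random subset, which is likely to be outlier-free.
        subset = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_steps):
            beta = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
            resid2 = (y - X @ beta) ** 2
            # Concentration step: keep the h best-fitting observations.
            new_subset = np.sort(np.argsort(resid2)[:h])
            if np.array_equal(new_subset, subset):
                break
            subset = new_subset
        # Refit on the final subset and score it by its trimmed sum of squares.
        beta = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
        obj = np.sum((y[subset] - X[subset] @ beta) ** 2)
        if obj < best_obj:
            best_obj, best_beta, best_subset = obj, beta, subset
    return best_beta, best_subset


# Simulated data (an assumption for illustration): intercept plus 3 predictors.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Contaminate 10% of the sample.
outlier_idx = np.arange(20)
y[:10] += 20.0        # vertical outliers: shifted responses
X[10:20, 1] += 10.0   # leverage points: shifted predictor values

beta_hat, subset = trimmed_least_squares(X, y, h=150)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

lts_err = np.max(np.abs(beta_hat - beta_true))
ols_err = np.max(np.abs(beta_ols - beta_true))
print("max abs error, trimmed fit:", lts_err)
print("max abs error, OLS on all data:", ols_err)
```

In this toy setting the retained subset should exclude the contaminated observations, so the trimmed fit stays close to the generating coefficients while ordinary least squares on the full sample is pulled away by both kinds of outlier.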