A comparative study of different variable selection methods based on numerical simulation and empirical analysis

PeerJ Comput Sci. 2023 Aug 16:9:e1522. doi: 10.7717/peerj-cs.1522. eCollection 2023.

Abstract

Drawing on principles of computer science and statistics, this study evaluates the performance of Lasso-type variable selection methods (Lasso, Elastic-Net, Adaptive-Lasso, and SCAD) applied to the linear random effect model, using both numerical simulation and empirical analysis. The evaluation covers four properties of each fitted model: consistency of variable selection, prediction accuracy, stability, and efficiency. To measure variable selection consistency, the study proposes a novel criterion: the angle between the true coefficient vector β and the estimated coefficient vector β̂, where a smaller angle indicates closer agreement. Boxplots are used to visualize the distributions of prediction accuracy and variable selection consistency, and the relative stability of each model is assessed from the frequency of outliers in these distributions. Comparative numerical simulation experiments benchmark the proposed evaluation method against commonly used analysis methods. The results demonstrate the effectiveness and correctness of the proposed method and show that it offers a convenient way to analyze the stability and efficiency of each fitted model.
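The two quantities the abstract relies on, the coefficient angle and the boxplot outlier count, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formula, so the angle is assumed to be the standard θ = arccos(βᵀβ̂ / (‖β‖₂ ‖β̂‖₂)) reported in degrees, an all-zero estimate (possible when a penalty shrinks every coefficient) is assumed to yield an undefined angle (NaN), and outliers are assumed to follow Tukey's usual boxplot rule of points beyond 1.5 × IQR from the quartiles. The function names `coefficient_angle` and `count_boxplot_outliers` are hypothetical.

```python
import math
import statistics
from typing import Sequence


def coefficient_angle(beta: Sequence[float], beta_hat: Sequence[float]) -> float:
    """Angle in degrees between the true and estimated coefficient vectors.

    Assumed definition: arccos of the cosine similarity. 0 degrees means the
    estimate points in exactly the same direction as the true vector; larger
    angles mean poorer agreement. Returns NaN when either vector is all zeros,
    because the angle is undefined in that case (an assumption, see lead-in).
    """
    if len(beta) != len(beta_hat):
        raise ValueError("beta and beta_hat must have the same length")
    dot = sum(b * bh for b, bh in zip(beta, beta_hat))
    norm_b = math.sqrt(sum(b * b for b in beta))
    norm_bh = math.sqrt(sum(bh * bh for bh in beta_hat))
    if norm_b == 0.0 or norm_bh == 0.0:
        return math.nan
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_b * norm_bh)))
    return math.degrees(math.acos(cos_theta))


def count_boxplot_outliers(values: Sequence[float], whisker: float = 1.5) -> int:
    """Count points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles use linear interpolation ("inclusive" method), the same
    convention as NumPy's default percentile. Needs at least two values.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return sum(1 for v in values if v < low or v > high)
```

In a simulation study one would, for each replication, fit a model, record `coefficient_angle(beta, beta_hat)` and its prediction error, and then compare methods by the spread of these values and by `count_boxplot_outliers` on each method's results; fewer outliers would indicate a more stable method under this assumed rule.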

Keywords: Boxplot; Coefficient consistency; Linear random effect model; Prediction accuracy; Stability; Variable selection.

Grants and funding

This research was supported by the Wenzhou Municipal Basic Scientific Research Project (R20220001), the Key Project of Philosophy and Social Science Research in Zhejiang Province (23NDJC045Z), the Wenzhou Philosophy and Social Science Planning Project (22WSK305), the Research project of Zhejiang Education Department (Y202249099), and the General Project of the National Social Science Foundation (22BJY126). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.