Feasibility as a mechanism for model identification and validation

J Appl Stat. 2020 Jun 29;48(11):2022-2041. doi: 10.1080/02664763.2020.1783522. eCollection 2021.

Abstract

As new technologies permit the generation of unprecedented volumes of data (e.g. genome-wide association study data), researchers struggle to keep up with the added complexity and time commitment that analysing such data requires. For this reason, model selection commonly relies on machine learning and data-reduction techniques, which tend to yield models that are difficult to interpret. Even in cases with straightforward explanatory variables, the so-called 'best' model produced by a given model-selection technique may fail to capture information of vital importance to the domain-specific questions at hand. Herein we propose a new concept for model selection, feasibility, for use in identifying multiple models that are each in some sense optimal and that may together provide a wider range of information relevant to the topic of interest, including (but not limited to) interaction terms. We further provide an R package and associated Shiny applications for identifying or validating feasible models, and we demonstrate their performance on both simulated and real-life data.

Keywords: Data analysis; feasibility; model selection; model validation; regression; statistical model.

Grants and funding

This work was supported by the Kentucky Biomedical Research Infrastructure and INBRE National Institute of General Medical Sciences Grant [P20 RR16481]; and a National Multiple Sclerosis Society Pilot Grant [PP-1609-25975].