Optimism Bias Correction in Omics Studies with Big Data: Assessment of Penalized Methods on Simulated Data

OMICS. 2019 Apr;23(4):207-213. doi: 10.1089/omi.2018.0191. Epub 2019 Feb 22.

Abstract

Big Data generated by omics technologies require simultaneous analyses of large numbers of variables. This leads to complex model selection and parameter estimates that show optimism bias. This study on simulated data sets examined optimism-bias correction by penalty regression methods in case-control studies that involve clinical and omics variables. Least absolute shrinkage and selection operator (LASSO)-based methods (LASSO-penalized logistic regression, adaptive LASSO, and regularized LASSO for selection + ridge regression) were evaluated using power, the false positive rate (FPR), false discovery rate (FDR), and by estimated versus theoretical parameter comparisons. The "ordinary" LASSO overcorrects the optimism bias. The adaptive LASSO with LASSO estimation of the weights was unable to provide a sufficient correction. Importantly, the adaptive LASSO with ridge estimation of the weights showed the best parameter estimation. The regularized LASSO selection showed a slight optimism bias that decreased with the increase in the training set size. The optimism bias decreased with the increase of the number of variables selected among truly differentially expressed variables; however, power, FPR, and FDR were correlated. A compromise between model selection and estimation accuracy should be found. These results might prove useful because Big Data analyses are becoming commonplace in omics/multiomics studies in integrative biology, precision medicine, and planetary health.

Keywords: LASSO; optimism bias; parameter estimation; variable selection.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Databases, Factual*
  • Humans
  • Logistic Models
  • Precision Medicine
  • Regression Analysis