Binary Classification for Failure Risk Assessment

Methods Mol Biol. 2021:2194:77-105. doi: 10.1007/978-1-0716-0849-4_6.

Abstract

Survival analysis is a powerful and popular methodology for time-to-event modeling in bioinformatics. Furthermore, several of its extensions can perform variable selection simultaneously with model estimation. While this flexibility is extremely desirable, under certain scenarios binary class variable selection and classification methods might provide more reliable risk estimates. Synthetic simulations and real data case studies suggest that binary class models tend to perform better when (1) randomly censored points comprise only a small portion of the data, (2) biological markers are weak, (3) risk is to be computed across predetermined time intervals, and (4) the assumptions of the competing time-to-event models are violated. In practice, it might be prudent to test both model families to ensure an adequate analysis. Here we describe the pipeline of binary class feature selection and classification for time-to-event risk assessment.
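To make the idea of computing risk over a predetermined time interval concrete, the following is a minimal sketch of how time-to-event records could be turned into binary class labels before any feature selection or classification. It is an illustration under stated assumptions, not the pipeline described in the chapter: the function name `dichotomize`, the horizon value, and the rule of leaving subjects censored at or before the horizon unlabeled are all hypothetical choices made here.

```python
from typing import List, Optional, Sequence


def dichotomize(
    times: Sequence[float],
    events: Sequence[int],
    horizon: float,
) -> List[Optional[int]]:
    """Convert (time, event) pairs into binary labels at a fixed horizon.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    Returns 1 (event by the horizon), 0 (event-free past the horizon),
    or None when the status at the horizon is unknown.
    """
    labels: List[Optional[int]] = []
    for t, e in zip(times, events):
        if e == 1 and t <= horizon:
            labels.append(1)      # event observed within the interval
        elif t > horizon:
            labels.append(0)      # followed past the horizon without an event
        else:
            labels.append(None)   # censored at or before the horizon (assumed unusable)
    return labels


# Hypothetical toy data: follow-up times and event indicators.
times = [2.0, 5.5, 1.0, 7.0, 3.0]
events = [1, 0, 0, 1, 1]
labels = dichotomize(times, events, horizon=4.0)

# Keep only the subjects whose class at the horizon is known.
usable = [i for i, y in enumerate(labels) if y is not None]
```

Subjects left unlabeled are the ones lost to random censoring, which is consistent with the abstract's observation that the binary approach is most attractive when such points make up only a small portion of the data.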

Keywords: Classification; Risk assessment; Survival analysis; Variable selection.

MeSH terms

  • Algorithms
  • Analysis of Variance
  • Biostatistics / methods*
  • Computational Biology / methods*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Discriminant Analysis
  • Humans
  • Linear Models
  • Neoplasms / mortality*
  • Prognosis
  • Risk Assessment / methods
  • Support Vector Machine
  • Survival Analysis