A genetic algorithm for variable selection in logistic regression analysis of radiotherapy treatment outcomes

Med Phys. 2008 Dec;35(12):5426-33. doi: 10.1118/1.3005974.

Abstract

A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit to data from a large patient population. The quality of the fit is measured by combining the predictive power of the particular set of factors with the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA) in which a small sample of all the possible combinations of variables is fitted to the patient data. New models are derived from the best models, through crossover and mutation operations, and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set from a study of the incidence of lung injury in non-small-cell lung cancer (NSCLC) patients treated with three-dimensional conformal radiotherapy (3DCRT). The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30 (p=0.048) and the ongoing use of tobacco at the time of referral (p=0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
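The loop described in the abstract (fit a sample of variable subsets, keep the best, derive new subsets by crossover and mutation, repeat) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names `fit_logistic` and `select_variables`, the population settings, and the gradient-ascent fit are all hypothetical, and the fitness used here (negative AIC) is an assumed stand-in for the paper's measure, which combines predictive power with the significance of individual factors.

```python
import math
import random


def _sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, cols, steps=300, lr=0.5):
    """Fit a logistic model on the selected columns by plain gradient ascent
    and return its log-likelihood. Assumes roughly standardized features."""
    n = len(y)
    w = [0.0] * (len(cols) + 1)  # intercept plus one weight per selected column
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    for _ in range(steps):
        grad = [0.0] * len(w)
        for feats, target in zip(rows, y):
            p = _sigmoid(sum(wi * fi for wi, fi in zip(w, feats)))
            for j, fj in enumerate(feats):
                grad[j] += (target - p) * fj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    ll = 0.0
    for feats, target in zip(rows, y):
        p = _sigmoid(sum(wi * fi for wi, fi in zip(w, feats)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return ll


def select_variables(X, y, n_vars, pop_size=12, generations=15,
                     mutation_rate=0.1, seed=0):
    """Search over variable subsets (bit masks) with a simple GA.
    Fitness is negative AIC, an assumption made for this sketch."""
    rng = random.Random(seed)
    cache = {}

    def fitness(mask):
        if mask not in cache:
            cols = [i for i, bit in enumerate(mask) if bit]
            aic = 2 * (len(cols) + 1) - 2 * fit_logistic(X, y, cols)
            cache[mask] = -aic
        return cache[mask]

    # Start from a small random sample of all possible subsets.
    pop = [tuple(rng.randint(0, 1) for _ in range(n_vars))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]  # keep the best half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [i for i, bit in enumerate(best) if bit]
```

Calling `select_variables(X, y, n_vars=len(X[0]))` on a list of feature rows `X` and binary outcomes `y` returns the indices of the selected columns. Caching fitted masks keeps the number of regressions small when the same subset recurs across generations, which is where a GA of this kind saves time relative to fitting every combination.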

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Area Under Curve
  • Carcinoma, Non-Small-Cell Lung / radiotherapy*
  • Humans
  • Lung Neoplasms / radiotherapy*
  • Models, Genetic
  • Models, Statistical
  • Models, Theoretical
  • Neoplasms / radiotherapy*
  • ROC Curve
  • Radiometry
  • Radiotherapy / methods*
  • Radiotherapy Planning, Computer-Assisted
  • Regression Analysis*
  • Treatment Outcome