Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage

Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.

Abstract

Purpose: There has been burgeoning interest in applying machine learning methods for predicting radiotherapy outcomes. However, the imbalanced ratio of a large number of variables to a limited sample size in radiation oncology constitutes a major challenge. Therefore, dimensionality reduction methods can be a key to success. The study investigates and contrasts the application of traditional machine learning methods and deep learning approaches for outcome modeling in radiotherapy. In particular, new joint architectures based on variational autoencoder (VAE) for dimensionality reduction are presented and their application is demonstrated for the prediction of lung radiation pneumonitis (RP) from a large-scale heterogeneous dataset.

Methods: A large-scale heterogeneous dataset containing a pool of 230 variables including clinical factors (e.g., dose, KPS, stage) and biomarkers (e.g., single nucleotide polymorphisms (SNPs), cytokines, and micro-RNAs) in a population of 106 nonsmall cell lung cancer (NSCLC) patients who received radiotherapy was used for modeling RP. Twenty-two patients had grade 2 or higher RP. Four methods were investigated, including feature selection (case A) and feature extraction (case B) with traditional machine learning methods, a VAE-MLP joint architecture (case C) with deep learning and lastly, the combination of feature selection and joint architecture (case D). For feature selection, Random forest (RF), Support Vector Machine (SVM), and multilayer perceptron (MLP) were implemented to select relevant features. Specifically, each method was run for multiple times to rank features within several cross-validated (CV) resampled sets. A collection of ranking lists were then aggregated by top 5% and Kemeny graph methods to identify the final ranking for prediction. A synthetic minority oversampling technique was applied to correct for class imbalance during this process. For deep learning, a VAE-MLP joint architecture where a VAE aimed for dimensionality reduction and an MLP aimed for classification was developed. In this architecture, reconstruction loss and prediction loss were combined into a single loss function to realize simultaneous training and weights were assigned to different classes to mitigate class imbalance. To evaluate the prediction performance and conduct comparisons, the area under receiver operating characteristic curves (AUCs) were performed for nested CVs for both handcrafted feature selections and the deep learning approach. The significance of differences in AUCs was assessed using the DeLong test of U-statistics.

Results: An MLP-based method using weight pruning (WP) feature selection yielded the best performance among the different hand-crafted feature selection methods (case A), reaching an AUC of 0.804 (95% CI: 0.761-0.823) with 29 top features. A VAE-MLP joint architecture (case C) achieved a comparable but slightly lower AUC of 0.781 (95% CI: 0.737-0.808) with the size of latent dimension being 2. The combination of handcrafted features (case A) and latent representation (case D) achieved a significant AUC improvement of 0.831 (95% CI: 0.805-0.863) with 22 features (P-value = 0.000642 compared with handcrafted features only (Case A) and P-value = 0.000453 compared to VAE alone (Case C)) with an MLP classifier.

Conclusion: The potential for combination of traditional machine learning methods and deep learning VAE techniques has been demonstrated for dealing with limited datasets in modeling radiotherapy toxicities. Specifically, latent variables from a VAE-MLP joint architecture are able to complement handcrafted features for the prediction of RP and improve prediction over either method alone.

Keywords: deep neural networks; feature selection; machine learning; radiotherapy outcome modeling.

MeSH terms

  • Carcinoma, Non-Small-Cell Lung / radiotherapy*
  • Computational Biology / methods*
  • Humans
  • Lung Neoplasms / radiotherapy*
  • Machine Learning*
  • Treatment Outcome