Prediction of model generalizability for unseen data: Methodology and case study in brain metastases detection in T1-Weighted contrast-enhanced 3D MRI

Comput Biol Med. 2023 Jun:159:106901. doi: 10.1016/j.compbiomed.2023.106901. Epub 2023 Apr 12.

Abstract

Background and purpose: A medical AI system's generalizability describes the consistency of its performance on data acquired in varying geographic, historical, and methodologic settings. Previous literature on this topic has mostly focused on "how" to achieve high generalizability (e.g., via larger datasets, transfer learning, data augmentation, or model regularization schemes), with limited success. Instead, we aim to understand "when" generalizability is achieved: our study presents a medical AI system that can estimate its generalizability status for unseen data on the fly.

Materials and methods: We introduce a latent space mapping (LSM) approach that uses a Fréchet distance loss to force the underlying training data distribution into a multivariate normal distribution. During deployment, the LSM distribution of a given test dataset is processed to detect its deviation from the forced distribution; hence, the AI system can predict its generalizability status for any previously unseen dataset. If low model generalizability is detected, the user is informed by a warning message integrated into a sample deployment workflow. While the approach is applicable to most classification deep neural networks (DNNs), we demonstrate its application to a brain metastases (BM) detector for T1-weighted contrast-enhanced (T1c) 3D MRI. The BM detection model was trained using 175 T1c studies acquired internally (at the authors' institution) and tested using (1) 42 internally acquired exams and (2) 72 externally acquired exams from the publicly distributed Brain Mets dataset provided by the Stanford University School of Medicine. Generalizability scores, false positive (FP) rates, and sensitivities of the BM detector were computed for the test datasets.
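The Fréchet distance between two multivariate normals N(μ₁, Σ₁) and N(μ₂, Σ₂) has the closed form d² = ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The NumPy sketch below illustrates how such a distance could serve both as a training-time penalty (pulling a Gaussian fitted to latent vectors toward a target normal) and as a deployment-time deviation check. It assumes a standard normal target N(0, I), a Gaussian fitted to a batch of latent vectors, and a hypothetical warning threshold; the function names (frechet_distance, latent_frechet_to_target, generalizability_warning) are illustrative only and do not reproduce the authors' implementation or their exact generalizability score.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(mu1, cov1, mu2, cov2) -> float:
    """Squared Fréchet distance between N(mu1, cov1) and N(mu2, cov2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    s1 = _psd_sqrt(cov1)
    # Tr((cov1 @ cov2)^{1/2}) equals Tr((s1 @ cov2 @ s1)^{1/2}); the latter matrix is
    # symmetric PSD, so its eigenvalues are real and non-negative.
    cross = np.sqrt(np.clip(np.linalg.eigvalsh(s1 @ cov2 @ s1), 0.0, None)).sum()
    mean_term = np.sum((mu1 - mu2) ** 2)
    return float(mean_term + np.trace(cov1) + np.trace(cov2) - 2.0 * cross)


def latent_frechet_to_target(latents, target_mu, target_cov) -> float:
    """Fit a Gaussian to latent vectors (one per row) and measure its distance to the target."""
    latents = np.asarray(latents, float)
    mu = latents.mean(axis=0)
    cov = np.cov(latents, rowvar=False)  # (d, d) sample covariance for d > 1
    return frechet_distance(mu, cov, target_mu, target_cov)


def generalizability_warning(test_latents, target_mu, target_cov, threshold) -> bool:
    """Hypothetical deployment check: True means 'warn the user' (low generalizability)."""
    return latent_frechet_to_target(test_latents, target_mu, target_cov) > threshold
```

During training, the same expression would have to be written in a differentiable framework so it can be added to the classification loss; NumPy is used here only to keep the sketch self-contained. How the threshold would be chosen (e.g., from distances observed on held-out internal data) is likewise an assumption, not something stated in this abstract.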

Results and conclusion: The model predicted its generalizability to be low for 31% of the testing data (i.e., 2 of the 42 internally acquired and 33 of the 72 externally acquired exams). It produced (1) ∼13.5 false positives (FPs) at 76.1% BM detection sensitivity for the low-generalizability group and (2) ∼10.5 FPs at 89.2% BM detection sensitivity for the high-generalizability group. These results suggest that the proposed formulation enables a model to predict its generalizability for unseen data.

Keywords: AI model generalizability; Brain metastases; Computer-aided detection; Latent space mapping; Magnetic resonance imaging.

MeSH terms

  • Brain Neoplasms* / diagnostic imaging
  • Brain Neoplasms* / secondary
  • Diagnosis, Computer-Assisted* / methods
  • Humans
  • Magnetic Resonance Imaging / methods
  • Neural Networks, Computer