Radiologic-Radiomic Machine Learning Models for Differentiation of Benign and Malignant Solid Renal Masses: Comparison With Expert-Level Radiologists

AJR Am J Roentgenol. 2020 Jan;214(1):W44-W54. doi: 10.2214/AJR.19.21617. Epub 2019 Sep 25.

Abstract

OBJECTIVE. The objective of our study was to compare the performance of radiologic-radiomic machine learning (ML) models and expert-level radiologists for differentiation of benign and malignant solid renal masses using contrast-enhanced CT examinations.

MATERIALS AND METHODS. This retrospective study included a cohort of 254 renal cell carcinomas (RCCs) (190 clear cell RCCs [ccRCCs], 38 chromophobe RCCs [chrRCCs], and 26 papillary RCCs [pRCCs]), 26 fat-poor angioleiomyolipomas, and 10 oncocytomas with preoperative CT examinations. Lesions identified by four expert-level radiologists (each with experience of more than 3000 genitourinary CT and MRI studies) were manually segmented for radiologic-radiomic analysis. Disease-specific support vector machine (SVM) radiologic-radiomic ML models for classification of renal masses were trained and validated using 10-fold cross-validation. Performance values for the expert-level radiologists and the radiologic-radiomic ML models were compared using the McNemar test.

RESULTS. The performance values for the four radiologists were as follows: sensitivity of 73.7-96.8% (median, 84.5%; variance, 122.7%) and specificity of 48.4-71.9% (median, 61.8%; variance, 161.6%) for differentiating ccRCCs from pRCCs and chrRCCs; sensitivity of 73.7-96.8% (median, 84.5%; variance, 122.7%) and specificity of 52.8-88.9% (median, 80.6%; variance, 269.1%) for differentiating ccRCCs from fat-poor angioleiomyolipomas and oncocytomas; and sensitivity of 28.1-60.9% (median, 84.5%; variance, 122.7%) and specificity of 75.0-88.9% for differentiating pRCCs and chrRCCs from fat-poor angioleiomyolipomas and oncocytomas (median, 50.0%; variance, 191.1%).
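The abstract states that reader and model performance were compared with the McNemar test but does not say which variant was used. The following is a minimal Python sketch, assuming that each radiologist and the model make a binary call on the same set of lesions. It counts the discordant pairs and computes both the exact (binomial) form and the continuity-corrected chi-square form of the test. The function names and the toy labels are hypothetical illustrations, not the study's code or data.

```python
import math
from typing import Sequence, Tuple


def discordant_counts(truth: Sequence[int], rater_a: Sequence[int],
                      rater_b: Sequence[int]) -> Tuple[int, int]:
    """Return (b, c): b = A correct and B wrong, c = A wrong and B correct."""
    if not len(truth) == len(rater_a) == len(rater_b):
        raise ValueError("all sequences must cover the same lesions")
    b = c = 0
    for y, a, m in zip(truth, rater_a, rater_b):
        ok_a, ok_b = a == y, m == y
        if ok_a and not ok_b:
            b += 1
        elif ok_b and not ok_a:
            c += 1
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value: binomial test of b vs c with p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b: int, c: int) -> Tuple[float, float]:
    """Continuity-corrected McNemar statistic and its p-value (chi-square, 1 df)."""
    n = b + c
    if n == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Toy example with hypothetical labels: 1 = malignant, 0 = benign.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
reader = [1, 1, 0, 0, 0, 1, 1, 0]
model = [1, 1, 1, 1, 0, 0, 0, 1]
b, c = discordant_counts(truth, reader, model)
p_exact = mcnemar_exact(b, c)
stat, p_chi2 = mcnemar_chi2(b, c)
print(b, c, p_exact, round(stat, 3), round(p_chi2, 3))
```

Restricting `truth`, `reader`, and `model` to the truly malignant lesions turns the same counts into a comparison of sensitivities. Restricting them to the benign lesions compares specificities instead.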
After 10-fold cross-validation, the radiologic-radiomic ML models yielded the following performance values for differentiating ccRCCs from pRCCs and chrRCCs, ccRCCs from fat-poor angioleiomyolipomas and oncocytomas, and pRCCs and chrRCCs from fat-poor angioleiomyolipomas and oncocytomas: sensitivities of 90.0%, 86.3%, and 73.4% and specificities of 89.1%, 83.3%, and 91.7%, respectively.

CONCLUSION. Expert-level radiologists showed large variance in performance for differentiating benign from malignant solid renal masses. Radiologic-radiomic ML may offer a way to improve interreader concordance and diagnostic performance.
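The abstract does not specify the SVM kernel, the radiomic features, or whether sensitivity and specificity were pooled across folds or averaged. The sketch below makes several assumptions. It uses a linear SVM trained by full-batch subgradient descent on the hinge loss, as a stand-in for a library solver. It fits z-scoring on each training fold only, uses stratified 10-fold splits, and pools the out-of-fold predictions. The two-feature Gaussian data are synthetic and hypothetical, as are all function names.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardize(train: Matrix, test: Matrix) -> Tuple[Matrix, Matrix]:
    """Z-score every feature with means/SDs from the training fold only."""
    cols = list(zip(*train))
    means = [statistics.fmean(col) for col in cols]
    sds = [statistics.pstdev(col) or 1.0 for col in cols]  # guard constant features

    def scale(rows: Matrix) -> Matrix:
        return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

    return scale(train), scale(test)


def decision(w: List[float], b: float, x: Sequence[float]) -> float:
    """Signed score of a linear classifier: w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X: Matrix, y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 300) -> Tuple[List[float], float]:
    """Minimize lam/2 * ||w||^2 + mean hinge loss by full-batch subgradient descent.

    Labels must be +1 (positive class) or -1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # margin violated: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def stratified_folds(y: List[int], k: int = 10, seed: int = 0) -> List[int]:
    """Give each sample a fold index in 0..k-1, spreading every class evenly."""
    rng = random.Random(seed)
    fold = [0] * len(y)
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            fold[i] = rank % k
    return fold


def cross_validate(X: Matrix, y: List[int], k: int = 10,
                   seed: int = 0) -> Tuple[float, float]:
    """Pool out-of-fold predictions; return (sensitivity, specificity) for class +1."""
    fold = stratified_folds(y, k, seed)
    tp = fn = tn = fp = 0
    for f in range(k):
        train_idx = [i for i in range(len(y)) if fold[i] != f]
        test_idx = [i for i in range(len(y)) if fold[i] == f]
        X_tr, X_te = standardize([X[i] for i in train_idx], [X[i] for i in test_idx])
        w, b = train_linear_svm(X_tr, [y[i] for i in train_idx])
        for i, x in zip(test_idx, X_te):
            predicted_pos = decision(w, b, x) >= 0
            if y[i] == 1 and predicted_pos:
                tp += 1
            elif y[i] == 1:
                fn += 1
            elif predicted_pos:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical, synthetic two-feature data standing in for radiomic features.
rng = random.Random(42)
X = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(50)]
X += [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(50)]
y = [1] * 50 + [-1] * 50
sensitivity, specificity = cross_validate(X, y)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

Fitting the standardization inside each fold keeps information from the held-out lesions out of training. Stratifying the folds guarantees that the small benign classes in a cohort like this one (26 fat-poor angioleiomyolipomas and 10 oncocytomas) are spread across all ten folds rather than clustered in a few.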

Keywords: expert-level decision performance; fat-poor angioleiomyolipoma; machine learning; oncocytoma; radiomics; renal cell carcinoma.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Clinical Competence*
  • Diagnosis, Differential
  • Female
  • Humans
  • Kidney Diseases / diagnostic imaging*
  • Kidney Neoplasms / diagnostic imaging*
  • Machine Learning*
  • Magnetic Resonance Imaging*
  • Male
  • Middle Aged
  • Models, Theoretical*
  • Radiology*
  • Retrospective Studies
  • Tomography, X-Ray Computed*
  • Young Adult