Artificial Intelligence for Classification of Soft-Tissue Masses at US

Radiol Artif Intell. 2020 Dec 2;3(1):e200125. doi: 10.1148/ryai.2020200125. eCollection 2021 Jan.

Abstract

Purpose: To train convolutional neural network (CNN) models to classify benign and malignant soft-tissue masses at US and to differentiate three commonly observed benign masses.

Materials and methods: In this retrospective study, US images obtained between May 2010 and June 2019 from 419 patients (mean age, 52 years ± 18 [standard deviation]; 250 women) were included. Patients had either a histologic diagnosis confirmed at biopsy or surgical excision (n = 227) or a mass that demonstrated imaging characteristics of lipoma, benign peripheral nerve sheath tumor, or vascular malformation (n = 192). Images in patients with a histologic diagnosis (n = 227) were used to train and evaluate a CNN model to distinguish malignant from benign lesions. Twenty percent of cases were withheld as a test dataset, and the remaining cases were used to train the model with a 75%-25% training-validation split and fourfold cross-validation. Performance of the model was compared with retrospective interpretation of the same dataset by two experienced musculoskeletal radiologists who were blinded to clinical history. A second group of US images, from the 275 of the 419 patients who had one of the three common benign masses, was used to train and evaluate a separate model to differentiate among these three mass types. The models were trained on the Keras machine learning platform (version 2.3.1) by using a modified pretrained VGG16 network. Performance metrics of the model and of the radiologists were compared by using the McNemar test, and 95% CIs for performance metrics were estimated by using the Clopper-Pearson method (accuracy, recall, specificity, and precision) and the DeLong method (area under the receiver operating characteristic curve).
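The data partition described above (20% of cases withheld for testing, the remainder used in fourfold cross-validation with a 75%-25% training-validation split) can be illustrated with a minimal sketch. This is not the authors' code. The function name split_cases, the random seed, the use of integer case identifiers, and the absence of class stratification are all assumptions made for illustration; the abstract does not state how cases were shuffled or whether the split was stratified.

```python
import random


def split_cases(case_ids, test_fraction=0.2, n_folds=4, seed=0):
    """Hold out a test set, then build cross-validation folds from the rest.

    Hypothetical helper: the seed and the unstratified shuffle are
    assumptions, not details taken from the study.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)

    # Withhold a fixed fraction of cases as the test set.
    n_test = round(len(ids) * test_fraction)
    test_ids = ids[:n_test]
    dev_ids = ids[n_test:]

    # With four folds, each fold validates on about 25% of the remaining
    # cases and trains on the other 75%.
    folds = []
    for k in range(n_folds):
        val_ids = dev_ids[k::n_folds]
        val_set = set(val_ids)
        train_ids = [c for c in dev_ids if c not in val_set]
        folds.append((train_ids, val_ids))
    return test_ids, folds


# 227 cases had a histologic diagnosis; the integer IDs are placeholders.
test_ids, folds = split_cases(range(227))
for i, (train_ids, val_ids) in enumerate(folds, start=1):
    print(f"fold {i}: {len(train_ids)} training, {len(val_ids)} validation")
print(f"test: {len(test_ids)} cases")
```

Splitting by case rather than by image keeps all images from one patient on the same side of the partition, which avoids leakage between training and test data.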

Results: The model trained to classify malignant and benign masses demonstrated an accuracy of 79% (95% CI: 68, 88) on the test data, with an area under the receiver operating characteristic curve of 0.91 (95% CI: 0.84, 0.98), matching the performance of two expert readers. Performance of the model distinguishing three benign masses was lower, with an accuracy of 71% (95% CI: 61, 80) on the test data.
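The 95% CIs reported for accuracy were estimated with the Clopper-Pearson (exact binomial) method. A minimal sketch of that method, using only the Python standard library, is given here. The function names and the bisection-based solver are illustrative choices, and the counts in the usage example (40 correct of 50) are hypothetical; the abstract does not report the number of test samples, so this sketch does not attempt to reproduce the published intervals.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=100):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) CI for a proportion of k successes in n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical counts, for illustration only.
lower, upper = clopper_pearson(40, 50)
print(f"accuracy 80%, 95% CI: {100 * lower:.0f}, {100 * upper:.0f}")
```

The same function applies to recall, specificity, and precision by substituting the appropriate numerator and denominator for each metric.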

Conclusion: The trained CNN was capable of differentiating between benign and malignant soft-tissue masses depicted on US images, with performance matching that of two experienced musculoskeletal radiologists. © RSNA, 2020.