Machine Learning Models for Classifying High- and Low-Grade Gliomas: A Systematic Review and Quality of Reporting Analysis

Ryan C Bahar; Sara Merkaj; Gabriel I Cassinelli Petersen; Niklas Tillmanns; Harry Subramanian; Waverly Rose Brim; Tal Zeevi; Lawrence Staib; Eve Kazarian; MingDe Lin; Khaled Bousabarah; Anita J Huttner; Andrej Pala; Seyedmehdi Payabvash; Jana Ivanidze; Jin Cui; Ajay Malhotra; Mariam S Aboian

doi:10.3389/fonc.2022.856231

Front Oncol. 2022 Apr 22;12:856231. doi: 10.3389/fonc.2022.856231. eCollection 2022.

Authors

Ryan C Bahar¹, Sara Merkaj¹ ², Gabriel I Cassinelli Petersen¹, Niklas Tillmanns¹, Harry Subramanian¹, Waverly Rose Brim¹, Tal Zeevi¹, Lawrence Staib¹, Eve Kazarian¹, MingDe Lin¹ ³, Khaled Bousabarah⁴, Anita J Huttner⁵, Andrej Pala², Seyedmehdi Payabvash¹, Jana Ivanidze⁶, Jin Cui¹, Ajay Malhotra¹, Mariam S Aboian¹

Affiliations

¹ Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, United States.
² Department of Neurosurgery, University of Ulm, Ulm, Germany.
³ Visage Imaging, Inc., San Diego, CA, United States.
⁴ Visage Imaging, GmbH., Berlin, Germany.
⁵ Department of Pathology, Yale-New Haven Hospital, Yale School of Medicine, New Haven, CT, United States.
⁶ Department of Radiology, Weill Cornell Medicine, New York, NY, United States.

Abstract

Objectives: To systematically review, assess the reporting quality of, and discuss improvement opportunities for studies describing machine learning (ML) models for glioma grade prediction.

Methods: This study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy (PRISMA-DTA) statement. A systematic search was performed in September 2020, and repeated in January 2021, on four databases: Embase, Medline, CENTRAL, and Web of Science Core Collection. Publications were screened in Covidence, and reporting quality was measured against the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement. Descriptive statistics were calculated using GraphPad Prism 9.

Results: The search identified 11,727 candidate articles, of which 1,135 underwent full-text review and 85 were included in the analysis. Of the included articles, 67 (79%) were published between 2018 and 2021. The mean prediction accuracy of the best-performing model in each study was 0.89 ± 0.09. The most common algorithm in conventional machine learning studies was the Support Vector Machine (mean accuracy: 0.90 ± 0.07), and in deep learning studies it was the Convolutional Neural Network (mean accuracy: 0.91 ± 0.10). Only one study used both a large training dataset (n>200) and external validation (accuracy: 0.72) for its model. The mean adherence rate to TRIPOD was 44.5% ± 11.1%, with poor reporting adherence for model performance (0%), abstracts (0%), and titles (0%).
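
The screening counts reported in the Results can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the four counts stated above; the variable names and the `pct` helper are illustrative assumptions and are not taken from the study, which reports using GraphPad Prism 9 for its descriptive statistics.

```python
# Counts as reported in the Results paragraph (no other data is assumed).
identified = 11_727   # candidate articles returned by the search
full_text = 1_135     # articles that underwent full-text review
included = 85         # articles included in the analysis
recent = 67           # included articles published 2018-2021


def pct(part, whole):
    """Hypothetical helper: percentage of `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)


funnel = {
    "full_text_of_identified": pct(full_text, identified),
    "included_of_full_text": pct(included, full_text),
    "included_of_identified": pct(included, identified),
    "recent_of_included": pct(recent, included),
}

for name, value in funnel.items():
    print(f"{name}: {value}%")
```

The last ratio, 67 of 85, comes to 78.8%, which matches the 79% stated in the abstract once rounded to a whole number.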

Conclusions: The application of ML to glioma grade prediction has grown substantially, with ML model studies reporting high predictive accuracies but lacking essential metrics and characteristics for assessing model performance. Several domains, including generalizability and reproducibility, warrant further attention to enable translation into clinical practice.

Systematic review registration: PROSPERO, identifier CRD42020209938.

Keywords: artificial intelligence; deep learning; glioma; machine learning; systematic review.

Publication types

Systematic Review
