Comparative performances of machine learning algorithms in radiomics and impacting factors

Antoine Decoux; Loic Duron; Paul Habert; Victoire Roblot; Emina Arsovic; Guillaume Chassagnon; Armelle Arnoux; Laure Fournier

doi:10.1038/s41598-023-39738-7

Sci Rep. 2023 Aug 28;13(1):14069. doi: 10.1038/s41598-023-39738-7.

Authors

Antoine Decoux¹ ², Loic Duron¹ ³, Paul Habert¹ ⁴ ⁵, Victoire Roblot¹, Emina Arsovic¹, Guillaume Chassagnon⁶, Armelle Arnoux², Laure Fournier⁷

Affiliations

¹ Université Paris Cité, PARCC UMRS 970, INSERM, Paris, France.
² Unité de Recherche Clinique, Centre d'Investigation Clinique 1418 Épidémiologie Clinique, Université Paris Cité, AP-HP, Hôpital Européen Georges Pompidou, INSERM, Paris, France.
³ Department of Radiology, Hôpital Fondation Ophtalmologique Adolphe de Rothschild, Paris, France.
⁴ Imaging Department, Hôpital Nord, APHM, Aix Marseille University, Marseille, France.
⁵ Aix Marseille Univ, LIIE, Marseille, France.
⁶ Department of Radiology, Université Paris Cité, AP-HP, Hôpital Cochin, Paris, France.
⁷ Department of Radiology, Université Paris Cité, AP-HP, Hôpital Européen Georges Pompidou, PARCC UMRS 970, INSERM, Paris, France. laure.fournier@aphp.fr.

Abstract

There are currently no recommendations on which machine learning (ML) algorithms should be used in radiomics. The objective was to compare the performance of ML algorithms in radiomics when applied to different clinical questions, to determine whether some strategies could give the best and most stable performance regardless of the dataset. This study compares the performance of nine feature selection algorithms combined with fourteen binary classification algorithms on ten datasets. These datasets contained radiomics features and clinical diagnoses for binary classification tasks such as COVID-19 pneumonia or sarcopenia on CT, and head and neck, orbital or uterine lesions on MRI. For each dataset, a train-test split was created. Each of the 126 (9 × 14) combinations of feature selection algorithm and classification algorithm was trained and tuned using ten-fold cross-validation, and the area under the ROC curve (AUC) was then computed. This procedure was repeated three times per dataset. The best overall performance was obtained with JMI (joint mutual information) and JMIM (joint mutual information maximisation) as feature selection algorithms, and with random forest and linear regression models as classification algorithms. The choice of classification algorithm was the factor explaining the largest share of the performance variation (10% of total variance). The choice of feature selection algorithm explained only 2% of the variation, while the train-test split explained 9%.
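The evaluation protocol summarised in the abstract (a grid of 9 × 14 feature selection/classifier pairs, each tuned with ten-fold cross-validation and scored by AUC) can be sketched in plain Python. This is a minimal standard-library sketch under stated assumptions, not the authors' code: the method names are hypothetical placeholders, `fit_predict` is a hypothetical callable standing in for fitting one combination, the folds are assumed to be stratified (the abstract does not say), and the AUC uses the Mann-Whitney formulation.

```python
import random
from itertools import product


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen positive case is scored above a randomly chosen negative case
    (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds while keeping class proportions similar
    (assumption: stratification; the abstract only says ten-fold)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def cv_auc(labels, fit_predict, k=10, seed=0):
    """Mean AUC over k stratified folds.

    `fit_predict(train_idx, test_idx)` is a hypothetical callable that fits one
    feature selection + classifier combination on train_idx and returns a
    score for each index in test_idx.
    """
    folds = stratified_folds(labels, k, seed)
    fold_aucs = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores = fit_predict(train_idx, test_idx)
        fold_aucs.append(auc([labels[i] for i in test_idx], scores))
    return sum(fold_aucs) / len(fold_aucs)


# Hypothetical placeholder names: the abstract only names JMI, JMIM,
# random forest and linear regression, not the full lists of methods.
FS_METHODS = [f"fs_{i}" for i in range(9)]
CLASSIFIERS = [f"clf_{j}" for j in range(14)]
GRID = list(product(FS_METHODS, CLASSIFIERS))  # 9 x 14 = 126 combinations
```

For example, a scorer that returns the true labels yields a cross-validated AUC of 1.0, while a constant scorer yields 0.5. In a real study, `fit_predict` would run feature selection on the training folds only, so that no information from the held-out fold leaks into the selected features.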
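The abstract attributes shares of the total AUC variance to the classification algorithm (10%), the feature selection algorithm (2%) and the train-test split (9%), but does not state which statistical model produced these figures. The sketch below therefore uses a swapped-in, simpler technique: one-way eta squared (between-group sum of squares divided by total sum of squares) computed separately for each factor. In a balanced, fully crossed design, where every combination is evaluated on every split, these main-effect shares are orthogonal and can be compared directly; whatever remains is interaction and noise. The field names and AUC values are hypothetical and invented for illustration; they are not results from the study.

```python
from statistics import fmean


def variance_share(rows, factor, response="auc"):
    """Eta squared for one factor: the between-group sum of squares of
    `response` across the levels of `factor`, divided by the total sum of squares."""
    values = [r[response] for r in rows]
    grand = fmean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    groups = {}
    for r in rows:
        groups.setdefault(r[factor], []).append(r[response])
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total if ss_total > 0 else 0.0


# Toy, invented AUC values (not study results): performance here depends on the
# classifier and on the split, but not on the feature selection method.
rows = [
    {"classifier": c, "feature_selection": fs, "split": s, "auc": base + shift}
    for c, base in (("clf_a", 0.70), ("clf_b", 0.80))
    for fs in ("fs_x", "fs_y")
    for s, shift in ((1, -0.02), (2, 0.02))
]
shares = {f: variance_share(rows, f) for f in ("classifier", "feature_selection", "split")}
```

With these toy values the classifier accounts for about 86% of the variance, the split for about 14% and feature selection for none. In the study, by contrast, the three main effects reported in the abstract sum to only about 21% of the total variance.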

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
COVID-19* / diagnostic imaging
Head
Humans
Machine Learning
Random Forest