Explainable Machine Learning for COVID-19 Pneumonia Classification With Texture-Based Features Extraction in Chest Radiography

Luís Vinícius de Moura; Christian Mattjie; Caroline Machado Dartora; Rodrigo C Barros; Ana Maria Marques da Silva

doi:10.3389/fdgth.2021.662343

Front Digit Health. 2022 Jan 17;3:662343. doi: 10.3389/fdgth.2021.662343. eCollection 2021.

Authors

Luís Vinícius de Moura¹, Christian Mattjie¹,², Caroline Machado Dartora¹,², Rodrigo C Barros³, Ana Maria Marques da Silva¹,²

Affiliations

¹ Medical Image Computing Laboratory, School of Technology, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil.
² Graduate Program in Biomedical Gerontology, School of Medicine, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil.
³ Machine Learning Theory and Applications Lab, School of Technology, Pontifical Catholic University of Rio Grande do Sul, PUCRS, Porto Alegre, Brazil.

Abstract

Both reverse transcription-PCR (RT-PCR) and chest X-rays are used to diagnose coronavirus disease 2019 (COVID-19). However, COVID-19 pneumonia does not have a defined set of radiological findings. Our work investigates radiomic features and classification models to differentiate chest X-ray images of COVID-19 pneumonia from those of other lung patterns. The goal is to provide grounds for understanding the distinctive COVID-19 radiographic texture features by applying the interpretable Shapley Additive Explanations (SHAP) approach to supervised tree-based ensemble machine learning methods. We use 2,611 COVID-19 chest X-ray images and 2,611 non-COVID-19 chest X-rays. After the lungs are segmented laterally and into three zones, histogram normalization is applied and radiomic features are extracted. SHAP recursive feature elimination with cross-validation is used to select features. The hyperparameters of the XGBoost and Random Forest ensemble tree models are optimized using random search. The best classification model was XGBoost, with an accuracy of 0.82 and a sensitivity of 0.82. The explainable model showed the importance of the middle left and superior right lung zones in distinguishing COVID-19 pneumonia from other lung patterns.

Keywords: SHAP; X-rays; coronavirus; explainable models; machine learning; radiological findings; radiomics.
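
The SHAP approach named in the abstract attributes a model's prediction to its input features using Shapley values. The sketch below illustrates the underlying idea only and is not the authors' pipeline. It computes exact Shapley values by brute-force enumeration for a toy scoring function. The function `toy_model`, the three feature names, the input values, and the all-zero baseline are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical texture features, named for illustration only.
FEATURES = ["contrast_mid_left", "entropy_sup_right", "energy_inf_left"]


def toy_model(x):
    """Toy score: one additive term plus one interaction term (an assumption)."""
    return 0.5 * x[0] + 0.3 * x[1] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        masked = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


x = [2.0, 1.0, 4.0]          # hypothetical feature values for one image
baseline = [0.0, 0.0, 0.0]   # hypothetical reference input

phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:.3f}")
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that gives SHAP its name. In this toy case the interaction term is split equally between the two features that share it. For tree ensembles with many radiomic features, enumeration of this kind is infeasible, and a dedicated SHAP implementation would be used instead.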