Differential Diagnosis of Pleural Effusion Using Machine Learning

Na Young Kim; Boa Jang; Kang-Mo Gu; Young Sik Park; Young-Gon Kim; Jaeyoung Cho

doi:10.1513/AnnalsATS.202305-410OC


Ann Am Thorac Soc. 2024 Feb;21(2):211-217. doi: 10.1513/AnnalsATS.202305-410OC.

Authors

Na Young Kim¹, Boa Jang^{2,3}, Kang-Mo Gu⁴, Young Sik Park⁵, Young-Gon Kim^{2,6}, Jaeyoung Cho^{5,7}

Affiliations

¹ Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Internal Medicine, Hallym University Dongtan Sacred Heart Hospital, Hwaseong, Republic of Korea.
² Department of Transdisciplinary Medicine and.
³ Interdisciplinary Program in Bioengineering, College of Engineering, Seoul National University, Seoul, Republic of Korea.
⁴ Department of Internal Medicine, Chung-Ang University College of Medicine, Seoul, Republic of Korea.
⁵ Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea.
⁶ Department of Medicine and.
⁷ Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea.

PMID: 37788372

Abstract

Rationale: Differential diagnosis of pleural effusion is challenging in clinical practice.

Objectives: We aimed to develop a machine learning model to classify the five common causes of pleural effusion.

Methods: This retrospective study collected 49 features from the clinical information, blood tests, and pleural fluid analyses of adult patients who underwent diagnostic thoracentesis between October 2013 and December 2018. Pleural effusions were classified into five categories: transudative, malignant, parapneumonic, tuberculous, and other. Five classifiers (multinomial logistic regression, support vector machine, random forest, extreme gradient boosting, and light gradient boosting machine [LGB]) were evaluated for accuracy and area under the receiver operating characteristic curve (AUC) using fivefold cross-validation. Hybrid feature selection was applied to identify the features most relevant for classifying pleural effusion.

Results: We analyzed 2,253 patients (training set, n = 1,459; validation set, n = 365; extra-validation set, n = 429). The LGB model achieved the best performance in both the validation and extra-validation sets. After feature selection, the LGB model using the 18 selected features was as accurate as the model using all 49 features, with an accuracy (mean ± standard deviation) of 0.818 ± 0.012 in the validation set and 0.777 ± 0.007 in the extra-validation set. Its mean AUC was 0.930 ± 0.042 and 0.916 ± 0.044 in the validation and extra-validation sets, respectively. In our model, pleural fluid lactate dehydrogenase, protein, and adenosine deaminase levels were the most important features for classifying pleural effusions.

Conclusions: Our LGB model showed satisfactory performance in the differential diagnosis of the common causes of pleural effusion. This model could provide clinicians with valuable information regarding the major differential diagnoses of pleural diseases.
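The abstract reports accuracy and a mean AUC for a five-class problem but does not state how the multiclass AUC was averaged, nor whether the ± spread is taken across cross-validation folds or across the five classes. The sketch below is a minimal, standard-library-only illustration that assumes a one-vs-rest AUC per class averaged without weighting (a macro average). It does not train any classifier and is not the authors' code; the names `CLASSES`, `binary_auc`, `one_vs_rest_aucs`, `macro_auc`, `accuracy`, and `summarize` are hypothetical.

```python
from statistics import mean, stdev

# The five effusion categories named in the abstract.
CLASSES = ["transudative", "malignant", "parapneumonic", "tuberculous", "other"]


def binary_auc(scores, positives):
    """One-vs-rest AUC: the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5).
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_aucs(probabilities, labels):
    """Per-class AUCs; probabilities[i][k] is the predicted probability
    of CLASSES[k] for case i, and labels[i] is the true category name."""
    aucs = {}
    for k, name in enumerate(CLASSES):
        scores = [row[k] for row in probabilities]
        positives = [label == name for label in labels]
        aucs[name] = binary_auc(scores, positives)
    return aucs


def macro_auc(probabilities, labels):
    """Unweighted mean of the five one-vs-rest AUCs (assumed averaging)."""
    return mean(one_vs_rest_aucs(probabilities, labels).values())


def accuracy(probabilities, labels):
    """Fraction of cases whose highest-probability class is the true label."""
    correct = 0
    for row, label in zip(probabilities, labels):
        predicted = CLASSES[max(range(len(CLASSES)), key=lambda k: row[k])]
        correct += predicted == label
    return correct / len(labels)


def summarize(values):
    """Mean and sample standard deviation (needs at least two values),
    matching the 'mean ± standard deviation' format of the abstract."""
    return mean(values), stdev(values)
```

In a fivefold setup, `accuracy` and `macro_auc` would be computed once on each held-out fold and the five results passed to `summarize`. The pairwise loop in `binary_auc` costs O(n_pos × n_neg), which is acceptable for folds of a few hundred patients; a rank-based formula scales better for larger sets.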

Keywords: differential diagnosis; machine learning; pleural effusion.

MeSH terms

Adenosine Deaminase / metabolism
Adult
Diagnosis, Differential
Exudates and Transudates
Humans
Machine Learning
Pleural Effusion* / diagnosis
Pleural Effusion* / etiology
Retrospective Studies

Substances

Adenosine Deaminase

Grants and funding

04-2020-2360/Seoul National University Hospital Research Fund/United States