Machine learning model for predicting malaria using clinical information

Comput Biol Med. 2021 Feb;129:104151. doi: 10.1016/j.compbiomed.2020.104151. Epub 2020 Nov 28.

Abstract

Background: Rapid diagnosis is crucial for controlling malaria. Various studies have aimed at developing machine learning models to diagnose malaria from blood smear images; however, this approach has many limitations. This study developed a machine learning model that diagnoses malaria using patient information instead.

Methods: To construct the datasets, we extracted patient information from PubMed abstracts published from 1956 to 2019. We used two datasets: one containing only parasitic diseases, and a total dataset that additionally included information on other diseases. We compared six machine learning models: support vector machine, random forest (RF), multilayer perceptron, AdaBoost, gradient boosting (GB), and CatBoost. In addition, the synthetic minority oversampling technique (SMOTE) was employed to address the class imbalance in the data.
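SMOTE addresses class imbalance by synthesizing new minority-class samples rather than duplicating existing ones: each synthetic point is an interpolation between a minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal pure-Python illustration of that interpolation step; it is not the authors' implementation (published SMOTE implementations such as the one in the imbalanced-learn library handle feature scaling, categorical features, and k-NN search more carefully).

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Minimal SMOTE-style oversampling sketch.

    minority: list of minority-class samples (tuples of floats).
    n_new:    number of synthetic samples to generate.
    k:        number of nearest neighbours to interpolate towards.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x (excluding x itself)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: sq_dist(x, p))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between x and its neighbour
        synthetic.append(tuple(xi + lam * (ni - xi)
                               for xi, ni in zip(x, nb)))
    return synthetic

# Toy 2-D minority class with three samples; generate four synthetic points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new_points = smote(minority, n_new=4)
print(len(new_points))
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the minority data already occupies, which is what makes SMOTE less prone to overfitting than plain duplication.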

Results: For the parasitic-disease-only dataset, RF was the best model regardless of whether SMOTE was applied. For the total dataset, GB was the best model; however, after applying SMOTE, RF performed best. With the imbalanced data, nationality was the most important feature for malaria prediction; with the SMOTE-balanced data, the most important feature was symptom.

Conclusions: The results demonstrate that machine learning techniques can be successfully applied to predict malaria from patient information.

Keywords: Case reports; Diagnosis; Machine learning; Malaria; Patient information.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Machine Learning*
  • Malaria* / diagnosis
  • Neural Networks, Computer
  • Support Vector Machine