AI-based disease category prediction model using symptoms from low-resource Ethiopian language: Afaan Oromo text

Sci Rep. 2024 May 16;14(1):11233. doi: 10.1038/s41598-024-62278-7.

Abstract

Automated disease diagnosis and prediction, powered by AI, play a crucial role in enabling medical professionals to deliver effective care to patients. While such predictive tools have been extensively explored in resource-rich languages like English, this study focuses on automatically predicting disease categories from symptoms documented in the Afaan Oromo language, using a range of classification algorithms. The study covers machine learning techniques (support vector machines, random forests, logistic regression, and Naïve Bayes) as well as deep learning approaches (LSTM, GRU, and Bi-LSTM). Because no standard corpus was available, we prepared three data sets with different numbers of patient symptoms arranged into 10 categories. Two feature representations, TF-IDF and word embedding, were employed. The performance of the proposed methodology was evaluated using accuracy, recall, precision, and F1 score. The experimental results show that, among machine learning models, the SVM model using TF-IDF had the highest accuracy and F1 score of 94.7%, while among deep learning models, the LSTM model using word2vec embedding achieved an accuracy of 95.7% and an F1 score of 96.0%. Several hyper-parameter settings were tuned to optimize the performance of each model. Overall, the LSTM model outperformed all other models across the entire dataset.
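
As an illustration of the four evaluation metrics named above, the following is a minimal sketch of how they can be computed for a multi-class task. The abstract does not say how per-class scores were averaged over the 10 categories, so macro-averaging is an assumption here. The function name `classification_metrics` and the toy category labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging (an unweighted mean over classes) is an assumption;
    the source abstract does not specify the averaging scheme.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted but is wrong
            fn[t] += 1   # class t was the truth but was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical category labels, for illustration only.
    truth = ["respiratory", "respiratory", "skin", "skin"]
    predicted = ["respiratory", "skin", "skin", "skin"]
    for name, value in classification_metrics(truth, predicted).items():
        print(f"{name}: {value:.3f}")
```

With macro-averaging, every category contributes equally regardless of how many samples it has, which matters when the categories are imbalanced. A sample-weighted average would give different numbers on the same predictions.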

Keywords: Artificial intelligence; Classification; Deep learning; Health data; Machine learning.

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Bayes Theorem
  • Deep Learning*
  • Ethiopia
  • Humans
  • Language
  • Machine Learning
  • Support Vector Machine