Hybrid machine learning approach for Arabic medical web page credibility assessment

Health Informatics J. 2022 Jan-Mar;28(1):14604582211070998. doi: 10.1177/14604582211070998.

Abstract

For many people, the Internet is their primary source of knowledge. Internet users frequently seek health-related information to better understand a health problem, seek guidance, or diagnose symptoms. Unfortunately, most of this information is inaccurate or unreliable, and ordinary users find it difficult to tell reliable sources from unreliable ones. Judging the reliability of an online source requires specific knowledge and domain expertise. Researchers in health informatics have studied a number of linguistic and non-linguistic indicators to help ordinary individuals judge the credibility of medical web pages. This study proposes a hybrid method that automates the assessment of online medical sites' reliability based on both textual and non-textual characteristics. To evaluate the proposed approach, we developed a real-world dataset of Arabic web pages that provide medical information; it is the first Arabic medical web page dataset for content credibility evaluation. We assessed the hybrid approach with multiple machine learning and deep learning algorithms on this dataset, obtaining an accuracy of 79% and an F1-score of 77%. We also identify the most salient patterns that help evaluate or detect unreliable web pages written in Arabic.
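The abstract does not specify which features or classifiers the hybrid approach uses, so the following is only a minimal sketch of the general idea: a textual feature vector and a non-textual feature vector are concatenated into one input, a classifier is trained on it, and accuracy and F1-score are computed as in the reported results. Everything concrete here is an assumption: the binary page indicators (`has_https`, `cites_references`, `shows_author`, `has_ads`) are hypothetical, the text features are plain term frequencies, a single logistic-regression classifier stands in for the paper's multiple machine learning and deep learning algorithms, and the four toy pages are invented, not taken from the dataset.

```python
import math
import re
from collections import Counter

# Hypothetical binary non-textual indicators; the abstract does not list
# the paper's actual non-textual features.
NON_TEXT_FEATURES = ("has_https", "cites_references", "shows_author", "has_ads")


def tokenize(text):
    # \w is Unicode-aware in Python 3, so Arabic words are kept as tokens.
    return re.findall(r"\w+", text.lower())


def build_vocab(texts, max_terms=50):
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return [tok for tok, _ in counts.most_common(max_terms)]


def featurize(page, vocab):
    # Textual part: relative term frequencies over a small vocabulary.
    counts = Counter(tokenize(page["text"]))
    total = sum(counts.values()) or 1
    textual = [counts[tok] / total for tok in vocab]
    # Non-textual part: page-level indicators concatenated to the text vector.
    non_textual = [float(page[name]) for name in NON_TEXT_FEATURES]
    return textual + non_textual + [1.0]  # trailing 1.0 is the bias input


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    # Plain stochastic gradient descent on the logistic (log) loss.
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            w = [wj - lr * (p - yi) * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x))) >= 0.5)


def accuracy_f1(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


# Invented toy pages (label 1 = credible, 0 = unreliable).
PAGES = [
    # "WHO advises seeing a doctor when symptoms appear"
    {"text": "توصي منظمة الصحة العالمية بمراجعة الطبيب عند ظهور الأعراض",
     "has_https": 1, "cites_references": 1, "shows_author": 1, "has_ads": 0, "label": 1},
    # "A published clinical study confirms the vaccine's efficacy"
    {"text": "دراسة سريرية منشورة تؤكد فعالية اللقاح",
     "has_https": 1, "cites_references": 1, "shows_author": 1, "has_ads": 0, "label": 1},
    # "A magic cure heals diabetes within a week"
    {"text": "علاج سحري يشفي السكري خلال أسبوع",
     "has_https": 0, "cites_references": 0, "shows_author": 0, "has_ads": 1, "label": 0},
    # "Buy now: a herbal mix that replaces your medication"
    {"text": "اشتر الآن خلطة عشبية تغنيك عن الدواء",
     "has_https": 0, "cites_references": 0, "shows_author": 0, "has_ads": 1, "label": 0},
]

vocab = build_vocab([p["text"] for p in PAGES])
X = [featurize(p, vocab) for p in PAGES]
y = [p["label"] for p in PAGES]
weights = train_logreg(X, y)
train_acc, train_f1 = accuracy_f1(y, [predict(weights, x) for x in X])
print(f"training accuracy={train_acc:.2f}, F1={train_f1:.2f}")
```

The scores printed here are measured on the four training pages only and say nothing about real performance; a faithful evaluation would use held-out pages, richer features, and the stronger models the study compares.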

Keywords: content credibility; deep learning; health websites; machine learning; natural language processing.

MeSH terms

  • Algorithms
  • Humans
  • Internet
  • Machine Learning*
  • Medical Informatics*
  • Reproducibility of Results