Hybrid machine learning approach for Arabic medical web page credibility assessment

Amal Alasmari; Areej Alhothali; Arwa Allinjawi

doi:10.1177/14604582211070998

Health Informatics J. 2022 Jan-Mar;28(1):14604582211070998. doi: 10.1177/14604582211070998.

Authors

Amal Alasmari¹, Areej Alhothali¹, Arwa Allinjawi¹

Affiliation

¹ Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia.

PMID: 35057651
DOI: 10.1177/14604582211070998

Abstract

For many people, the Internet is their primary source of knowledge in today's modern world. Internet users frequently seek health-related information in order to better understand a health problem, seek guidance, or diagnose symptoms. Unfortunately, most of this information is inaccurate or unreliable, making it difficult for regular users to discern reliable sources of information. To determine online source reliability, specific knowledge and domain expertise are necessary. Researchers in health informatics studied a number of linguistic and non-linguistic indicators to assist ordinary individuals in judging medical web page credibility. This study proposes a method that automates the process of assessing the reliability of online medical sites based on textual and non-textual characteristics. To evaluate the proposed approach, we developed a real-world dataset of Arabic web pages that provide medical information. This dataset is the first Arabic medical web page dataset for content credibility evaluation. The hybrid approach was assessed using multiple machine learning and deep learning algorithms on the dataset, providing an accuracy and F1-score of 79% and 77%, respectively. We also identify the most observable patterns that help evaluate or detect unreliable web pages written in Arabic.

Keywords: content credibility; deep learning; health websites; machine learning; natural language processing.

MeSH terms

Algorithms
Humans
Internet
Machine Learning*
Medical Informatics*
Reproducibility of Results