Early diagnosis and personalised treatment focusing on synthetic data modelling: Novel visual learning approach in healthcare

Ahsanullah Yunas Mahmoud; Daniel Neagu; Daniele Scrimieri; Amr Rashad Ahmed Abdullatif

doi:10.1016/j.compbiomed.2023.107295

Comput Biol Med. 2023 Sep:164:107295. doi: 10.1016/j.compbiomed.2023.107295. Epub 2023 Aug 2.

Authors

Ahsanullah Yunas Mahmoud¹, Daniel Neagu², Daniele Scrimieri², Amr Rashad Ahmed Abdullatif²

Affiliations

¹ Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom. Electronic address: A.Y.Mahmoud@bradford.ac.uk.
² Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom.

PMID: 37557053
DOI: 10.1016/j.compbiomed.2023.107295

Abstract

Machine learning facilitates the early diagnosis and personalised treatment of diseases. Data quality affects diagnosis because medical data are usually sparse and imbalanced and contain irrelevant attributes, which results in suboptimal diagnosis. To address these data challenges, improve resource allocation, and achieve better health outcomes, a novel visual learning approach is proposed. This study contributes to the visual learning approach by determining whether less or more synthetic data are required to improve the quality of a dataset, for example its number of observations and features, according to the intended personalised treatment and early diagnosis. In addition, numerous visualisation experiments are conducted, using statistical characteristics, cumulative sums, histograms, correlation matrices, root mean square error, and principal component analysis, to visualise both original and synthetic data and so address the data challenges. Real medical datasets for cancer, heart disease, diabetes, cryotherapy, and immunotherapy are selected as case studies. Several models, such as k-Nearest Neighbours and Random Forest, are implemented as benchmarks and compared on classification measures such as accuracy, sensitivity, and specificity. To simulate algorithm implementation and data, a Generative Adversarial Network is used to create and manipulate synthetic data, whilst Random Forest is used to classify the data. An amendable and adaptable system is constructed by combining the Generative Adversarial Network and Random Forest models. The system model presents working steps, an overview, and a flowchart. Experiments reveal that, in the majority of data-enhancement scenarios, visual learning can be applied in the first stage of data analysis as a novel approach.
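As an illustration of the kind of original-versus-synthetic comparison described above, the following minimal sketch compares per-feature means, standard deviations, and correlation matrices using root mean square error. It is not the authors' implementation: the function names (`pearson`, `correlation_matrix`, `rmse`, `compare`) and the toy data are assumptions made for this example, and the "synthetic" data are simply the original values plus noise, standing in for Generative Adversarial Network output.

```python
import math
import random
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Correlation matrix for a list of feature columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def rmse(a, b):
    """Root mean square error between two equal-length flat sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def compare(original, synthetic):
    """Summarise how closely synthetic feature columns track the original ones."""
    def flatten(matrix):
        return [value for row in matrix for value in row]

    return {
        "mean_rmse": rmse([mean(c) for c in original],
                          [mean(c) for c in synthetic]),
        "std_rmse": rmse([stdev(c) for c in original],
                         [stdev(c) for c in synthetic]),
        "corr_rmse": rmse(flatten(correlation_matrix(original)),
                          flatten(correlation_matrix(synthetic))),
    }


# Toy stand-in data (hypothetical; not taken from the study's datasets).
rng = random.Random(0)
age = [rng.gauss(50, 10) for _ in range(200)]
marker = [0.5 * a + rng.gauss(0, 5) for a in age]
original = [age, marker]

# Stand-in for GAN output: the original columns plus small Gaussian noise.
synthetic = [[v + rng.gauss(0, 1) for v in col] for col in original]

report = compare(original, synthetic)
print(report)
```

Values close to zero suggest that the synthetic columns preserve the corresponding statistical characteristic of the original data; how small is small enough is a judgement that depends on the dataset.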
To achieve a meaningful, adaptable synergy between data of appropriate quality and optimal classification performance while maintaining statistical characteristics, visual learning provides researchers and practitioners with practical human-in-the-loop machine learning visualisation tools. The visual learning approach can be used prior to implementing algorithms to support early and personalised diagnosis. For the immunotherapy data, Random Forest performed best: precision, recall, F-measure, accuracy, sensitivity, and specificity were 81%, 82%, 81%, 88%, 95%, and 60% on the original data, as opposed to 91%, 96%, 93%, 93%, 96%, and 73% on synthetic data, respectively. Future studies might examine the optimal strategies to balance the quantity and quality of medical data.
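For readers unfamiliar with the reported measures, the sketch below computes them from the four cells of a binary confusion matrix using the standard textbook definitions. The function name `binary_metrics` and the counts are hypothetical, chosen only for illustration; they do not reproduce the study's results, and the abstract does not state how its recall and sensitivity figures were averaged across classes.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)   # recall of the positive class
    specificity = _ratio(tn, tn + fp)   # recall of the negative class
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts, not taken from the paper.
metrics = binary_metrics(tp=8, fp=2, tn=6, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

On imbalanced data, accuracy alone can hide poor performance on the minority class, which is one reason to report sensitivity and specificity alongside it.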

Keywords: Generative Adversarial Network; Healthcare; Imbalanced UCI data; Machine learning; Personalised and early diagnosis; Random Forest; Synthetic data; Visualisations.

MeSH terms

Algorithms
Delivery of Health Care
Early Detection of Cancer*
Humans
Machine Learning
Precision Medicine*