New Approach for Generating Synthetic Medical Data to Predict Type 2 Diabetes

Zarnigor Tagmatova; Akmalbek Abdusalomov; Rashid Nasimov; Nigorakhon Nasimova; Ali Hikmet Dogru; Young-Im Cho

doi:10.3390/bioengineering10091031

Bioengineering (Basel). 2023 Sep 1;10(9):1031. doi: 10.3390/bioengineering10091031.

Authors

Zarnigor Tagmatova¹, Akmalbek Abdusalomov¹, Rashid Nasimov², Nigorakhon Nasimova², Ali Hikmet Dogru³, Young-Im Cho¹

Affiliations

¹ Department of Computer Engineering, Gachon University, Sujeong-Gu, Seongnam-Si 461-701, Republic of Korea.
² Department of Artificial Intelligence, Tashkent State University of Economics, Tashkent 100066, Uzbekistan.
³ Department of Computer Science, University of Texas at San Antonio, San Antonio, TX 78249-0667, USA.

Abstract

The lack of medical databases is currently the main barrier to the development of artificial intelligence-based algorithms in medicine. This issue can be partially resolved by developing a reliable, high-quality synthetic database. In this study, an easy and reliable method for developing a synthetic medical database based only on statistical data is proposed. The method first builds a primary database from statistical data, then modifies it with a special shuffle algorithm until a satisfactory result is achieved, and evaluates the resulting dataset using a neural network. Using the proposed method, a database was developed to predict the risk of developing type 2 diabetes 5 years in advance. This dataset consisted of data from 172,290 patients. The prediction accuracy reached 94.45% when a neural network was trained on the dataset.
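The abstract does not specify the shuffle algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: columns are first drawn independently from summary statistics, and one column is then reordered by random pairwise swaps so that its correlation with another column approaches a target value. The feature names, means, standard deviations, target correlation, and the function `shuffle_toward_correlation` are all hypothetical and are not taken from the paper. The neural network evaluation step is not shown.

```python
import numpy as np

def shuffle_toward_correlation(x, y, target, rng, n_swaps=50_000):
    """Reorder y by random pairwise swaps so corr(x, y) approaches target.

    Swapping only permutes y, so its mean, standard deviation, and
    histogram are unchanged. Illustrative stand-in, not the paper's method.
    """
    y = y.copy()
    n = len(x)
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = float(np.dot(x, y))
    corr = (sxy / n - mx * my) / (sx * sy)
    pairs = rng.integers(0, n, size=(n_swaps, 2))
    for i, j in pairs:
        # Effect of swapping y[i] and y[j] on sum(x * y).
        new_sxy = sxy + (x[i] - x[j]) * (y[j] - y[i])
        new_corr = (new_sxy / n - mx * my) / (sx * sy)
        # Keep the swap only if it moves the correlation closer to target.
        if abs(new_corr - target) < abs(corr - target):
            y[i], y[j] = y[j], y[i]
            sxy, corr = new_sxy, new_corr
    return y, corr

rng = np.random.default_rng(0)
n = 2000
# Hypothetical marginal statistics (assumed values, not from the study).
age = rng.normal(50.0, 12.0, size=n)
bmi = rng.normal(27.0, 4.0, size=n)

# Hypothetical target correlation between the two features.
bmi_shuffled, achieved = shuffle_toward_correlation(age, bmi, 0.3, rng)
print(round(achieved, 3))
```

Because the swaps only permute existing values, each column still matches the statistics it was sampled from; only the joint structure between columns changes.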

Keywords: prediction of diseases; shuffling; synthetic medical data; type 2 diabetes.

Grants and funding

This study was funded by the Korea Agency for Technology and Standards in 2022 (project numbers K_G012002236201 and K_G012002073401) and by the Gachon University research fund of 2023 (GCU-(202307790001)).