New Approach for Generating Synthetic Medical Data to Predict Type 2 Diabetes

Bioengineering (Basel). 2023 Sep 1;10(9):1031. doi: 10.3390/bioengineering10091031.

Abstract

The scarcity of medical databases is currently the main barrier to the development of artificial intelligence-based algorithms in medicine. This issue can be partially resolved by developing a reliable, high-quality synthetic database. In this study, an easy and reliable method for developing a synthetic medical database based only on statistical data is proposed. The method first builds a primary database from statistical data, then modifies it with a special shuffle algorithm until a satisfactory result is achieved, and finally evaluates the resulting dataset using a neural network. Using the proposed method, a database was developed to predict the risk of developing type 2 diabetes 5 years in advance. This dataset consisted of data from 172,290 patients. When a neural network was trained on this dataset, the prediction accuracy reached 94.45%.
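
The abstract does not describe the shuffle algorithm itself, so the following is only a toy sketch of the general idea. It draws a primary dataset from summary statistics and then shuffles labels between rows, which leaves every marginal distribution unchanged. The feature names, means, standard deviations, prevalence, and the swap rule are all hypothetical assumptions made for illustration, not the authors' method. The neural network evaluation step is omitted, and a simple difference in group means stands in as a check that the shuffle introduced an association.

```python
import random
import statistics

# Hypothetical summary statistics (illustrative values only, not from the study).
FEATURE_STATS = {"glucose": (100.0, 15.0), "bmi": (25.0, 4.0)}  # name: (mean, sd)
PREVALENCE = 0.1  # assumed fraction of positive labels


def make_primary(n, rng):
    """Draw each feature independently from a Gaussian; place labels at random."""
    n_pos = round(n * PREVALENCE)
    labels = [1] * n_pos + [0] * (n - n_pos)
    rng.shuffle(labels)
    rows = []
    for label in labels:
        row = {name: rng.gauss(mu, sd) for name, (mu, sd) in FEATURE_STATS.items()}
        row["label"] = label
        rows.append(row)
    return rows


def shuffle_labels(rows, feature, n_swaps, rng):
    """Assumed swap rule: pick random row pairs and move the positive label to
    the row with the higher feature value. Label counts and feature values
    are never changed, so the marginal statistics are preserved."""
    for _ in range(n_swaps):
        a, b = rng.sample(rows, 2)
        if a["label"] != b["label"]:
            pos, neg = (a, b) if a["label"] == 1 else (b, a)
            if neg[feature] > pos[feature]:
                pos["label"], neg["label"] = 0, 1


def mean_gap(rows, feature):
    """Difference in mean feature value between positive and negative rows."""
    pos = [r[feature] for r in rows if r["label"] == 1]
    neg = [r[feature] for r in rows if r["label"] == 0]
    return statistics.mean(pos) - statistics.mean(neg)


rng = random.Random(0)
rows = make_primary(2000, rng)
gap_before = mean_gap(rows, "glucose")
shuffle_labels(rows, "glucose", 20000, rng)
gap_after = mean_gap(rows, "glucose")
print(f"mean glucose gap: before={gap_before:.2f}, after={gap_after:.2f}")
```

Because the sketch only moves labels between rows, the number of positive cases and the distribution of each feature stay exactly as drawn; only the relationship between feature and label changes. In a fuller pipeline, a classifier trained on the shuffled data would take the place of `mean_gap` as the quality check.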

Keywords: prediction of diseases; shuffling; synthetic medical data; type 2 diabetes.

Grants and funding

This study was funded by the Korea Agency for Technology and Standards in 2022 (project numbers K_G012002236201 and K_G012002073401) and by the Gachon University research fund of 2023 (GCU-(202307790001)).