Blood Uric Acid Prediction With Machine Learning: Model Development and Performance Comparison

JMIR Med Inform. 2020 Oct 8;8(10):e18331. doi: 10.2196/18331.

Abstract

Background: Uric acid is associated with noncommunicable diseases such as cardiovascular diseases, chronic kidney disease, coronary artery disease, stroke, diabetes, metabolic syndrome, vascular dementia, and hypertension. Therefore, uric acid is considered to be a risk factor for the development of noncommunicable diseases. Most studies on uric acid have been performed in developed countries, and the application of machine-learning approaches in uric acid prediction in developing countries is rare. Different machine-learning algorithms will work differently on different types of data in various diseases; therefore, a different investigation is needed for different types of data to identify the most accurate algorithms. Specifically, no study has yet focused on the urban corporate population in Bangladesh, despite the high risk of developing noncommunicable diseases for this population.

Objective: The aim of this study was to develop a model for predicting blood uric acid values based on basic health checkup test results, dietary information, and sociodemographic characteristics using machine-learning algorithms. The prediction of health checkup test measurements can be very helpful to reduce health management costs.

Methods: Various machine-learning approaches were used in this study because clinical input data are not completely independent and exhibit complex interactions. Conventional statistical models have limitations to consider these complex interactions, whereas machine learning can consider all possible interactions among input data. We used boosted decision tree regression, decision forest regression, Bayesian linear regression, and linear regression to predict personalized blood uric acid based on basic health checkup test results, dietary information, and sociodemographic characteristics. We evaluated the performance of these five widely used machine-learning models using data collected from 271 employees in the Grameen Bank complex of Dhaka, Bangladesh.

Results: The mean uric acid level was 6.63 mg/dL, indicating a borderline result for the majority of the sample (normal range <7.0 mg/dL). Therefore, these individuals should be monitoring their uric acid regularly. The boosted decision tree regression model showed the best performance among the models tested based on the root mean squared error of 0.03, which is also better than that of any previously reported model.

Conclusions: A uric acid prediction model was developed based on personal characteristics, dietary information, and some basic health checkup measurements. This model will be useful for improving awareness among high-risk individuals and populations, which can help to save medical costs. A future study could include additional features (eg, work stress, daily physical activity, alcohol intake, eating red meat) in improving prediction.

Keywords: Bangladesh; blood uric acid; boosted decision tree regression model; machine learning; noncommunicable diseases; urban corporate population.