Using a Decision Tree Algorithm Predictive Model for Sperm Count Assessment and Risk Factors in Health Screening Population

Risk Manag Healthc Policy. 2023 Nov 16:16:2469-2478. doi: 10.2147/RMHP.S433193. eCollection 2023.

Abstract

Purpose: Approximately 20% of couples face infertility challenges and struggle to conceive naturally. Despite advances in artificial reproduction, success still hinges on sperm quality. Our previous study used five machine learning (ML) algorithms (random forest, stochastic gradient boosting, least absolute shrinkage and selection operator regression, ridge regression, and extreme gradient boosting) to model health data from 1375 Taiwanese males and identified ten risk factors affecting sperm count.

Methods: We employed the classification and regression tree (CART) algorithm to generate decision trees from the identified risk factors to predict healthy sperm counts. Four error metrics were used to evaluate the decision trees: symmetric mean absolute percentage error (SMAPE), relative absolute error (RAE), root relative squared error (RRSE), and root mean squared error (RMSE). We identified the five decision trees with the lowest errors and discuss in detail the tree with the least error.
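
As an illustration of the evaluation step described in the Methods, the following is a minimal sketch, not the authors' implementation. It fits a single CART-style regression split (one threshold on one feature, chosen to minimize the summed squared error of the two child nodes) and scores the result with SMAPE, RAE, RRSE, and RMSE. The abstract does not state which SMAPE variant was used, so the mean-of-absolute-values denominator here is an assumption. The function names and the toy numbers are hypothetical and are not study data.

```python
import math


def best_split(xs, ys):
    """Find one CART-style regression split on a single feature.

    Returns (threshold, left_mean, right_mean) for the threshold that
    minimizes the summed squared error of the two child nodes.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("feature has no two distinct values to split on")
    return best[1], best[2], best[3]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rae(y_true, y_pred):
    """Relative absolute error: total absolute error relative to a mean-only predictor."""
    mean = sum(y_true) / len(y_true)
    return (sum(abs(t - p) for t, p in zip(y_true, y_pred))
            / sum(abs(t - mean) for t in y_true))


def rrse(y_true, y_pred):
    """Root relative squared error: squared error relative to a mean-only predictor."""
    mean = sum(y_true) / len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / sum((t - mean) ** 2 for t in y_true))


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (assumed variant)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        if denom > 0:  # a term with t == p == 0 contributes nothing
            total += abs(t - p) / denom
    return 100 * total / len(y_true)


# Toy values for illustration only (hypothetical, not from the study).
feature = [1.0, 2.0, 3.0, 4.0]
target = [10.0, 12.0, 20.0, 22.0]

threshold, left_mean, right_mean = best_split(feature, target)
predicted = [left_mean if x <= threshold else right_mean for x in feature]

scores = {
    "SMAPE": smape(target, predicted),
    "RAE": rae(target, predicted),
    "RRSE": rrse(target, predicted),
    "RMSE": rmse(target, predicted),
}
```

RAE and RRSE compare the tree against a baseline that always predicts the mean, so values below 1 indicate that the tree improves on that baseline. SMAPE and RMSE report the error in percentage terms and in the units of the target, respectively.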

Results: The decision tree with the least error, comprising body mass index (BMI), uric acid (UA), sleep time (ST), the total cholesterol to high-density lipoprotein cholesterol (T-Cho/HDL-C) ratio, and blood urea nitrogen (BUN), corroborated the negative impact of metabolic syndrome, particularly high BMI, on sperm count, and emphasized the link between good sleep and male fertility. Our study also sheds light on the potentially significant influence of high BUN on spermatogenesis. Two novel risk factors, the T-Cho/HDL-C ratio and UA, warrant further investigation.

Conclusion: The ML algorithm established a predictive model that healthcare personnel can use to assess low sperm counts. Refining the model with additional data is crucial for improved precision. The risk factors identified offer avenues for future investigation.

Keywords: decision tree; food metabolite; metabolic syndrome; sleep time; sperm count.