Comparing the performance of statistical, machine learning, and deep learning algorithms to predict time-to-event: A simulation study for conversion to mild cognitive impairment

Martina Billichová; Lauren Joyce Coan; Silvester Czanner; Monika Kováčová; Fariba Sharifian; Gabriela Czanner

doi:10.1371/journal.pone.0297190

PLoS One. 2024 Jan 22;19(1):e0297190. doi: 10.1371/journal.pone.0297190. eCollection 2024.

Authors

Martina Billichová¹, Lauren Joyce Coan², Silvester Czanner¹ ², Monika Kováčová¹, Fariba Sharifian², Gabriela Czanner¹ ²

Affiliations

¹ Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Bratislava, Slovakia.
² School of Computer Science and Mathematics, Liverpool John Moores University, Liverpool, United Kingdom.

Abstract

Mild Cognitive Impairment (MCI) is a condition characterized by a decline in cognitive abilities, specifically in memory, language, and attention, that is beyond what is expected due to normal aging. Detection of MCI is crucial for providing appropriate interventions and slowing down the progression of dementia. Several automated algorithms exist for prediction from time-to-event data, but it is not clear which is best for predicting the time to conversion to MCI. It is also unclear whether algorithms with fewer training weights are less accurate. We compared three algorithms, ordered from smaller to larger numbers of training weights: a statistical predictive model (Cox proportional hazards model, CoxPH), a machine learning model (Random Survival Forest, RSF), and a deep learning model (DeepSurv). To compare the algorithms under different scenarios, we created a simulated dataset based on the Alzheimer NACC dataset. We found that the CoxPH model was among the best-performing models in all simulated scenarios. With a larger sample size (n = 6,000), the deep learning algorithm (DeepSurv) exhibited comparable accuracy (73.1%) to the CoxPH model (73%). In the past, ignoring heterogeneity in the CoxPH model led to the conclusion that deep learning methods are superior. We found that when the CoxPH model accounts for heterogeneity, its accuracy is comparable to that of DeepSurv and RSF. Furthermore, when unobserved heterogeneity is present, such as features missing from the training data, all three models showed a similar drop in accuracy. This simulation study suggests that in some applications an algorithm with a smaller number of training weights is not disadvantaged in terms of accuracy. Since algorithms with fewer weights are inherently easier to explain, this study can help artificial intelligence research develop a principled approach to comparing statistical, machine learning, and deep learning algorithms for time-to-event predictions.
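
As a rough illustration of the kind of simulated comparison described above, the following Python sketch generates right-censored time-to-event data from an exponential proportional hazards model, optionally adds an unobserved per-subject frailty, and scores a risk ranking with Harrell's concordance index. Everything in it is an assumption made for illustration: the abstract does not state which accuracy metric or data-generating process the study used, and the function names, coefficients, and rates are hypothetical rather than taken from the NACC-based simulation.

```python
import math
import random


def simulate(n, beta=(0.8, -0.5), frailty_sd=0.0, seed=0):
    """Simulate right-censored data from an exponential proportional hazards model.

    Hypothetical setup: two standard-normal covariates, baseline hazard 0.1,
    independent exponential censoring with rate 0.05.
    Returns observed times, event indicators, and the observed linear predictor.
    """
    rng = random.Random(seed)
    times, events, risk = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        # Unobserved heterogeneity: a log-frailty that shifts the hazard
        # but is never available to the model doing the scoring.
        z = frailty_sd * rng.gauss(0.0, 1.0)
        lin = sum(b * xi for b, xi in zip(beta, x))
        t_event = rng.expovariate(0.1 * math.exp(lin + z))
        t_cens = rng.expovariate(0.05)
        times.append(min(t_event, t_cens))
        events.append(t_event <= t_cens)
        risk.append(lin)  # risk score built from observed covariates only
    return times, events, risk


def concordance_index(times, events, risk):
    """Harrell's C: share of comparable pairs in which the subject with the
    earlier observed event also has the higher risk score."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


if __name__ == "__main__":
    for sd in (0.0, 2.0):
        c = concordance_index(*simulate(800, frailty_sd=sd))
        print(f"frailty_sd={sd}: C-index={c:.3f}")
```

In this sketch the risk score is the true linear predictor of the observed covariates, so it stands in for a well-specified model. Raising `frailty_sd` lowers the concordance index even for that score, which mirrors the abstract's point that unobserved heterogeneity reduces accuracy regardless of the model used.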

Copyright: © 2024 Billichová et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Algorithms
Artificial Intelligence
Cognitive Dysfunction* / diagnosis
Deep Learning*
Humans
Machine Learning

Grants and funding

The third, fourth and sixth authors were supported in part by APVV-21-0448. The second and sixth authors were supported in part by QR PSF2021/2022 and PSF2022/2023. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. No authors received any salary from any of the mentioned funders.