On the reliability of deep learning-based classification for Alzheimer's disease: Multi-cohorts, multi-vendors, multi-protocols, and head-to-head validation

Yeong-Hun Song; Jun-Young Yi; Young Noh; Hyemin Jang; Sang Won Seo; Duk L Na; Joon-Kyung Seong

doi:10.3389/fnins.2022.851871

Front Neurosci. 2022 Sep 7:16:851871. doi: 10.3389/fnins.2022.851871. eCollection 2022.

Authors

Yeong-Hun Song¹, Jun-Young Yi¹, Young Noh², Hyemin Jang³, Sang Won Seo³, Duk L Na³, Joon-Kyung Seong¹ ⁴ ⁵

Affiliations

¹ Department of Artificial Intelligence, Korea University, Seoul, South Korea.
² Department of Neurology, Gil Medical Center, Gachon University College of Medicine, Incheon, South Korea.
³ Department of Neurology, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, South Korea.
⁴ School of Biomedical Engineering, Korea University, Seoul, South Korea.
⁵ Interdisciplinary Program in Precision Public Health, College of Health Science, Korea University, Seoul, South Korea.

Abstract

Structural changes in the brain caused by Alzheimer's disease dementia (ADD) can be observed on T1-weighted magnetic resonance imaging (MRI) of the brain. Many studies have applied machine-learning and deep-learning models to brain MRI images for ADD diagnosis. Although reliability is key to clinical application, and applicability to low-resolution MRI (LRMRI) is key to broad clinical application, neither has been sufficiently studied in the deep-learning literature. In this study, we developed a 2-dimensional convolutional neural network-based classification model that adopts several methods, including instance normalization layers, Mixup, and sharpness-aware minimization. To train the model, we used MRI images from 2,765 cognitively normal individuals and 1,192 patients with ADD from the Samsung Medical Center cohort. To assess the reliability of our classification model, we designed external validation in multiple scenarios: (1) multi-cohort validation using four additional cohort datasets covering more than 30 different centers in multiple countries, (2) multi-vendor validation using three different MRI vendor subgroups, (3) LRMRI image validation, and (4) head-to-head validation using ten pairs of MRI images from ten individual subjects, each scanned at two different centers. For multi-cohort validation, we used MRI images from 739 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort, 125 subjects from the Dementia Platform of Korea cohort, 234 subjects from the Premier cohort, and 139 subjects from the Gachon University Gil Medical Center. We further assessed classification performance across different vendors and protocols for each dataset. In 5-fold cross-validation, we achieved a mean area under the receiver operating characteristic curve (AUC) of 0.9868 and a mean classification accuracy of 0.9482. In external validation, we obtained an AUC of 0.9396 and a classification accuracy of 0.8757, comparable to those of other studies that performed cross-validation on the ADNI cohort. Furthermore, LRMRI image validation indicated the possibility of broad clinical application, with a mean AUC of 0.9404 and a mean classification accuracy of 0.8765 in cross-validation, and an AUC of 0.8749 and a classification accuracy of 0.8281 in external validation on the ADNI cohort.

Keywords: Alzheimer’s disease; clinical application; deep learning; low-resolution magnetic resonance imaging; multi-cohort validation; reliability; structural magnetic resonance imaging.
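
The abstract reports results as AUC and classification accuracy. As a reading aid, the sketch below shows how these two metrics are commonly computed from binary labels and predicted scores. It is a minimal illustration, not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy scores are all assumptions made for this example and do not come from the study.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed with the pairwise (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Count (positive, negative) pairs in which the positive case scores higher;
    # tied pairs count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases classified correctly at a fixed threshold (0.5 is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy, made-up scores (not data from the study): 1 = ADD, 0 = cognitively normal.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
toy_auc = auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
```

AUC is independent of any decision threshold, whereas accuracy depends on the chosen cut-off, which is one reason the two figures are usually reported together.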