Classification of Parkinson's disease from smartphone recording data using time-frequency analysis and convolutional neural network

Denchai Worasawate; Warisara Asawaponwiput; Natsue Yoshimura; Apichart Intarapanich; Decho Surangsrirat

doi:10.3233/THC-220386

Technol Health Care. 2023;31(2):705-718. doi: 10.3233/THC-220386.

Authors

Denchai Worasawate¹, Warisara Asawaponwiput¹, Natsue Yoshimura², Apichart Intarapanich³, Decho Surangsrirat⁴

Affiliations

¹ Department of Electrical Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand.
² Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan.
³ Educational Technology Team, National Electronics and Computer Technology Center, Pathum Thani, Thailand.
⁴ Assistive Technology and Medical Devices Research Center, National Science and Technology Development Agency, Pathum Thani, Thailand.

PMID: 36155539

Abstract

Background: Parkinson's disease (PD) is a long-term neurodegenerative disease of the central nervous system. Diagnosis currently depends on clinical observation and on the skill and experience of a trained specialist. One of the symptoms that affects most patients is voice impairment.

Objective: Voice samples are non-invasive data that can be collected remotely for diagnosis and disease progression monitoring. In this study, we analyzed smartphone voice recordings as a possible medical self-diagnosis tool, using only a one-second voice recording per sample. Data from the mPower study, one of the largest mobile PD studies, were used.

Methods: A total of 29,798 ten-second smartphone voice recordings from 4,051 participants were used for the analysis. Each recording captured sustained phonation, with the participant saying /aa/ for ten seconds into an iPhone microphone. A dataset of 385,143 one-second audio samples was generated from the original ten-second recordings. Each sample was converted to a spectrogram using a short-time Fourier transform (STFT). Convolutional neural network (CNN) models were then applied to classify the samples.
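
The preprocessing described in Methods can be illustrated with a minimal Python sketch using NumPy. It is not the authors' implementation. The abstract does not give the sampling rate, the way the one-second samples were cut from each recording, or the STFT parameters, so the non-overlapping segmentation, the Hann window, and the values of `sample_rate`, `frame_len`, and `hop` below are all assumptions chosen for illustration. The function names are hypothetical, and a synthetic sine tone stands in for a real /aa/ recording.

```python
import numpy as np


def split_into_clips(signal, sample_rate, clip_seconds=1.0):
    """Cut a 1-D signal into non-overlapping clips (assumed segmentation)."""
    signal = np.asarray(signal, dtype=float)
    clip_len = int(round(sample_rate * clip_seconds))
    n_clips = len(signal) // clip_len
    # Drop trailing samples that do not fill a whole clip.
    return signal[: n_clips * clip_len].reshape(n_clips, clip_len)


def stft_magnitude(clip, frame_len=256, hop=128):
    """Magnitude spectrogram of one clip from a Hann-windowed STFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(clip) - frame_len) // hop
    frames = np.stack(
        [clip[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins.
    # Transpose so the result has shape (freq_bins, n_frames).
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Synthetic stand-in for one ten-second recording (assumed 8 kHz sampling).
sample_rate = 8000
t = np.arange(10 * sample_rate) / sample_rate
recording = np.sin(2 * np.pi * 220.0 * t)

clips = split_into_clips(recording, sample_rate)
spectrograms = np.stack([stft_magnitude(c) for c in clips])
print(clips.shape, spectrograms.shape)
```

Each resulting two-dimensional array could then be passed to an image-style CNN classifier such as the LeNet-5, ResNet-50, or VGGNet-16 models named in Results. Any resizing or normalization needed before that step is not described in the abstract and is left out here.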

Results: Classification accuracies of the proposed method with LeNet-5, ResNet-50, and VGGNet-16 are 97.7 ± 0.1%, 98.6 ± 0.2%, and 99.3 ± 0.1%, respectively.

Conclusions: We achieved a respectable classification performance using a generalized approach on a dataset with a large number of samples. The result suggests that analysis of a one-second clip recorded on a smartphone could be a promising non-invasive and remotely available PD biomarker.

Keywords: PD voice; audio classification; convolutional neural network; mPower study.

MeSH terms

Humans
Neural Networks, Computer
Neurodegenerative Diseases*
Parkinson Disease* / diagnosis
Smartphone
Voice*