Classification of Parkinson's disease from smartphone recording data using time-frequency analysis and convolutional neural network

Technol Health Care. 2023;31(2):705-718. doi: 10.3233/THC-220386.

Abstract

Background: Parkinson's disease (PD) is a long-term neurodegenerative disease of the central nervous system. The current diagnosis is dependent on clinical observation and the abilities and experience of a trained specialist. One of the symptoms that affects most patients is voice impairment.

Objective: Voice samples are non-invasive data that can be collected remotely for diagnosis and disease progression monitoring. In this study, we analyzed voice recording data from a smartphone as a possible medical self-diagnosis tool by using only one-second voice recording. The data from one of the largest mobile PD studies, the mPower study, was used.

Methods: A total of 29,798 ten-second voice recordings on smartphone from 4,051 participants were used for the analysis. The voice recordings were from sustained phonation by participants saying /aa/ for ten seconds into an iPhone microphone. A dataset comprising 385,143 short one-second audio samples was generated from the original ten-second voice recordings. The samples were converted to a spectrogram using a short-time Fourier transform. CNN models were then applied to classify the samples.

Results: Classification accuracies of the proposed method with LeNet-5, ResNet-50, and VGGNet-16 are 97.7 ± 0.1%, 98.6 ± 0.2%, and 99.3 ± 0.1%, respectively.

Conclusions: We achieve a respectable classification performance using a generalized approach on a dataset with a large number of samples. The result emphasizes that an analysis based on one-second clip recorded on a smartphone could be a promising non-invasive and remotely available PD biomarker.

Keywords: PD voice; audio classification; convolutional neural network; mPower study.

MeSH terms

  • Humans
  • Neural Networks, Computer
  • Neurodegenerative Diseases*
  • Parkinson Disease* / diagnosis
  • Smartphone
  • Voice*