Multichannel biomedical time series clustering via hierarchical probabilistic latent semantic analysis

Comput Methods Programs Biomed. 2014 Nov;117(2):238-46. doi: 10.1016/j.cmpb.2014.06.014. Epub 2014 Jun 28.

Abstract

Biomedical time series clustering, which automatically groups a collection of time series according to their internal similarity, is important for medical record management and inspection, such as bio-signal archiving and retrieval. This paper proposes a novel framework that automatically groups a set of unlabelled multichannel biomedical time series according to their internal structural similarity. Specifically, we treat a multichannel biomedical time series as a document and extract local segments from the time series as words. We extend a topic model, the Hierarchical probabilistic Latent Semantic Analysis (H-pLSA), originally developed for visual motion analysis, to cluster a set of unlabelled multichannel time series. The H-pLSA models each channel of the multichannel time series with a local pLSA in the first layer. The topics learned by the local pLSAs are then fed to a global pLSA in the second layer to discover the categories of the multichannel time series. Experiments on a dataset extracted from multichannel electrocardiography (ECG) signals demonstrate that the proposed method performs better than previous state-of-the-art approaches and is relatively robust to variations in parameters, including the length of local segments and the dictionary size. Although the experimental evaluation used multichannel ECG signals in a biometric scenario, the proposed algorithm is a general framework for clustering multichannel biomedical time series according to their structural similarity, which has many applications in biomedical time series management.
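
The two-layer idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the function names (segment_words, histogram, plsa, h_plsa) are hypothetical, the codebook is taken as given rather than learned, segments are matched to codewords by squared Euclidean distance, the global layer receives each record's expected local-topic counts concatenated across channels, and each record is assigned to its most probable global topic.

```python
import random

def segment_words(signal, seg_len, codebook):
    """Map every sliding segment of a 1-D signal to its nearest codeword index."""
    words = []
    for i in range(len(signal) - seg_len + 1):
        seg = signal[i:i + seg_len]
        dists = [sum((a - b) ** 2 for a, b in zip(seg, c)) for c in codebook]
        words.append(dists.index(min(dists)))
    return words

def histogram(words, size):
    """Bag-of-words count vector for one channel of one record."""
    h = [0] * size
    for w in words:
        h[w] += 1
    return h

def _norm(row):
    s = sum(row) or 1.0
    return [v / s for v in row]

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a docs x words count matrix.
    Returns P(z|d) as docs x topics and P(w|z) as topics x words."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_norm([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_norm([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
                post = _norm([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # M-step accumulators weighted by the observed count
                for z in range(n_topics):
                    new_zd[d][z] += n * post[z]
                    new_wz[z][w] += n * post[z]
        p_z_d = [_norm(r) for r in new_zd]
        p_w_z = [_norm(r) for r in new_wz]
    return p_z_d, p_w_z

def h_plsa(records, seg_len, codebook, n_local, n_global, n_iter=50):
    """records[d][c] is channel c of record d (a list of floats).
    Returns one cluster label per record."""
    n_channels = len(records[0])
    global_counts = [[] for _ in records]
    for c in range(n_channels):
        # First layer: one local pLSA per channel.
        counts = [histogram(segment_words(r[c], seg_len, codebook), len(codebook))
                  for r in records]
        p_z_d, _ = plsa(counts, n_local, n_iter, seed=c)
        # Assumption: local topics act as "words" for the global layer,
        # weighted by the expected number of segments per topic.
        for d, row in enumerate(p_z_d):
            total = sum(counts[d])
            global_counts[d].extend(total * p for p in row)
    # Second layer: a global pLSA over the concatenated local topics.
    p_cat_d, _ = plsa(global_counts, n_global, n_iter, seed=n_channels)
    return [row.index(max(row)) for row in p_cat_d]
```

In practice the codebook would typically be learned from training segments (for example by k-means), and the segment length and dictionary size are the parameters whose influence the abstract reports as limited.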

Keywords: Bag-of-words; ECG; PLSA; Topic model; Unsupervised learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Electrocardiography / methods*
  • Female
  • Heart Rate / physiology*
  • Humans
  • Male
  • Middle Aged
  • Models, Statistical*
  • Reproducibility of Results
  • Semantics
  • Sensitivity and Specificity
  • Young Adult