Algorithmic clustering based on string compression to extract P300 structure in EEG signals

Comput Methods Programs Biomed. 2019 Jul:176:225-235. doi: 10.1016/j.cmpb.2019.03.009. Epub 2019 Mar 15.

Abstract

Background and objectives: P300 is an event-related potential (ERP) widely used as a control signal in brain-computer interfaces. Using the oddball paradigm, a P300 speller lets a user spell letters through P300 events produced by the user's brain. A common difficulty in detecting this event is that its structure may differ between subjects and, for a given subject, over time. The main purpose of this work is to deal with this inherent variability and identify the main structure of P300 using algorithmic clustering based on string compression.

Methods: In this work, we use the Normalized Compression Distance (NCD) to extract the main structure of the signal regardless of its inherent variability. To apply compression distances, we carry out a novel signal-to-ASCII process that transforms and merges different events into objects suitable for a compression algorithm. Once the ASCII objects are created, we use NCD-driven clustering to analyze whether our object creation method suitably represents the information contained in the signals and to explore whether compression distances are a valid tool for identifying P300 structure. To make the study more general, we apply two different clustering methods: a hierarchical clustering algorithm based on the minimum quartet tree method and a multidimensional projection method.
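
To make the distance in the Methods concrete: for two objects x and y and a compressor C that returns a compressed length, the standard definition is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The following is a minimal Python sketch of that formula, not the authors' implementation. It assumes zlib as the compressor (the abstract does not name one), and the byte strings are hypothetical stand-ins for the ASCII objects, since the signal-to-ASCII process is not described here.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(objects):
    """Pairwise NCD matrix; the diagonal is set to 0.0 by convention."""
    n = len(objects)
    return [
        [0.0 if i == j else ncd(objects[i], objects[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical stand-ins for ASCII objects: two near-identical periodic
# strings and one pseudo-random string with no shared structure.
rng = random.Random(0)
obj_a = b"ABCD" * 200
obj_b = b"ABCD" * 199 + b"ABCE"
obj_c = bytes(rng.randrange(256) for _ in range(800))

matrix = distance_matrix([obj_a, obj_b, obj_c])
```

With a real compressor the values are approximate: NCD(x, x) is usually slightly above 0, and NCD(x, y) may differ a little from NCD(y, x). A matrix like this one is the kind of input a hierarchical clustering or projection method would consume.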

Results: Our experimental results show good clustering performance across different experiments, demonstrating the structure extraction capabilities of our procedure. Two datasets with recordings in different scenarios were used to analyze the problem and validate our results, respectively. Notably, when the clustering performance over individual electrodes is analyzed, higher P300 activity is found in regions similar to those reported in other articles using the same datasets. This suggests that our approach might be used as an electrode-selection criterion.
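
The keywords list the silhouette coefficient, a common measure of clustering quality. As an illustration only, the sketch below computes the mean silhouette from a precomputed distance matrix (for instance, a matrix of NCD values) and a list of cluster labels. How the authors applied the coefficient is not stated in this abstract, so the function name, the toy matrix, and the labels are all hypothetical.

```python
from collections import defaultdict


def silhouette_score(dist, labels):
    """Mean silhouette coefficient from a square distance matrix and labels."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy distance matrix: objects 0 and 1 are close, objects 2 and 3 are close.
toy_dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
good = silhouette_score(toy_dist, [0, 0, 1, 1])
bad = silhouette_score(toy_dist, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, and values near or below 0 indicate that the labels do not match the structure in the distances. In the toy case the first labeling scores higher than the second.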

Conclusions: The proposed NCD-driven clustering methodology can be used to discover the structural characteristics of EEG and is therefore suitable as a complementary methodology for P300 analysis.

Keywords: Brain computer interface; Clustering by compression; Data mining; Dendrogram; Kolmogorov complexity; Multidimensional projections; Normalized compression distance; Silhouette coefficient; Similarity.

MeSH terms

  • Algorithms
  • Brain / physiology*
  • Brain-Computer Interfaces*
  • Cluster Analysis*
  • Computer Simulation
  • Data Compression / methods*
  • Electrodes
  • Electroencephalography*
  • Event-Related Potentials, P300*
  • Humans