Representing the dynamics of high-dimensional data with non-redundant wavelets

Shanshan Jia; Xingyi Li; Tiejun Huang; Jian K Liu; Zhaofei Yu

doi:10.1016/j.patter.2021.100424

Patterns (N Y). 2022 Jan 6;3(3):100424. doi: 10.1016/j.patter.2021.100424. eCollection 2022 Mar 11.

Authors

Shanshan Jia¹ ², Xingyi Li³, Tiejun Huang¹ ², Jian K Liu⁴, Zhaofei Yu¹ ²

Affiliations

¹ Institute for Artificial Intelligence, Peking University, Beijing 100871, China.
² Department of Computer Science and Technology, Peking University, Beijing 100871, China.
³ Center for Neurointelligence, School of Medicine, Chongqing University, Chongqing 400030, China.
⁴ School of Computing, University of Leeds, Leeds LS2 9JT, UK.

Abstract

A crucial question in data science is how to extract the meaningful information embedded in high-dimensional data into a low-dimensional set of features that can represent the original data at different levels. Wavelet analysis is a pervasive method for decomposing time-series signals into a few levels with detailed temporal resolution. However, the obtained wavelets are intertwined and over-represented across levels for each sample and across different samples within one population. Here, using neuroscience data of simulated spikes, experimental spikes, calcium imaging signals, and human electrocorticography signals, we leveraged conditional mutual information between wavelets for feature selection. The meaningfulness of the selected features was verified by decoding the stimulus or condition with high accuracy while using only a small set of features. These results provide a new way of applying wavelet analysis to extract essential features of the dynamics of spatiotemporal neural data, which in turn can support the design of novel machine learning models built on representative features.

Keywords: ECoG; calcium imaging; conditional information; dimensionality reduction; feature selection; mutual information; neural coding; neural spikes; wavelet analysis.
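
The idea summarized in the abstract (decompose each signal into wavelet coefficients, then keep only the coefficients that carry non-redundant information about the stimulus or condition) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the Haar wavelet, the median binarization of coefficients, the plug-in entropy estimate, the greedy "maximize the minimum conditional mutual information" selection rule, and all function names (`haar_dwt`, `binarize`, `select_features`) are hypothetical choices.

```python
import math
from collections import Counter


def haar_dwt(signal, levels):
    """Multi-level Haar transform (assumed wavelet; length must be divisible by 2**levels).

    Returns a flat list: coarsest approximation first, then details coarse to fine.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    coeffs = approx
    for d in reversed(details):
        coeffs = coeffs + d
    return coeffs


def binarize(rows):
    """Turn a samples-by-coefficients table into binary columns (median threshold)."""
    columns = []
    for col in zip(*rows):
        med = sorted(col)[len(col) // 2]
        columns.append([int(v >= med) for v in col])
    return columns


def entropy(*columns):
    """Plug-in joint Shannon entropy (bits) of one or more discrete columns."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def select_features(columns, labels, k):
    """Greedy selection: each score is the smallest information a feature adds
    about the labels given any already-selected feature (assumed criterion)."""
    scores = [mutual_info(col, labels) for col in columns]
    selected = []
    for _ in range(min(k, len(columns))):
        candidates = (i for i in range(len(columns)) if i not in selected)
        best = max(candidates, key=lambda i: scores[i])
        selected.append(best)
        for i, col in enumerate(columns):
            cmi = cond_mutual_info(col, labels, columns[best])
            scores[i] = min(scores[i], cmi)
    return selected
```

In use, one would apply `haar_dwt` to every trial, pass the resulting table through `binarize`, and call `select_features` with the stimulus labels. A coefficient that merely duplicates an already-selected one scores zero after the update step, which is how redundancy across levels and samples is suppressed in this sketch. The selected coefficients could then be fed to any classifier to check decoding accuracy.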