Quantifying the distribution of feature values over data represented in arbitrary dimensional spaces

PLoS Comput Biol. 2024 Jan 4;20(1):e1011768. doi: 10.1371/journal.pcbi.1011768. eCollection 2024 Jan.

Abstract

Identifying the structured distribution (or lack thereof) of a given feature over a point cloud is a general research question. In the neuroscience field, this problem arises while investigating representations over neural manifolds (e.g., spatial coding), in the analysis of neurophysiological signals (e.g., sensory coding) or in anatomical image segmentation. We introduce the Structure Index (SI) as a directed graph-based metric to quantify the distribution of feature values projected over data in arbitrary D-dimensional spaces (defined from neurons, time stamps, pixels, genes, etc.). The SI is defined from the overlapping distribution of data points sharing similar feature values in a given neighborhood of the cloud. Using arbitrary data clouds, we show how the SI quantifies the degree and directionality of the local versus global organization of feature distribution. The SI can be applied to both scalar and vectorial features, permitting quantification of the relative contribution of related variables. When applied to experimental studies of head-direction cells, the SI retrieves consistent feature structure from both the high- and low-dimensional representations, and discloses the local and global structure of the angle and speed represented in different brain regions. Finally, we provide two general-purpose examples (sound and image categorization) to illustrate the potential application to arbitrary dimensional spaces. Our method provides versatile applications in the neuroscience and data science fields.
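
To make the idea of an overlap-based, directed graph metric more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the function name `structure_index_sketch`, the equal-width binning, the brute-force k-nearest-neighbor search, the parameters `n_bins` and `k`, and the final normalization (one minus the mean out-degree of the overlap graph) are all assumptions chosen for illustration. The abstract only states that the SI is built from the overlap between groups of points sharing similar feature values within a neighborhood of the cloud.

```python
import numpy as np


def structure_index_sketch(points, feature, n_bins=5, k=10):
    """Illustrative overlap score (NOT the published SI definition).

    points  : (n_samples, n_dims) array, the point cloud
    feature : (n_samples,) array, one scalar feature value per point
    Returns (score, overlap), where overlap[a, b] is the fraction of the
    k nearest neighbors of points in bin a that belong to bin b.
    """
    points = np.asarray(points, dtype=float)
    feature = np.asarray(feature, dtype=float)

    # Assumption: group points into equal-width bins of the feature value.
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    labels = np.digitize(feature, edges[1:-1])  # integers in 0 .. n_bins-1

    # Brute-force pairwise Euclidean distances (fine for small clouds).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Bin labels of each point's k nearest neighbors, shape (n_samples, k).
    neighbor_labels = labels[np.argsort(dist, axis=1)[:, :k]]

    # Directed, weighted adjacency: edge a -> b carries the overlap of
    # bin a's neighborhoods with bin b. It is generally asymmetric.
    overlap = np.zeros((n_bins, n_bins))
    occupied = []
    for a in range(n_bins):
        members = labels == a
        if not members.any():
            continue  # skip empty bins
        occupied.append(a)
        for b in range(n_bins):
            if a != b:
                overlap[a, b] = np.mean(neighbor_labels[members] == b)

    # Assumed normalization: 1 minus the mean out-degree over occupied bins.
    # Close to 1 when bins barely mix, lower when feature values are mixed.
    mean_out_degree = overlap[occupied].sum(axis=1).mean()
    return 1.0 - mean_out_degree, overlap


# Usage: a 2-D cloud whose feature is the first coordinate (structured),
# compared with the same feature values randomly reassigned (unstructured).
rng = np.random.default_rng(0)
cloud = rng.uniform(size=(400, 2))
structured_score, _ = structure_index_sketch(cloud, cloud[:, 0])
shuffled_score, _ = structure_index_sketch(cloud, rng.permutation(cloud[:, 0]))
print(structured_score, shuffled_score)
```

In this sketch the structured feature yields a clearly higher score than the shuffled one, because neighbors of a point mostly fall in the same feature bin only when the feature varies smoothly over the cloud. The row-versus-column asymmetry of `overlap` is what makes the graph directed.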

MeSH terms

  • Algorithms*
  • Brain*

Grants and funding

This work is supported by a grant from Fundación La Caixa (LCF/PR/HR21/52410030; DeepCode) to LMP. JE received the support of a PhD fellowship from ”la Caixa” Foundation (ID 100010434; LCF/BQ/DR22/11950026). Access to supercomputer cluster Artemisa (project NeuroDIM) is co-funded by the European Union through the 2014-2020 FEDER Operative Programme of Comunitat Valenciana, project IDIFEDER/2018/048. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.