Extended Similarity Methods for Efficient Data Mining in Imaging Mass Spectrometry

Nicholas R Ellin; Ramón Alain Miranda-Quintana; Boone M Prentice

doi:10.1101/2023.07.27.550838

bioRxiv [Preprint]. 2023 Jul 30:2023.07.27.550838. doi: 10.1101/2023.07.27.550838.

Authors

Nicholas R Ellin¹, Ramón Alain Miranda-Quintana¹,², Boone M Prentice¹

Affiliations

¹ Department of Chemistry, University of Florida, Gainesville, FL 32611-7200, USA.
² Quantum Theory Project, University of Florida, Gainesville, FL 32611-7200, USA.

Abstract

Imaging mass spectrometry (IMS) is a label-free imaging modality that allows for the spatial mapping of many compounds directly in tissues. In an imaging mass spectrometry experiment, a raster of the tissue surface produces a mass spectrum at each sampled $(x, y)$ position, resulting in thousands of individual mass spectra, each comprising a pixel in the resulting ion images. However, efficient analysis of imaging mass spectrometry datasets can be challenging due to the hyperspectral characteristics of the data. Each spectrum contains several thousand unique compounds at discrete m/z values that result in unique ion images, which demands robust and efficient algorithms for searching, statistical analysis, and visualization. Some traditional post-processing techniques are fundamentally ill-equipped to dissect these types of data. For example, while principal component analysis (PCA) has long served as a useful tool for mining imaging mass spectrometry datasets to identify correlated analytes and biological regions of interest, the interpretation of the PCA scores and loadings can be non-trivial. The loadings often contain negative peaks in the PCA-derived pseudo-spectra, which are difficult to ascribe to underlying tissue biology. Herein, we have utilized extended similarity indices to streamline the interpretation of imaging mass spectrometry data. This novel workflow uses PCA as a pixel-selection method to parse out the most and least correlated pixels, which are then compared using the extended similarity indices. The extended similarity indices complement PCA by removing all non-physical artifacts and streamlining the interpretation of large volumes of IMS spectra simultaneously. The linear complexity, $O(N)$, of these indices suggests that large imaging mass spectrometry datasets can be analyzed with time and memory requirements that scale linearly with the size of the input data. The extended similarity index workflow is exemplified here by identifying discrete biological regions of mouse brain tissue.
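
The two-stage workflow described in the abstract (PCA to select pixels, then an n-ary comparison of the selected spectra) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the spectra have already been binarized (a peak is either present or absent in each m/z bin), and it uses a simplified, unweighted Jaccard-Tanimoto-style extended index with the smallest possible coincidence threshold. The function names `pca_scores`, `select_pixels`, and `extended_similarity` are hypothetical and exist only for this illustration.

```python
import numpy as np


def pca_scores(X, n_components=1):
    """Project mean-centered spectra X (pixels x m/z bins) onto the leading PCs."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def select_pixels(scores, k):
    """Return indices of the k lowest- and k highest-scoring pixels on one PC."""
    order = np.argsort(scores)
    return order[:k], order[-k:]


def extended_similarity(B, threshold=None):
    """Simplified, unweighted n-ary Jaccard-Tanimoto-style index (assumption).

    B is a binary matrix (pixels x m/z bins). All spectra are compared at once
    through a single pass of column sums, so the cost grows linearly with the
    number of spectra rather than with the number of spectrum pairs.
    """
    n = B.shape[0]
    col = B.sum(axis=0)                      # how many spectra have each peak
    if threshold is None:
        threshold = n % 2                    # smallest coincidence threshold
    delta = np.abs(2 * col - n)              # how far a bin is from a 50/50 split
    sim1 = np.sum((delta > threshold) & (2 * col > n))  # peak shared by most
    dis = np.sum(delta <= threshold)         # bins where the spectra disagree
    denom = sim1 + dis
    return float(sim1 / denom) if denom else 0.0
```

Under these assumptions, one would compute `pca_scores` on the intensity matrix, pass one column of scores to `select_pixels`, binarize the selected rows with a user-chosen intensity cutoff, and call `extended_similarity` on each group. A value near 1 indicates that most non-empty m/z bins are shared by most of the selected pixels.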

Publication types

Preprint

Grants and funding

R01 GM138660/GM/NIGMS NIH HHS/United States