Towards enhanced metabolomic data analysis of mass spectrometry image: Multivariate Curve Resolution and Machine Learning

Anal Chim Acta. 2018 Dec 11;1037:211-219. doi: 10.1016/j.aca.2018.02.031. Epub 2018 Feb 20.

Abstract

Large amounts of data are generally produced in mass spectrometry imaging (MSI) experiments that acquire the molecular and spatial information of biological samples. Traditionally, MS images are constructed from manually selected ions, and comprehensive analysis of MSI results is very challenging because of their large data sizes and highly complex data structures. To overcome these barriers, it is essential to develop advanced data analysis approaches that can handle increasingly large MSI datasets. In the current study, we focused on developing methods based on Multivariate Curve Resolution (MCR) and Machine Learning (ML). We aimed to effectively extract the essential information present in large and complex MSI data and to enhance the metabolomic data analysis of biological tissues. The Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) algorithm was used to obtain the major spatial distribution patterns and to group metabolites that share the same spatial distribution pattern. In addition, both supervised and unsupervised ML methods were established to analyze the MSI data. In the supervised ML approach, the Random Forest method was selected, and the model was trained on datasets selected according to the distribution patterns obtained from the MCR-ALS analyses. In the unsupervised ML approach, both DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and CLARA (Clustering Large Applications) were applied to cluster the MSI datasets. Notably, similar spatial distribution patterns were discovered by MCR-ALS, supervised ML, and unsupervised ML. Our data analysis protocols can be applied to data acquired with many other types of MSI techniques, and can extract overall features of MSI results that are intractable with traditional data analysis approaches.
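MCR-ALS, as used above, factors an MSI data matrix D (pixels × m/z channels) as D ≈ C Sᵀ. Each column of C is a spatial distribution map, and each row of Sᵀ is the spectrum that goes with it. Grouping metabolites by spatial pattern then amounts to assigning each m/z channel to the component with the largest loading. The following is a minimal Python/NumPy sketch of that idea on a tiny synthetic image, not the study's data. The function names `mcr_als`, `purest_rows`, and `group_mz_by_component` are hypothetical, as are the simplified purest-row initialization and the choice to enforce non-negativity by clipping least-squares solutions to zero. The abstract does not describe the authors' actual constraints, initialization, or software.

```python
import numpy as np


def purest_rows(D, k):
    """Pick k rows of D that are as linearly independent as possible.

    Simplified purest-variable initialization (an assumption, loosely in the
    spirit of SIMPLISMA): start from the row with the largest norm, then
    repeatedly add the row with the largest residual after projecting onto
    the rows already chosen.
    """
    chosen = [int(np.argmax(np.linalg.norm(D, axis=1)))]
    for _ in range(1, k):
        B = D[chosen].T                              # basis, shape (n_mz, len(chosen))
        coef, *_ = np.linalg.lstsq(B, D.T, rcond=None)
        resid = D.T - B @ coef                       # residual of every pixel spectrum
        chosen.append(int(np.argmax(np.linalg.norm(resid, axis=0))))
    return D[chosen].copy()


def mcr_als(D, k, max_iter=200, tol=1e-10):
    """Factor D (pixels x m/z) as C @ ST with C >= 0 and ST >= 0."""
    ST = purest_rows(D, k)
    prev = np.inf
    for _ in range(max_iter):
        # Spatial profiles given spectra; clipping negatives to zero is a
        # simple stand-in for a true non-negative least-squares step.
        C = np.clip(np.linalg.lstsq(ST.T, D.T, rcond=None)[0].T, 0, None)
        # Spectra given spatial profiles, also clipped to be non-negative.
        ST = np.clip(np.linalg.lstsq(C, D, rcond=None)[0], 0, None)
        # Unit-normalize each spectrum and move its scale into C, so the
        # product C @ ST is unchanged but the scale ambiguity is removed.
        norms = np.linalg.norm(ST, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        ST /= norms
        C *= norms.T
        resid = np.linalg.norm(D - C @ ST)
        if abs(prev - resid) < tol:
            break
        prev = resid
    lof = 100 * resid / np.linalg.norm(D)            # lack of fit, in percent
    return C, ST, lof


def group_mz_by_component(ST):
    """Assign each m/z channel to the component with the largest loading."""
    return np.argmax(ST, axis=0)


# Hypothetical 10 x 10 "tissue": component 1 fades out left to right and
# component 2 fades in; image columns 0-3 and 6-9 are pure, 4-5 are mixed.
x = np.tile(np.arange(10), 10).astype(float)         # image column of each pixel
C_true = np.column_stack([np.clip(1 - x / 6, 0, None),
                          np.clip((x - 3) / 6, 0, None)])
S_true = np.array([[1.0, 0.6, 0.3, 0.0, 0.0, 0.0],   # hypothetical metabolite group A
                   [0.0, 0.0, 0.1, 0.8, 1.0, 0.4]])  # hypothetical metabolite group B
D = C_true @ S_true                                  # pixels x m/z channels

C, ST, lof = mcr_als(D, k=2)
labels = group_mz_by_component(ST)
print(f"lack of fit: {lof:.2e} %")
print("m/z channel -> component:", labels)
```

Because this noise-free synthetic image contains pure pixels for each component, the factorization is recovered essentially exactly, and m/z channels 0-2 and 3-5 fall into two separate groups. The component order is arbitrary. Real MSI data would also require preprocessing and a choice of the number of components, which this sketch does not address.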

Keywords: Machine learning; Mass spectrometry imaging; Metabolomic analysis; Multivariate curve resolution; t-SNE (t-distributed stochastic neighbor embedding).

MeSH terms

  • Algorithms
  • Animals
  • Kidney / diagnostic imaging
  • Machine Learning*
  • Mass Spectrometry
  • Metabolomics*
  • Mice
  • Multivariate Analysis