Application of kernel principal component analysis and computational machine learning to exploration of metabolites strongly associated with diet

Sci Rep. 2018 Feb 21;8(1):3426. doi: 10.1038/s41598-018-20121-w.

Abstract

Computer-based technological innovation has advanced sophisticated and diverse analytical instruments, enabling massive amounts of data to be collected with relative ease. This is accompanied by a fast-growing demand for progress in data mining methods for analyzing big data derived from chemical and biological systems. From this perspective, relying on a general "linear" multivariate analysis alone limits interpretation, because metabolic data from living organisms contain "non-linear" variations. Here we describe a kernel principal component analysis (KPCA)-incorporated analytical approach for extracting useful information from metabolic profiling data. To overcome the difficulty of determining important variables (metabolites) with KPCA, we incorporated a random forest conditional variable importance measure into our KPCA-based analytical approach to demonstrate the relative importance of metabolites. In a market basket analysis, hippurate, the most important variable detected by the importance measure, was associated with high levels of some vitamins and minerals present in foods eaten the previous day, suggesting a relationship between increased hippurate and intake of a wide variety of vegetables and fruits. The KPCA-incorporated analytical approach described herein thus enabled us to capture input-output responses, and should be useful not only for metabolic profiling but also for profiling in other areas of biological and environmental systems.
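
The market basket step mentioned in the abstract can be illustrated with a minimal sketch of the standard association-rule metrics (support, confidence, lift). This is not the authors' implementation: the function name `rule_metrics`, the item labels, and the six toy records are all hypothetical and made up purely for illustration. The abstract does not state how nutrient intakes or metabolite levels were discretized into items.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each transaction is a set of item labels; here one transaction stands
    for one hypothetical participant-day.
    """
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    n = len(transactions)

    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_ac / n                           # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0      # P(C | A)
    # Lift compares P(C | A) with the baseline P(C); > 1 means positive association.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical, made-up records (NOT data from the study).
transactions = [
    frozenset({"vitamin_C_high", "potassium_high", "hippurate_high"}),
    frozenset({"vitamin_C_high", "hippurate_high"}),
    frozenset({"potassium_high", "hippurate_high"}),
    frozenset({"vitamin_C_high", "potassium_high"}),
    frozenset({"fat_high"}),
    frozenset({"fat_high", "potassium_high"}),
]

support, confidence, lift = rule_metrics(
    transactions, {"vitamin_C_high"}, {"hippurate_high"}
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 for a rule such as "high vitamin C intake -> high hippurate" would indicate that the two occur together more often than expected under independence, which is the kind of association the abstract reports between hippurate and dietary vitamins and minerals.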

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining
  • Diet*
  • Eating
  • Hippurates / metabolism
  • Humans
  • Machine Learning*
  • Metabolome*
  • Metabolomics / methods*
  • Principal Component Analysis*

Substances

  • Hippurates
  • hippuric acid