Random sample consensus combined with partial least squares regression (RANSAC-PLS) for microbial metabolomics data mining and phenotype improvement

J Biosci Bioeng. 2016 Aug;122(2):168-75. doi: 10.1016/j.jbiosc.2016.01.007. Epub 2016 Feb 6.

Abstract

In recent years, the advent of high-throughput omics technology has made possible a new class of strain engineering approaches, based on identification of possible gene targets for phenotype improvement from omic-level comparison of different strains or growth conditions. Metabolomics, with its focus on the omic level closest to the phenotype, lends itself naturally to this semi-rational methodology. When a quantitative phenotype such as growth rate under stress is considered, regression modeling using multivariate techniques such as partial least squares (PLS) is often used to identify metabolites correlated with the target phenotype. However, linear modeling techniques such as PLS require a consistent metabolite-phenotype trend across the samples, which may not be the case when outliers or multiple conflicting trends are present in the data. To address this, we proposed a data-mining strategy that utilizes random sample consensus (RANSAC) to select subsets of samples with consistent trends for construction of better regression models. By applying a combination of RANSAC and PLS (RANSAC-PLS) to a dataset from a previous study (gas chromatography/mass spectrometry metabolomics data and 1-butanol tolerance of 19 yeast mutant strains), new metabolites were indicated to be correlated with tolerance within certain subsets of the samples. The relevance of these metabolites to 1-butanol tolerance was then validated using single-deletion strains of the corresponding metabolic genes. The results showed that RANSAC-PLS is a promising strategy to identify unique metabolites that provide additional hints for phenotype improvement, which could not be detected by traditional PLS modeling using the entire dataset.
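
The general idea of pairing RANSAC with PLS can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not specify the algorithmic details, so the function names (fit_pls1, predict, ransac_pls), the subset size, the residual threshold, the number of trials, and the simplification to a single-component PLS1 model are all assumptions made here. The sketch uses only the Python standard library to stay self-contained.

```python
import random


def fit_pls1(X, y):
    """Fit a one-component PLS1 model (assumed simplification).

    Returns (x_mean, y_mean, coef) so that y_hat = y_mean + coef . (x - x_mean).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in metabolite space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        return x_mean, y_mean, [0.0] * p  # no covariance: predict the mean
    w = [v / norm for v in w]
    # Scores t = Xc w, then regress the centred phenotype on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, [q * v for v in w]


def predict(model, x):
    """Predict the phenotype for one sample x (list of metabolite levels)."""
    x_mean, y_mean, coef = model
    return y_mean + sum(c * (xj - m) for c, xj, m in zip(coef, x, x_mean))


def ransac_pls(X, y, n_trials=200, sample_size=3, threshold=0.5, seed=0):
    """Search for the largest subset of samples that follows one consistent trend.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(len(X)))
    best_inliers = []
    for _ in range(n_trials):
        # Fit a model on a small random subset of samples.
        subset = rng.sample(indices, sample_size)
        model = fit_pls1([X[i] for i in subset], [y[i] for i in subset])
        # Samples whose residual is within the threshold form the consensus set.
        inliers = [i for i in indices
                   if abs(predict(model, X[i]) - y[i]) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus subset found; try a larger threshold")
    # Refit on the consensus subset only.
    final = fit_pls1([X[i] for i in best_inliers], [y[i] for i in best_inliers])
    return final, best_inliers


if __name__ == "__main__":
    # Synthetic data (not from the study): 8 samples on one trend, 2 off-trend.
    X = [[float(i), 0.5 * i] for i in range(8)] + [[2.0, 1.0], [6.0, 3.0]]
    y = [2.0 * i for i in range(8)] + [20.0, -10.0]
    model, inliers = ransac_pls(X, y)
    print("consensus samples:", inliers)
    print("coefficients:", model[2])
```

In practice, a multi-component PLS model from an established library would likely replace fit_pls1, and the threshold and subset size would need tuning to the dataset. Repeating the search after removing the first consensus subset is one possible way to look for additional, conflicting trends, though whether the study proceeded this way is not stated in the abstract.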

Keywords: 1-Butanol tolerance; Data mining; Gas chromatography/mass spectrometry; Metabolomics; Partial least squares; Phenotype improvement; Random sample consensus; Regression model; Saccharomyces cerevisiae; Semi-rational strain engineering.

Publication types

  • Validation Study

MeSH terms

  • 1-Butanol / pharmacology
  • Consensus Sequence
  • Data Mining*
  • Datasets as Topic
  • Gas Chromatography-Mass Spectrometry
  • Least-Squares Analysis*
  • Metabolomics*
  • Phenotype
  • Reproducibility of Results
  • Saccharomyces cerevisiae / classification
  • Saccharomyces cerevisiae / drug effects
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae / metabolism*

Substances

  • 1-Butanol