Retention Index Prediction Using Quantitative Structure-Retention Relationships for Improving Structure Identification in Nontargeted Metabolomics

Yabin Wen; Ruth I J Amos; Mohammad Talebi; Roman Szucs; John W Dolan; Christopher A Pohl; Paul R Haddad

doi:10.1021/acs.analchem.8b02084

Anal Chem. 2018 Aug 7;90(15):9434-9440. doi: 10.1021/acs.analchem.8b02084. Epub 2018 Jul 10.

Authors

Yabin Wen¹, Ruth I J Amos¹, Mohammad Talebi¹, Roman Szucs², John W Dolan³, Christopher A Pohl⁴, Paul R Haddad¹

Affiliations

¹ Australian Centre for Research on Separation Science (ACROSS), School of Physical Sciences-Chemistry, University of Tasmania, Private Bag 75, Hobart, 7001 Tasmania, Australia.
² Pfizer Global Research and Development, Sandwich CT13 9NJ, U.K.
³ LC Resources, McMinnville, Oregon 97128, United States.
⁴ Thermo Fisher Scientific, Sunnyvale, California 94085, United States.

PMID: 29952550

Abstract

Structure identification in nontargeted metabolomics based on liquid chromatography coupled to mass spectrometry (LC-MS) remains a significant challenge. Quantitative structure-retention relationship (QSRR) modeling is a technique capable of accelerating the structure identification of metabolites by predicting their retention, allowing false positives to be eliminated during the interpretation of metabolomics data. In this work, 191 compounds were grouped according to molecular weight and a QSRR study was carried out on the 34 resulting groups to eliminate false positives. Partial least squares (PLS) regression combined with a genetic algorithm (GA) was applied to construct the linear QSRR models based on a variety of VolSurf+ molecular descriptors. A novel dual-filtering approach, which combines Tanimoto similarity (TS) searching as the primary filter and retention index (RI) similarity clustering as the secondary filter, was utilized to select compounds in training sets to derive the QSRR models, yielding an R² of 0.8512 and an average root mean square error in prediction (RMSEP) of 8.45%. With a retention index filter expressed as ±2 standard deviations (SD) of the error, representative compounds were predicted with >91% accuracy, and for 53% of the groups (18/34), at least one false positive compound could be eliminated. The proposed strategy can thus narrow down the number of false positives to be assessed in nontargeted metabolomics.
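
As an illustration only, the two filtering ideas named in the abstract can be sketched in a few lines of Python. This is not the authors' implementation: the fingerprint representation (sets of "on" bit indices), the similarity cutoff of 0.5, and all function names are assumptions made for the example, and the RI similarity clustering step and the GA-PLS model itself are not reproduced here. The sketch shows a Tanimoto similarity filter for picking training compounds and a predicted RI ± 2 SD window for rejecting candidate structures.

```python
# Hypothetical sketch; names and the 0.5 cutoff are assumptions, not from the paper.

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_training_set(target_fp, library, ts_cutoff=0.5):
    """Primary filter: keep library compounds similar enough to the target.

    library maps compound name -> fingerprint (set of bit indices).
    """
    return [name for name, fp in library.items()
            if tanimoto(target_fp, fp) >= ts_cutoff]


def within_ri_window(predicted_ri, experimental_ri, sd_error, k=2.0):
    """True if the experimental RI lies within predicted RI +/- k * SD."""
    return abs(experimental_ri - predicted_ri) <= k * sd_error


def eliminate_false_positives(candidates, experimental_ri, sd_error, k=2.0):
    """Split candidate structures into (kept, eliminated) by the RI window.

    candidates maps candidate name -> RI predicted by some QSRR model.
    """
    kept, eliminated = [], []
    for name, predicted_ri in candidates.items():
        if within_ri_window(predicted_ri, experimental_ri, sd_error, k):
            kept.append(name)
        else:
            eliminated.append(name)
    return kept, eliminated
```

With a made-up prediction error SD of 10 RI units and an observed RI of 510, a candidate predicted at 500 would be retained and one predicted at 560 would be discarded as a likely false positive.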

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Databases, Factual
Humans
Least-Squares Analysis
Linear Models
Metabolomics / methods*
Models, Biological
Quantitative Structure-Activity Relationship