Towards a chromatographic similarity index to establish localised quantitative structure-retention relationships for retention prediction. II Use of Tanimoto similarity index in ion chromatography

J Chromatogr A. 2017 Nov 10:1523:173-182. doi: 10.1016/j.chroma.2017.02.054. Epub 2017 Feb 24.

Abstract

Quantitative Structure-Retention Relationships (QSRR) are used to predict retention times of compounds based only on their chemical structures encoded by molecular descriptors. The main concern in QSRR modelling is to build models with high predictive power, allowing reliable retention prediction for the unknown compounds across the chromatographic space. With the aim of enhancing the prediction power of the models, in this work, our previously proposed QSRR modelling approach called "federation of local models" is extended in ion chromatography to predict retention times of unknown ions, where a local model for each target ion (unknown) is created using only structurally similar ions from the dataset. A Tanimoto similarity (TS) score was utilised as a measure of structural similarity and training sets were developed by including ions that were similar to the target ion, as defined by a threshold value. The prediction of retention parameters (a- and b-values) in the linear solvent strength (LSS) model in ion chromatography, log k=a - blog[eluent], allows the prediction of retention times under all eluent concentrations. The QSRR models for a- and b-values were developed by a genetic algorithm-partial least squares method using the retention data of inorganic and small organic anions and larger organic cations (molecular mass up to 507) on four Thermo Fisher Scientific columns (AS20, AS19, AS11HC and CS17). The corresponding predicted retention times were calculated by fitting the predicted a- and b-values of the models into the LSS model equation. The predicted retention times were also plotted against the experimental values to evaluate the goodness of fit and the predictive power of the models. The application of a TS threshold of 0.6 was found to successfully produce predictive and reliable QSRR models (Qext(F2)2>0.8 and Mean Absolute Error<0.1), and hence accurate retention time predictions with an average Mean Absolute Error of 0.2min.

Keywords: Genetic algorithm (GA); Ion chromatography; Linear solvent strength (LSS) model; Partial least squares (PLS); QSRR; Tanimoto similarity.

MeSH terms

  • Algorithms*
  • Anions
  • Bleeding Time
  • Chromatography / methods*
  • Least-Squares Analysis
  • Linear Models
  • Models, Theoretical*
  • Molecular Weight
  • Quantitative Structure-Activity Relationship
  • Solvents / chemistry

Substances

  • Anions
  • Solvents