Breaking Down Structural Diversity for Comprehensive Prediction of Ion-Neutral Collision Cross Sections

Anal Chem. 2020 Mar 17;92(6):4548-4557. doi: 10.1021/acs.analchem.9b05772. Epub 2020 Mar 6.

Abstract

Identification of unknowns is a bottleneck for large-scale untargeted analyses like metabolomics or drug metabolite identification. Ion mobility-mass spectrometry (IM-MS) provides rapid two-dimensional separation of ions based on their mobility through a neutral buffer gas. The mobility of an ion is related to its collision cross section (CCS) with the buffer gas, a physical property that is determined by the size and shape of the ion. This structural dependency makes CCS a promising characteristic for compound identification, but this utility is limited by the availability of high-quality reference CCS values. CCS prediction using machine learning (ML) has recently shown promise in the field, but accurate and broadly applicable models are still lacking. Here we present a novel ML approach that employs a comprehensive collection of CCS values covering a wide range of chemical space. Using this diverse database, we identified the structural characteristics, represented by molecular quantum numbers (MQNs), that contribute to variance in CCS and assessed the performance of a variety of ML algorithms in predicting CCS. We found that by breaking down the chemical structural diversity using unsupervised clustering based on the MQNs, specific and accurate prediction models for each cluster can be trained, which showed superior performance than a single model trained with all data. Using this approach, we have robustly trained and characterized a CCS prediction model with high accuracy on diverse chemical structures. An all-in-one web interface (https://CCSbase.net) was built for querying the CCS database and accessing the predictive model to support unknown compound identifications.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Databases, Factual
  • Ion Mobility Spectrometry
  • Ions / analysis
  • Machine Learning*
  • Particle Size
  • Surface Properties

Substances

  • Ions