No-alignment-strategies for exploring a set of two-way data tables obtained from capillary electrophoresis-mass spectrometry

J Chromatogr A. 2008 May 23;1192(1):157-65. doi: 10.1016/j.chroma.2008.03.027. Epub 2008 Mar 15.

Abstract

Hyphenated techniques such as capillary electrophoresis-mass spectrometry (CE-MS) or high-performance liquid chromatography with diode array detection (HPLC-DAD) are known to produce a huge amount of data, since each sample is characterized by a two-way data table. In this paper, different ways of obtaining sample-related information from a set of such tables are discussed. Working with the original data requires alignment techniques because of time shifts caused by unavoidable variations in separation conditions. Other pre-processing techniques have been suggested to facilitate comparison among samples without prior peak alignment, for example 'binning' and/or 'blurring' the data along the time dimension. All these techniques, however, require optimization of some parameters, and in this paper an alternative, parameter-free method is proposed. The individual data tables (X) are represented as Gram matrices (XX^T), where the summation is taken over the time dimension. Hence the possible variations in time scale are eliminated, while the time information is at least partly preserved by the correlation structure between the detection channels. For comparison among samples, a similarity matrix is constructed and explored by principal component analysis and hierarchical clustering. The Gram matrix approach was tested and compared to some other methods using 'binned' and 'blurred' data for a data set with CE-MS runs on urine samples. In addition to data exploration by principal component analysis and hierarchical clustering, a discriminant partial least squares model was constructed to discriminate between the samples that were taken with and without the prior intake of a drug. The results showed that the proposed method is at least as good as the others with respect to cluster identification and class prediction. A distinct advantage is that there is no need for parameter optimization, while a potential drawback is the large size of the Gram matrices for data with high mass resolution.
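
The Gram matrix idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the similarity matrix is computed, so the normalized Frobenius inner product used here, the double-centering step before the eigendecomposition, and the function names gram_matrix, similarity_matrix and pca_scores are all assumptions made for illustration. Each table X is assumed to be stored as detection channels (rows) by time points (columns).

```python
import numpy as np


def gram_matrix(X):
    """Return X X^T for a (channels x time) table; the sum runs over time."""
    X = np.asarray(X, dtype=float)
    return X @ X.T


def similarity_matrix(tables):
    """Pairwise similarity between samples from their Gram matrices.

    Assumption: similarity is the Frobenius inner product of two Gram
    matrices divided by the product of their Frobenius norms.
    """
    grams = [gram_matrix(X) for X in tables]
    n = len(grams)
    S = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            num = np.sum(grams[i] * grams[j])
            den = np.linalg.norm(grams[i]) * np.linalg.norm(grams[j])
            S[i, j] = S[j, i] = num / den
    return S


def pca_scores(S, n_components=2):
    """Sample scores from an eigendecomposition of the double-centered S.

    Assumption: the similarity matrix is treated like a kernel matrix.
    """
    n = S.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    vals, vecs = np.linalg.eigh(J @ S @ J)   # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((4, 50))            # synthetic sample: 4 channels, 50 time points
    X_shifted = np.roll(X, 7, axis=1)  # same sample with a (circular) time shift
    Y = rng.random((4, 50))            # a different synthetic sample

    # Reordering time points leaves X X^T unchanged, so no alignment is needed.
    print(np.allclose(gram_matrix(X), gram_matrix(X_shifted)))

    S = similarity_matrix([X, X_shifted, Y])
    print(S)
    print(pca_scores(S))
```

In this sketch a uniform shift of the time axis leaves the Gram matrix exactly unchanged, because the sum over time does not depend on the order of the time points. Each Gram matrix has size channels by channels, which is why high mass resolution makes the matrices large.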

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Interpretation, Statistical*
  • Electrophoresis, Capillary / methods*
  • Mass Spectrometry / methods*
  • Principal Component Analysis