Global Clustering Quality Coefficient Assessing the Efficiency of PCA Class Assignment

J Anal Methods Chem. 2014:2014:342497. doi: 10.1155/2014/342497. Epub 2014 Aug 26.

Abstract

An essential factor influencing the efficiency of predictive models built with principal component analysis (PCA) is the quality of the data clustering revealed by the score plots. The sensitivity and selectivity of the class assignment are strongly influenced by the relative positions of the clusters and by their dispersion. We propose a set of indicators, inspired by analytical geometry, for the objective quantitative assessment of data clustering quality, together with a global clustering quality coefficient (GCQC) that measures the overall predictive power of PCA models. The use of these indicators for evaluating the efficiency of PCA class assignment is illustrated by a comparative study that identifies the preprocessing function yielding the most efficient PCA system for screening amphetamines based on their GC-FTIR spectra. The GCQC ranking of the tested feature weights is explained using estimated density distributions and validated by quadratic discriminant analysis (QDA).
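The abstract does not give the definitions of the individual indicators or of the GCQC, so the following is only a minimal sketch of the general idea: it quantifies how well two classes separate in a PCA score plot from the relative position of their clusters and their dispersion. The function names (`pca_scores`, `cluster_geometry`, `separation_ratio`), the ratio itself, and the synthetic data are all assumptions made for illustration; they are not the indicators or the data used in the study.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project mean-centered data onto its first principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def cluster_geometry(scores, labels):
    """Return (centroid, mean distance to centroid) for each class."""
    out = {}
    for c in np.unique(labels):
        pts = scores[labels == c]
        centroid = pts.mean(axis=0)
        dispersion = np.linalg.norm(pts - centroid, axis=1).mean()
        out[c] = (centroid, dispersion)
    return out

def separation_ratio(scores, labels, a, b):
    """Centroid distance divided by the summed dispersions.

    Hypothetical stand-in indicator, NOT the GCQC from the paper.
    """
    geo = cluster_geometry(scores, labels)
    (ca, da), (cb, db) = geo[a], geo[b]
    return np.linalg.norm(ca - cb) / (da + db)

# Synthetic two-class data (assumed for illustration, not GC-FTIR spectra).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 10)),
               rng.normal(4.0, 1.0, (50, 10))])
y = np.array([0] * 50 + [1] * 50)

scores = pca_scores(X)
ratio = separation_ratio(scores, y, 0, 1)
print(f"separation ratio: {ratio:.2f}")
```

In this sketch, a larger ratio means the clusters lie farther apart relative to their spread. Applying the same computation to scores obtained after different preprocessing functions would give one rough way to rank them, in the spirit of the comparison described above.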