Supervised self organizing maps for classification and determination of potentially discriminatory variables: illustrated by application to nuclear magnetic resonance metabolomic profiling

Anal Chem. 2010 Jan 15;82(2):628-38. doi: 10.1021/ac9020566.

Abstract

The article describes the extension of the self-organizing map discrimination index (SOMDI) to cases where there are more than two classes and more than one factor that may influence the grouping of samples, using supervised SOMs to determine which variables, and how many, are responsible for the different types of separation. The methods are illustrated by an application in metabolic profiling: a nuclear magnetic resonance (NMR) data set of 96 samples of human saliva characterized by three factors, namely, whether the sample has been treated or not, the donor (16 donors), and the sampling day (3 days, differing for each donor). The sampling day can be considered a null factor, as it should have no significant influence on the metabolic profile. Supervised SOMs include a classifier when organizing the map, and we report a method for optimizing this by using an additional weight that sets the importance of the classifier relative to the overall experimental data set in order to avoid overfitting. Supervised SOMs can be obtained for each of the three factors, and we develop a multiclass SOMDI to determine which variables (or regions of the NMR spectra) are considered significant for each of the three potential factors. By dividing the data iteratively into training and test sets 100 times, we define variables as significant for a given factor if they have a positive SOMDI in the training set for the factor and class of interest over all iterations.
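
The selection rule in the last sentence (keep a variable only if its index is positive in the training set for every one of the repeated splits) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `stand_in_index` is a hypothetical placeholder (class mean minus overall mean per variable), not the SOMDI itself, which requires a trained supervised SOM; the stratified split and the 2/3 training fraction are also assumed, since the abstract does not state how the splits were drawn.

```python
import random


def stand_in_index(train_rows, train_labels, target_class):
    """Hypothetical stand-in for the SOMDI (NOT the paper's index).

    For each variable, returns the mean over training samples of the target
    class minus the mean over all training samples.
    """
    in_class = [row for row, label in zip(train_rows, train_labels)
                if label == target_class]
    scores = []
    for j in range(len(train_rows[0])):
        overall_mean = sum(row[j] for row in train_rows) / len(train_rows)
        class_mean = sum(row[j] for row in in_class) / len(in_class)
        scores.append(class_mean - overall_mean)
    return scores


def stratified_training_indices(labels, train_fraction, rng):
    """Pick a random training subset that keeps every class represented.

    Stratification is an assumption of this sketch; the remaining samples
    would form the test set.
    """
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    chosen = []
    for members in by_class.values():
        members = members[:]          # copy so the caller's grouping is untouched
        rng.shuffle(members)
        n_train = max(1, round(train_fraction * len(members)))
        chosen.extend(members[:n_train])
    return chosen


def consistently_positive(rows, labels, target_class, index_fn=stand_in_index,
                          n_iterations=100, train_fraction=2 / 3, seed=0):
    """Return indices of variables whose index is > 0 in every training split."""
    rng = random.Random(seed)
    keep = [True] * len(rows[0])
    for _ in range(n_iterations):
        train = stratified_training_indices(labels, train_fraction, rng)
        scores = index_fn([rows[i] for i in train],
                          [labels[i] for i in train], target_class)
        # A single non-positive score in any iteration removes the variable.
        keep = [k and s > 0 for k, s in zip(keep, scores)]
    return [j for j, k in enumerate(keep) if k]


# Toy data (invented for illustration): variable 0 is raised in "treated",
# variable 1 is raised in "untreated", variable 2 is constant.
rows = [[5.0, 1.0, 3.0], [5.5, 1.2, 3.0], [6.0, 0.8, 3.0],
        [5.2, 1.1, 3.0], [5.8, 0.9, 3.0], [5.1, 1.3, 3.0],
        [1.0, 5.0, 3.0], [1.2, 5.5, 3.0], [0.8, 6.0, 3.0],
        [1.1, 5.2, 3.0], [0.9, 5.8, 3.0], [1.3, 5.1, 3.0]]
labels = ["treated"] * 6 + ["untreated"] * 6
print(consistently_positive(rows, labels, "treated"))
```

Requiring positivity in all iterations, rather than on average, is a deliberately strict criterion: a variable that separates the class only in some splits is dropped. Any function with the same signature as `stand_in_index` could be passed as `index_fn`, which is where an actual SOMDI computation would plug in.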