Novelty detection for metabolic dynamics established on breast cancer tissue using 2D NMR TOCSY spectra

Comput Struct Biotechnol J. 2022 Jun 1;20:2965-2977. doi: 10.1016/j.csbj.2022.05.050. eCollection 2022.

Abstract

Most metabolic profiling approaches focus only on identifying previously known metabolites in an NMR TOCSY spectrum using configured parameters. However, little work addresses the automated detection of new metabolites that might appear during the dynamic evolution of biological cells. Novelty detection is a category of machine learning used to identify data that emerge during the test phase and were not considered during the training phase. We propose a novelty detection system for detecting novel metabolites in the 2D NMR TOCSY spectrum of a breast cancer tissue sample. We build one-class and multi-class recognition systems using different classifiers such as Kernel Null Foley-Sammon Transform, Kernel Density Estimation, and Support Vector Data Description. The training models were constructed from training sets of different sizes and are used in the novelty detection procedure. Multiple evaluation measures were applied to test the performance of the novelty detection methods. Depending on the training data size, all classifiers were able to achieve a 0% false positive rate and a 0% total misclassification error, together with a 100% true positive rate. The median total time for the novelty detection process varies between 1.5 and 20 seconds, depending on the classifier and the amount of training data. The results of our novel metabolic profiling method demonstrate its suitability, robustness and speed in automated metabolic research.
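
To make the one-class idea concrete, the following is a minimal sketch of density-based novelty detection in the spirit of Kernel Density Estimation, together with the three evaluation measures named above. It is not the authors' implementation. The peak coordinates are made-up (F2, F1) values in ppm, and the bandwidth, the leave-one-out threshold rule, and all function names are assumptions introduced only for illustration.

```python
import math

def kde_score(point, train, bandwidth):
    """Mean isotropic 2D Gaussian kernel density of `point` under `train`."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for tx, ty in train:
        d2 = (point[0] - tx) ** 2 + (point[1] - ty) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total / len(train)

def fit_threshold(train, bandwidth):
    """Assumed rule: lowest leave-one-out density among the training peaks."""
    scores = []
    for i, p in enumerate(train):
        rest = train[:i] + train[i + 1:]
        scores.append(kde_score(p, rest, bandwidth))
    return min(scores)

def is_novel(point, train, bandwidth, threshold):
    """A peak is flagged as novel when its density falls below the threshold."""
    return kde_score(point, train, bandwidth) < threshold

def rates(y_true, y_pred):
    """Return (TPR, FPR, total misclassification error); 'novel' is positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    pos = sum(1 for t in y_true if t)
    neg = len(y_true) - pos
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return tp / pos, fp / neg, wrong / len(y_true)

# Hypothetical training peaks of one known metabolite (synthetic values).
train = [(1.30, 4.10), (1.31, 4.11), (1.29, 4.09),
         (1.30, 4.12), (1.32, 4.10), (1.28, 4.11)]
bandwidth = 0.02  # assumed kernel width in ppm

# Hypothetical test peaks: two near the known cluster, two far from it.
test = [(1.30, 4.105), (1.31, 4.10), (3.20, 3.90), (2.05, 2.35)]
y_true = [False, False, True, True]

threshold = fit_threshold(train, bandwidth)
y_pred = [is_novel(p, train, bandwidth, threshold) for p in test]
tpr, fpr, err = rates(y_true, y_pred)
print(y_pred, tpr, fpr, err)
```

On this toy data the two peaks far from the training cluster fall below the threshold while the two nearby peaks do not. This only illustrates the mechanics and says nothing about performance on real spectra.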

Keywords: 2D NMR TOCSY; ATP, Adenosine Triphosphate; AUC, Area under Curve; BMRB, Biological Magnetic Resonance Data Bank; Breast cancer; Chemometrics; Classification; HMDB, Human Metabolome Database; KDE, Kernel Density Estimation; KNFST, Kernel Null Foley–Sammon Transform; Machine learning; Metabolic profiling; Metabolomics; NMR, Nuclear Magnetic Resonance; Novelty detection; ROC, Receiver Operating Characteristic; SVDD, Support Vector Data Description; TOCSY, Total Correlation Spectroscopy.