A Deep Learning-Based Tumor Classifier Directly Using MS Raw Data

Proteomics. 2020 Nov;20(21-22):e1900344. doi: 10.1002/pmic.201900344. Epub 2020 Jul 26.

Abstract

Since the launch of the Chinese Human Proteome Project (CNHPP) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), large-scale mass spectrometry (MS)-based proteomic profiling of different kinds of human tumor samples has provided a huge amount of valuable data for both basic and clinical researchers. Accurate prediction of tumor versus non-tumor samples, as well as of tumor types, has become a key step in biological and medical research, such as biomarker discovery, diagnosis, and monitoring of diseases. The traditional MS-based classification strategy depends mainly on the identification and quantification results of MS data, which brings some inherent limitations, such as the low identification rate of MS data. Here, a deep learning-based tumor classifier that directly uses MS raw data is proposed; it is independent of the identification and quantification results of MS data. First, potential precursors, together with their intensities and retention times, are detected and extracted from the MS data to serve as input. Then, a deep learning-based classifier is trained that can accurately distinguish between tumor and non-tumor samples. Finally, it is demonstrated that the deep learning-based classifier performs well compared with other machine learning methods and may help researchers find potential biomarkers that are likely to be missed by the traditional strategy.

Keywords: MS data; deep learning; proteomics; tumor classifier.
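
The abstract does not say how the extracted precursors are encoded for the classifier. As a minimal sketch only, assuming each precursor is a (m/z, retention time, intensity) triple, one plausible encoding sums intensities into a fixed-size m/z by retention-time grid so that every sample yields an input of the same shape. The function name `precursors_to_grid`, the m/z and retention-time ranges, and the grid size are all hypothetical and are not taken from the paper.

```python
import numpy as np

def precursors_to_grid(mz, rt, intensity,
                       mz_range=(300.0, 1500.0),  # assumed m/z window
                       rt_range=(0.0, 120.0),     # assumed RT window (minutes)
                       shape=(64, 64)):           # assumed grid size
    """Sum precursor intensities into a fixed-size (m/z, RT) grid.

    Hypothetical encoding for illustration; not the authors' method.
    Precursors outside the given ranges are ignored.
    """
    mz = np.asarray(mz, dtype=float)
    rt = np.asarray(rt, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    grid, _, _ = np.histogram2d(
        mz, rt, bins=shape, range=[mz_range, rt_range], weights=intensity
    )
    # Log-compress, since MS intensities span several orders of magnitude.
    return np.log1p(grid)

# Toy example: two precursors share a cell, one falls in another cell.
example = precursors_to_grid(
    mz=[400.0, 400.0, 1000.0],
    rt=[10.0, 10.0, 60.0],
    intensity=[1.0, 2.0, 5.0],
    shape=(4, 4),
)
```

A grid like this could be fed to any classifier that accepts fixed-size numeric input. Whether the published method uses a grid, a feature list, or another representation is not stated in the abstract.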

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Deep Learning*
  • Humans
  • Mass Spectrometry
  • Neoplasms*
  • Proteome
  • Proteomics*

Substances

  • Proteome