Bioinformatics in proteomics: application, terminology, and pitfalls

Pathol Res Pract. 2004;200(2):173-8. doi: 10.1016/j.prp.2004.01.012.

Abstract

Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It draws on machine learning approaches such as artificial neural networks, decision trees and clustering algorithms, and is well suited to handling large amounts of data. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and then turning to classification with single decision trees and with decision tree ensembles. Special emphasis is placed on the pitfall of overfitting, i.e., of growing overly complex single decision trees. Finally, we discuss the pros and cons of these two ways of using decision trees.
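The overfitting pitfall described above can be illustrated with a small, self-contained sketch. It is not the authors' method, and several things in it are assumptions. The data are synthetic and hypothetical: random feature vectors stand in for pre-processed mass spectrometry peak intensities, and 20% of the labels are flipped to simulate noise. The tree is a Gini-based, CART-style learner written in plain Python. Bagging is used as one common kind of decision tree ensemble. An unpruned tree fits every training label, noise included. Comparing its test accuracy with that of a depth-limited tree and of a bagged ensemble shows the gap that overfitting typically opens. The printed numbers depend on the random seed.

```python
import random
from collections import Counter


def impurity(counts):
    """Gini impurity of a Counter of class labels."""
    m = sum(counts.values())
    return 1.0 - sum((c / m) ** 2 for c in counts.values())


def best_split(X, y):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    n, total = len(y), Counter(y)
    best, best_score = None, float("inf")
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left = Counter()
        for k in range(n - 1):
            left[y[order[k]]] += 1
            lo, hi = X[order[k]][f], X[order[k + 1]][f]
            if lo == hi:
                continue  # no threshold can separate equal values
            score = ((k + 1) * impurity(left)
                     + (n - k - 1) * impurity(total - left)) / n
            if score < best_score:
                best, best_score = (f, lo), score  # rows with x[f] <= lo go left
    return best


def build(X, y, max_depth=None, depth=0):
    """Grow a tree; leaves are labels, inner nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if impurity(Counter(y)) == 0.0 or (max_depth is not None and depth >= max_depth):
        return majority  # pure node, or depth cap reached (simple pre-pruning)
    split = best_split(X, y)
    if split is None:  # impure node whose rows are all identical
        return majority
    f, t = split
    go_left = [row[f] <= t for row in X]
    kids = []
    for side in (True, False):
        Xs = [r for r, g in zip(X, go_left) if g == side]
        ys = [lab for lab, g in zip(y, go_left) if g == side]
        kids.append(build(Xs, ys, max_depth, depth + 1))
    return (f, t, kids[0], kids[1])


def predict(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def bagging(X, y, n_trees, max_depth, rng):
    """Bagging: each unpruned tree is grown on a bootstrap resample of the training set."""
    n, trees = len(y), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build([X[i] for i in idx], [y[i] for i in idx], max_depth))
    return trees


def predict_ensemble(trees, row):
    """Majority vote over the ensemble's trees."""
    return Counter(predict(t, row) for t in trees).most_common(1)[0][0]


def accuracy(predict_fn, X, y):
    return sum(predict_fn(row) == lab for row, lab in zip(X, y)) / len(y)


def make_data(n, n_features, noise, rng):
    """Hypothetical synthetic 'spectra': the class depends on feature 0 only, plus label noise."""
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        label = 1 if row[0] > 0 else 0
        if rng.random() < noise:
            label = 1 - label  # flipped label = noise an unpruned tree will memorize
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(120, 5, 0.2, rng)
X_test, y_test = make_data(120, 5, 0.2, rng)

full_tree = build(X_train, y_train)                 # grown until every leaf is pure
pruned_tree = build(X_train, y_train, max_depth=2)  # depth-limited single tree
forest = bagging(X_train, y_train, n_trees=25, max_depth=None, rng=rng)

for name, fn in [
    ("unpruned tree", lambda r: predict(full_tree, r)),
    ("depth-2 tree", lambda r: predict(pruned_tree, r)),
    ("bagged ensemble", lambda r: predict_ensemble(forest, r)),
]:
    print(f"{name:16s} train={accuracy(fn, X_train, y_train):.2f} "
          f"test={accuracy(fn, X_test, y_test):.2f}")
```

The depth cap is the simplest form of pre-pruning. Post-pruning, such as cost-complexity pruning, is an alternative that this sketch does not show. The ensemble's trees are left unpruned on purpose: averaging votes over bootstrap resamples is what limits their individual overfitting.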

Publication types

  • Review

MeSH terms

  • Computational Biology* / methods
  • Decision Trees
  • Humans
  • Proteomics*
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
  • Statistics as Topic
  • Terminology as Topic*