A Systematic Bioinformatics Approach to Identify High Quality Mass Spectrometry Data and Functionally Annotate Proteins and Proteomes

Methods Mol Biol. 2017:1549:163-176. doi: 10.1007/978-1-4939-6740-7_13.

Abstract

Over the past decade, proteomics and mass spectrometry (MS) have advanced rapidly, particularly in the life sciences, driven by technological improvements that have generated and accumulated vast amounts of data. Although this has led to major advances in biology, the sheer size and complexity of the data make interpretation a serious challenge for many practitioners. Furthermore, because much of this data lacks annotation, a potential gold mine of relevant biological information may remain hidden within it. We present a simple and intuitive workflow that lets the research community investigate and mine this data, both to extract relevant information and to separate usable, high-quality data from the rest, so that hypotheses can be developed for investigation and validation. We apply an MS evidence workflow to verify peptides of proteins from one's own data as well as from publicly available databases. We then integrate a suite of freely available bioinformatics analysis and annotation tools to identify homologues and to map putative functional signatures, gene ontology terms, and biochemical pathways. We also provide an example of the functional annotation of missing proteins from human chromosome 7 in the neXtProt database, for which no evidence is available at the proteomic, antibody, or structural level. We give example protocols, tools, and detailed flowcharts that can be extended or tailored to interpret and annotate the proteome of any novel organism.
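As a rough illustration of what separating usable, quality MS evidence can look like in practice, the sketch below filters peptide matches and keeps only proteins supported by enough distinct peptides. It is not the chapter's protocol: the record fields (`protein`, `sequence`, `q_value`, `n_proteins_matched`), the function names, and all three thresholds are assumptions chosen for the example, and a real analysis would take its criteria from the chapter or from community guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideMatch:
    """One peptide identification (hypothetical record layout)."""
    protein: str             # accession of the protein the peptide maps to
    sequence: str            # peptide amino-acid sequence
    q_value: float           # estimated false discovery rate for this match
    n_proteins_matched: int  # how many proteins contain this peptide


# Illustrative thresholds only; these are assumptions, not values from the chapter.
MIN_LENGTH = 9       # minimum peptide length in residues
MAX_Q_VALUE = 0.01   # accept matches at or below 1% estimated FDR
MIN_PEPTIDES = 2     # distinct peptides required per protein


def is_usable(match: PeptideMatch) -> bool:
    """Keep long, confident peptides that map to exactly one protein."""
    return (
        len(match.sequence) >= MIN_LENGTH
        and match.q_value <= MAX_Q_VALUE
        and match.n_proteins_matched == 1
    )


def proteins_with_evidence(matches):
    """Return {protein: set of usable peptide sequences} for supported proteins."""
    by_protein = {}
    for match in matches:
        if is_usable(match):
            by_protein.setdefault(match.protein, set()).add(match.sequence)
    # Drop proteins that do not reach the required number of distinct peptides.
    return {
        protein: peptides
        for protein, peptides in by_protein.items()
        if len(peptides) >= MIN_PEPTIDES
    }


# Made-up example data; "P1" and "P2" are placeholder accessions.
example_matches = [
    PeptideMatch("P1", "ACDEFGHIKL", 0.001, 1),
    PeptideMatch("P1", "MNPQRSTVWY", 0.005, 1),
    PeptideMatch("P1", "ACDE", 0.001, 1),        # too short
    PeptideMatch("P2", "LKIHGFEDCA", 0.002, 1),
    PeptideMatch("P2", "YWVTSRQPNM", 0.002, 3),  # shared between proteins
    PeptideMatch("P2", "GHIKLMNPQR", 0.200, 1),  # low confidence
]

supported = proteins_with_evidence(example_matches)
```

Proteins that pass such a filter would then go on to the annotation stage (homologue search, functional signatures, gene ontology, pathways); proteins that fail it remain candidates for further validation rather than being discarded outright.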

Keywords: Functional annotation; MS evidence; MS validation; Missing proteins.

MeSH terms

  • Computational Biology / methods*
  • Databases, Protein
  • Mass Spectrometry* / methods
  • Mass Spectrometry* / standards
  • Molecular Sequence Annotation
  • Proteome*
  • Proteomics / methods*
  • Reproducibility of Results
  • Signal Transduction
  • Software*
  • Web Browser
  • Workflow

Substances

  • Proteome