Whole genome searching with shotgun proteomic data: applications for genome annotation

J Proteome Res. 2008 Jan;7(1):80-8. doi: 10.1021/pr070198n. Epub 2007 Dec 7.

Abstract

High-throughput genome sequencing continues to accelerate the rate at which complete genomes are available for biological research. Many of these new genome sequences have little or no genome annotation currently available and hence rely upon computational predictions of protein coding genes. Evidence of translation from proteomic techniques could facilitate experimental validation of protein coding genes, but the techniques for whole genome searching with MS/MS data have not been adequately developed to date. Here we describe GENQUEST, a novel method using peptide isoelectric focusing and accurate mass to greatly reduce the peptide search space, making fast, accurate, and sensitive whole human genome searching possible on common desktop computers. In an initial experiment, almost all exonic peptides identified in a protein database search were identified when searching genomic sequence. Many peptides identified exclusively in the genome searches were incorrectly identified or could not be experimentally validated, highlighting the importance of orthogonal validation. Experimentally validated peptides exclusive to the genomic searches can be used to reannotate protein coding genes. GENQUEST represents an experimental tool that can be used by the proteomics community at large for validating computational approaches to genome annotation.
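The search-space reduction described above can be illustrated with a minimal sketch: a candidate peptide is kept only if its theoretical mass matches the measured precursor mass within a tight tolerance and its estimated isoelectric point falls inside the pI range of the focusing fraction it came from. This is not the GENQUEST implementation. The function names, the ppm tolerance, the pI window, and the pKa values (one commonly used approximate set) are all assumptions chosen for illustration.

```python
# Illustrative sketch only; not the GENQUEST code. All names and thresholds
# below are hypothetical.

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

# Approximate pKa values (assumption: one commonly used set, not the paper's).
PKA_POS = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_NTERM, PKA_CTERM = 8.6, 3.6


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    pos = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))
    neg = 1.0 / (1.0 + 10 ** (PKA_CTERM - ph))
    for aa in seq:
        if aa in PKA_POS:
            pos += 1.0 / (1.0 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            neg += 1.0 / (1.0 + 10 ** (PKA_NEG[aa] - ph))
    return pos - neg


def estimate_pi(seq):
    """Find the pH of zero net charge by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


def filter_candidates(candidates, observed_mass, ppm_tol, pi_window):
    """Keep candidates matching both the mass tolerance and the pI window."""
    pi_lo, pi_hi = pi_window
    kept = []
    for pep in candidates:
        ppm_error = abs(peptide_mass(pep) - observed_mass) / observed_mass * 1e6
        if ppm_error <= ppm_tol and pi_lo <= estimate_pi(pep) <= pi_hi:
            kept.append(pep)
    return kept
```

In a genome-scale setting the candidate list would come from translating and digesting genomic sequence in silico. Applying both filters before any spectrum is scored is what shrinks the number of peptides each MS/MS spectrum must be compared against.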

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Line, Tumor
  • Databases, Protein / trends*
  • Documentation / methods*
  • Genome, Human
  • Genomics / methods
  • Humans
  • Isoelectric Focusing
  • Proteomics / methods*
  • Tandem Mass Spectrometry / methods*