Coverage and consistency: bioinformatics aspects of the analysis of multirun iTRAQ experiments with wheat leaves

Dana Pascovici; Donald M Gardiner; Xiaomin Song; Edmond Breen; Peter S Solomon; Tim Keighley; Mark P Molloy

doi:10.1021/pr400531y

J Proteome Res. 2013 Nov 1;12(11):4870-81. doi: 10.1021/pr400531y. Epub 2013 Sep 20.

Authors

Dana Pascovici¹, Donald M Gardiner, Xiaomin Song, Edmond Breen, Peter S Solomon, Tim Keighley, Mark P Molloy

Affiliation

¹ Australian Proteome Analysis Facility, Macquarie University, Sydney, NSW 2109, Australia.

PMID: 24015675
DOI: 10.1021/pr400531y

Abstract

The hexaploid genome of bread wheat (Triticum aestivum) is large (17 Gb) and repetitive, and this has delayed full sequencing and annotation of the genome, which is a prerequisite for effective quantitative proteomics analysis. Aware of these constraints, we investigated the most effective approaches for shotgun proteomic analyses of bread wheat that would support large-scale quantitative comparisons using iTRAQ reagents. We used a data set that was generated by two-dimensional LC-MS of iTRAQ-labeled peptides from wheat leaves. The main items considered in this study were the choice of sequence database for matching LC-MS data, the consistency of identification when multiple LC-MS runs were acquired, and the options for downstream functional analysis to generate useful insight. For peptide identification we examined the extensive NCBInr plant database, a smaller composite cereals database, the Brachypodium distachyon model plant genome, the EST-based SuperWheat database, and the genome sequence from the recently sequenced D-genome progenitor Aegilops tauschii. Although the SuperWheat database assigned the most spectra, this extremely large database could not be readily manipulated for the robust protein grouping that is required for large-scale, multirun quantitative experiments. We demonstrated a pragmatic alternative of using the composite cereals database for peptide-spectrum matching. The stochastic aspect of protein grouping across LC-MS runs was investigated using the smaller composite cereals database, where we found that attaching the Brachypodium best BLAST hit reduced this problem. Further, assigning quantitation to the best Brachypodium locus yielded promising results, enabling integration with existing downstream data mining and functional analysis tools. Our study demonstrates viable approaches for quantitative proteomics analysis of bread wheat samples and shows how these approaches could be similarly adopted for analysis of other organisms with unsequenced or incompletely sequenced genomes.
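
The idea of attaching a Brachypodium best BLAST hit to each identified protein, and then assigning quantitation to that locus, can be illustrated with a minimal sketch. This is not the authors' pipeline: the accessions, locus names, ratios, and the helper functions `shared_ids` and `ratios_by_locus` are all hypothetical, and the geometric mean is only one assumed way of combining ratios that map to the same locus.

```python
from collections import defaultdict
from statistics import geometric_mean

# Hypothetical best-BLAST-hit table: cereal accession -> Brachypodium locus.
# All identifiers here are invented for illustration.
best_hit = {
    "cereal_A1": "Bradi1g00010",
    "cereal_A2": "Bradi1g00010",  # homolog of cereal_A1, same best hit
    "cereal_B1": "Bradi2g00020",
    "cereal_C1": "Bradi3g00030",
}

# Hypothetical per-run results: run name -> {reported accession: iTRAQ ratio}.
# Protein grouping reported cereal_A1 in run1 but its homolog cereal_A2 in run2.
runs = {
    "run1": {"cereal_A1": 1.8, "cereal_B1": 0.5},
    "run2": {"cereal_A2": 2.2, "cereal_B1": 0.6, "cereal_C1": 1.0},
}


def shared_ids(per_run_ids):
    """Return the identifiers that are present in every run."""
    sets = [set(ids) for ids in per_run_ids]
    return set.intersection(*sets)


def ratios_by_locus(run_ratios, hit_table):
    """Re-key one run's ratios by best-hit locus.

    Ratios that map to the same locus are combined with a geometric mean
    (an assumption of this sketch); accessions without a hit are dropped.
    """
    grouped = defaultdict(list)
    for accession, ratio in run_ratios.items():
        locus = hit_table.get(accession)
        if locus is not None:
            grouped[locus].append(ratio)
    return {locus: geometric_mean(vals) for locus, vals in grouped.items()}


# Overlap between runs when proteins are compared by reported accession.
by_accession = shared_ids(r.keys() for r in runs.values())

# Overlap between runs after re-keying each run by Brachypodium locus.
locus_ratios = {name: ratios_by_locus(r, best_hit) for name, r in runs.items()}
by_locus = shared_ids(lr.keys() for lr in locus_ratios.values())

print("shared by accession:", sorted(by_accession))
print("shared by locus:", sorted(by_locus))
```

In this toy data only one protein is shared between the two runs when matching on accession, but two are shared once each accession is replaced by its best-hit locus, which is the kind of consistency gain the abstract describes. Keying on a model-genome locus also gives identifiers that existing functional analysis tools are more likely to recognize.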

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Chromatography, Liquid
Computational Biology / methods*
Databases, Genetic
Gene Ontology
Genome, Plant / genetics*
Mass Spectrometry
Plant Leaves / genetics*
Plant Leaves / metabolism
Plant Proteins / genetics*
Plant Proteins / metabolism
Proteomics / methods*
Triticum / genetics*
Triticum / metabolism

Substances

Plant Proteins