Improving Gene Annotation of the Peanut Genome by Integrated Proteogenomics Workflow

Haifen Li; Ruo Zhou; Shaohang Xu; Xiaoping Chen; Yanbin Hong; Qing Lu; Hao Liu; Baojin Zhou; Xuanqiang Liang

doi:10.1021/acs.jproteome.9b00723

J Proteome Res. 2020 Jun 5;19(6):2226-2235. doi: 10.1021/acs.jproteome.9b00723. Epub 2020 May 15.

Authors

Haifen Li¹, Ruo Zhou², Shaohang Xu², Xiaoping Chen¹, Yanbin Hong¹, Qing Lu¹, Hao Liu¹, Baojin Zhou², Xuanqiang Liang¹

Affiliations

¹ Crops Research Institute, Guangdong Academy of Agricultural Sciences, Guangdong Key Laboratory for Crops Genetic Improvement, South China Peanut Sub-Center of National Center of Oilseed Crops Improvement, Guangzhou 510640, China.
² Deepxomics Co., Ltd., Shenzhen 518000, China.

PMID: 32367721
DOI: 10.1021/acs.jproteome.9b00723

Abstract

Peanut (Arachis hypogaea L.) is a staple crop in semiarid tropical and subtropical regions. Although the peanut genome has been fully sequenced, the current gene annotations are still incomplete. New technologies in genomics and proteomics have given rise to proteogenomics, which integrates genomic, transcriptomic, and proteomic data to improve gene annotation. In the present study, we collected RNA-seq and proteomic data from multiple peanut tissues, including seed, shell, and gynophore, and used a proteogenomic approach to improve the peanut gene annotation. A total of 1 935 655 904 RNA-seq reads and 7 490 280 MS/MS spectra were collected. Ultimately, 13 767 annotated genes were supported by protein-level evidence, and seven novel protein-coding genes were supported by both RNA-seq and proteomic evidence. In addition, 35 gene models were updated based on proteomic data. By integrating RNA-seq and proteomic data, the proteogenomic approach improved the annotation in three respects: confirming annotated genes at the protein level, identifying novel protein-coding genes, and refining existing gene models. We expect that these approaches could help improve existing genome annotations of other species.
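
As a rough illustration of the general proteogenomic idea summarized above, the sketch below sorts identified peptides into those explained by annotated proteins and those found only in six-frame translations of assembled transcripts (candidate novel coding regions). It is not the authors' pipeline: the function names, the toy sequences, and the simple substring matching are all assumptions made for illustration, and real workflows add steps such as false discovery rate control and handling of isoleucine/leucine ambiguity.

```python
# Toy sketch only: names and sequences are hypothetical, not from the study.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame):
    """Translate one reading frame; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(dna):
    """Translate three forward and three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s, f) for s in strands for f in range(3)]


def classify_peptides(peptides, annotated_proteins, transcripts):
    """Label each peptide 'annotated', 'novel', or 'unmatched'.

    'annotated': substring of an annotated protein.
    'novel': absent from annotated proteins but present in a
             six-frame translation of a transcript.
    """
    translations = [t for tx in transcripts for t in six_frame_translate(tx)]
    labels = {}
    for pep in peptides:
        if any(pep in prot for prot in annotated_proteins):
            labels[pep] = "annotated"
        elif any(pep in t for t in translations):
            labels[pep] = "novel"
        else:
            labels[pep] = "unmatched"
    return labels


if __name__ == "__main__":
    annotated = ["MKTAYIAK"]                  # hypothetical annotated protein
    transcripts = ["ATGGGTTCTTGGCTGCCGGAAAAA"]  # hypothetical transcript
    peptides = ["TAYIAK", "GSWLPEK", "QQQQQ"]
    print(classify_peptides(peptides, annotated, transcripts))
```

In this toy setting, peptides labeled "novel" would be the starting point for proposing new protein-coding genes or revised gene models, subject to the additional evidence checks that a real analysis requires.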

Keywords: RNA-seq; bioinformatics; gene annotation; proteogenomics; proteomics.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Arachis / genetics
Molecular Sequence Annotation
Proteogenomics*
Proteomics
Tandem Mass Spectrometry
Workflow