Exploring the Uncharacterized Human Proteome Using neXtProt

J Proteome Res. 2018 Dec 7;17(12):4211-4226. doi: 10.1021/acs.jproteome.8b00537. Epub 2018 Sep 20.

Abstract

A total of 20,230 protein-coding genes have been predicted from the analysis of the human genome (neXtProt release 2018-01-17), and about 10% of them still lack functional annotation, whether predicted by bioinformatics tools or captured from experimental reports. A systematic exploration of the available literature on uncharacterized human genes/proteins led to the proposal of functional annotations for 113 proteins and to the consolidation of a list of 1,862 uncharacterized human proteins. The advanced search functionality of neXtProt was used extensively to examine the landscape of the uncharacterized human proteome in terms of subcellular locations, protein-protein interactions, tissue expression, association with diseases, and 3D structure. Finally, in-depth data mining of various publicly available resources allowed us to build functional hypotheses for 26 uncharacterized human proteins validated at the protein level (uPE1). These hypotheses cover the fields of cilia biology, male reproduction, metabolism, nervous system, immunity, inflammation, RNA metabolism, and chromatin biology. They will require experimental validation before they can be considered for annotation. Despite technological progress, the pace of human protein characterization studies is still slow. It could be accelerated by better integration of existing knowledge resources and by initiating large collaborative projects involving specialists from different fields of biology. We hope that our analysis will help lay the groundwork for such collaborative approaches and will be exploited by the HUPO Human Proteome Project teams committed to characterizing uPE1 proteins.

Keywords: SPARQL; biocuration; cilium biology; data mining; functional annotation; human protein; knowledge base; neXtProt; systems biology.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Computational Biology
  • Data Mining
  • Genome, Human / genetics
  • Humans
  • Methods
  • Molecular Sequence Annotation*
  • Proteome / analysis
  • Proteome / genetics*

Substances

  • Proteome