Critical evaluation of web-based prediction tools for human protein subcellular localization

Brief Bioinform. 2020 Sep 25;21(5):1628-1640. doi: 10.1093/bib/bbz106.

Abstract

Knowledge of human protein subcellular localization is valuable for understanding biological processes, elucidating protein functions and identifying drug targets. Over the past decade, a number of protein subcellular localization prediction tools have been designed and made freely available online. This paper summarizes recent progress in research on the subcellular localization of human proteins, including the commonly used data sets introduced in earlier work and the performance of all selected prediction tools on the same benchmark data set. We carry out a systematic evaluation of several publicly available subcellular localization prediction methods on various benchmark data sets. Among them, we find that mLASSO-Hum and pLoc-mHum provide a statistically significant improvement in performance, as measured by accuracy, relative to the other methods. In addition, we build a new data set from the latest version of the UniProt database and construct a new GO-based prediction method, HumLoc-LBCI. We then test all selected prediction tools on the new data set. Finally, we discuss possible directions for the development of human protein subcellular localization prediction. Availability: The code and data are available from http://www.lbci.cn/syn/.

Keywords: Gene Ontology terms; human proteins; multi-label classification; sequence information; subcellular localization; web server.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Benchmarking
  • Datasets as Topic
  • Humans
  • Internet*
  • Proteins / metabolism*
  • Subcellular Fractions / metabolism*

Substances

  • Proteins