PANDA: Protein function prediction using domain architecture and affinity propagation

Zheng Wang; Chenguang Zhao; Yiheng Wang; Zheng Sun; Nan Wang

doi:10.1038/s41598-018-21849-1

Sci Rep. 2018 Feb 22;8(1):3484. doi: 10.1038/s41598-018-21849-1.

Authors

Zheng Wang¹, Chenguang Zhao², Yiheng Wang², Zheng Sun³, Nan Wang⁴

Affiliations

¹ Department of Computer Science, University of Miami, 1364 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA. zheng.wang@miami.edu.
² School of Computing, University of Southern Mississippi, 118 College Drive #5106, Hattiesburg, MS, 39406, USA.
³ Department of Mathematics and Computer Science, The Citadel, 171 Moultrie Street, Charleston, SC, 29409, USA.
⁴ Department of Computer Science, New Jersey City University, 2039 Kennedy Blvd, Jersey City, NJ, 07305, USA.

Abstract

We developed PANDA (Propagation of Affinity and Domain Architecture) to predict protein functions in the form of Gene Ontology (GO) terms. PANDA first runs a profile-profile alignment algorithm to search the PfamA, KOG, COG, and SwissProt databases, and then launches PSI-BLAST against UniProt to search for homologues. PANDA also integrates a domain architecture inference algorithm, based on Bayesian statistics, that calculates the probability that a protein has a given GO term. All candidate GO terms are pooled and filtered by Z-score. The remaining GO terms are then clustered with an affinity propagation algorithm based on the GO directed acyclic graph, followed by a second round of filtering on the clusters of GO terms. We benchmarked the performance of every baseline predictor that PANDA integrates, as well as every pooling and filtering step of PANDA. PANDA achieves better performance than the baseline predictors in terms of area under the precision-recall curve. PANDA can be accessed at http://dna.cs.miami.edu/PANDA/.
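
The pooling and Z-score filtering step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how scores from different predictors are merged or which Z-score cutoff is used, so everything here is an assumption made for illustration: the function names `pool_candidates` and `filter_by_z_score` are hypothetical, the merge rule (keep the highest score per GO term) and the threshold of 0.0 are arbitrary choices, and the scores are invented. This is not PANDA's actual implementation.

```python
from statistics import mean, pstdev


def pool_candidates(predictions):
    """Merge per-predictor {GO term: score} dicts into one pool.

    Assumption: when several predictors report the same GO term,
    the highest score is kept. The abstract does not state the rule.
    """
    pooled = {}
    for scores in predictions:
        for term, score in scores.items():
            pooled[term] = max(score, pooled.get(term, score))
    return pooled


def filter_by_z_score(pooled, threshold=0.0):
    """Keep GO terms whose score has a Z-score of at least `threshold`.

    The Z-score is computed over all pooled candidate scores. The
    threshold value is a hypothetical choice, not taken from the paper.
    """
    if not pooled:
        return {}
    values = list(pooled.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores are identical, so no term stands out; keep them all.
        return dict(pooled)
    return {t: s for t, s in pooled.items() if (s - mu) / sigma >= threshold}


# Invented scores from two hypothetical baseline predictors.
predictions = [
    {"GO:0003677": 0.90, "GO:0005515": 0.40},
    {"GO:0003677": 0.70, "GO:0006355": 0.80, "GO:0016020": 0.10},
]

pooled = pool_candidates(predictions)
kept = filter_by_z_score(pooled, threshold=0.0)
print(sorted(kept))
```

With these invented scores the pooled mean is 0.55, so only the two terms scoring above it survive the filter. In PANDA the surviving terms would then go on to the affinity propagation clustering step, which this sketch does not cover.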

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Bayes Theorem
Computational Biology*
Databases, Protein
Gene Ontology
Protein Conformation*
Protein Domains / genetics
Proteins / chemistry*
Proteins / genetics
Software

Substances

Proteins

Grants and funding

R15 GM120650/GM/NIGMS NIH HHS/United States