UniProtExtractR: an app and R package for easily extracting protein-specific UniProtKB information and fine-tuning organelle resolution

Bioinform Adv. 2023 Oct 31;3(1):vbad157. doi: 10.1093/bioadv/vbad157. eCollection 2023.

Abstract

Summary: UniProtKB is a publicly accessible database of annotated protein features for numerous organisms; however, globally extracting protein entry information for data visualization and categorization can be challenging. While the UniProtKB entry syntax maintains database consistency, it simultaneously obscures key terms within long character strings. To increase accessibility, UniProtExtractR is both an app and an R package that extracts desired information across nine UniProtKB categories: DNA binding, Pathway, Transmembrane, Signal peptide, Protein families, Domain [FT], Motif, Involvement in disease, and Subcellular location [CC]. The app features interactive frequency tables that globally summarize both the original UniProtKB input query and the extracted/changed entry values. Moreover, UniProtExtractR includes a tractable mapping algorithm to define custom organelle-level resolution. UniProtExtractR exists as a freely accessible Shiny app that requires no coding experience and as an R package, the code of which is entirely open source.
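
To illustrate the kind of extraction and organelle mapping described above, the following is a minimal sketch, written in Python for illustration rather than in the package's R. It is not UniProtExtractR's actual algorithm. The function names, the mapping table, and the example entry string are all hypothetical, and the sketch assumes a Subcellular location [CC] value of the general form "SUBCELLULAR LOCATION: term {evidence}; term {evidence}. Note=...".

```python
import re

# Hypothetical mapping table: lowercase keyword -> custom organelle label.
# Order matters: the first keyword found in the extracted term wins.
ORGANELLE_MAP = [
    ("mitochondri", "Mitochondrion"),
    ("endoplasmic reticulum", "ER"),
    ("golgi", "Golgi"),
    ("nucle", "Nucleus"),
    ("cell membrane", "Plasma membrane"),
    ("cytoplasm", "Cytoplasm"),
]


def extract_first_location(entry):
    """Return the first location term from a Subcellular location [CC] string."""
    text = re.sub(r"\{[^}]*\}", "", entry)           # drop {ECO:...} evidence tags
    text = re.sub(r"Note=.*", "", text, flags=re.S)  # drop trailing free-text note
    text = re.sub(r"^\s*SUBCELLULAR LOCATION:\s*", "", text)  # drop field prefix
    first = re.split(r"[.;]", text, maxsplit=1)[0]   # keep text before first . or ;
    return first.strip()


def map_organelle(term):
    """Map an extracted term to a custom organelle label (assumed scheme)."""
    lowered = term.lower()
    for keyword, label in ORGANELLE_MAP:
        if keyword in lowered:
            return label
    return "Unmapped"


# Illustrative entry string, not taken from a real UniProtKB record.
example = (
    "SUBCELLULAR LOCATION: Mitochondrion inner membrane "
    "{ECO:0000269|PubMed:12345}; Multi-pass membrane protein {ECO:0000255}. "
    "Note=Some free text."
)
term = extract_first_location(example)
print(term, "->", map_organelle(term))
```

Editing the keyword table is what changes the organelle-level resolution in this sketch; for example, adding a separate "mitochondrion inner membrane" keyword ahead of "mitochondri" would split that compartment out from the broader label.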

Availability and implementation: UniProtExtractR source code and user manual, including example files and troubleshooting, are available at https://github.com/alex-bio/UniProtExtractR. The Shiny app is hosted at https://harperlab.connect.hms.harvard.edu/uniprotextractR.