pydca v1.0: a comprehensive software for direct coupling analysis of RNA and protein sequences

Mehari B Zerihun; Fabrizio Pucci; Emanuel K Peter; Alexander Schug

doi:10.1093/bioinformatics/btz892

Bioinformatics. 2020 Apr 1;36(7):2264-2265. doi: 10.1093/bioinformatics/btz892.

Authors

Mehari B Zerihun¹,², Fabrizio Pucci³, Emanuel K Peter³, Alexander Schug³

Affiliations

¹ Steinbuch Centre for Computing, Eggenstein-Leopoldshafen 76344.
² Department of Physics, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen 76344.
³ John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, Jülich 52428, Germany.

PMID: 31778142
DOI: 10.1093/bioinformatics/btz892

Abstract

Motivation: Ongoing advances in sequencing technologies have massively increased the availability of sequence data. This has made it possible to study patterns of correlated substitution between residues in families of homologous proteins or RNAs and to retrieve structural and stability information. Direct coupling analysis (DCA) infers coevolutionary couplings between pairs of residues; strong couplings indicate spatial proximity, which makes them a valuable input for subsequent structure prediction.

Results: Here, we present pydca, a standalone Python-based software package for the DCA of homologous protein and RNA families. It is based on two popular inverse statistical approaches, namely mean-field and pseudo-likelihood maximization, and is equipped with a series of functionalities that range from multiple sequence alignment trimming to contact map visualization. Thanks to its efficient implementation, features and user-friendly command line interface, pydca is a modular and easy-to-use tool that can be used by researchers with a wide range of backgrounds.
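
To make the mean-field approach concrete, the following is a minimal NumPy sketch of the textbook mean-field DCA recipe: count single-site and pairwise frequencies, regularize them with a pseudocount, invert the connected correlation matrix, and score each pair of columns by the Frobenius norm of its coupling block. It is an illustration only and does not use or reproduce the pydca API. The function name `mean_field_dca_scores`, its parameters, and the toy alignment are hypothetical, and the sketch omits sequence reweighting, gauge transformation, and the average product correction that production tools typically apply.

```python
import numpy as np


def mean_field_dca_scores(msa, q, pseudocount=0.5):
    """Hypothetical helper: mean-field DCA scores for an integer-encoded MSA.

    msa is an (n_seqs, length) integer array with states 0..q-1.
    Returns a (length, length) matrix of Frobenius-norm coupling scores.
    """
    n_seqs, length = msa.shape
    onehot = np.eye(q)[msa]  # shape (n_seqs, length, q)

    # Empirical single-site and pairwise frequencies (no sequence reweighting).
    fi = onehot.mean(axis=0)
    fij = np.einsum("nia,njb->ijab", onehot, onehot) / n_seqs

    # Pseudocount regularization toward the uniform distribution.
    fi = (1.0 - pseudocount) * fi + pseudocount / q
    fij = (1.0 - pseudocount) * fij + pseudocount / q**2
    for i in range(length):
        fij[i, i] = np.diag(fi[i])  # a site is perfectly correlated with itself

    # Connected correlations; drop the last state so the matrix is invertible.
    cij = fij - fi[:, None, :, None] * fi[None, :, None, :]
    cij = cij[:, :, : q - 1, : q - 1]
    dim = length * (q - 1)
    c_mat = cij.transpose(0, 2, 1, 3).reshape(dim, dim)

    # Mean-field approximation: couplings are minus the inverse correlation matrix.
    j_mat = -np.linalg.inv(c_mat)
    j = j_mat.reshape(length, q - 1, length, q - 1).transpose(0, 2, 1, 3)

    # Frobenius norm of each (q-1) x (q-1) coupling block, diagonal zeroed.
    scores = np.sqrt((j**2).sum(axis=(2, 3)))
    np.fill_diagonal(scores, 0.0)
    return scores


# Toy alignment (assumed data): column 2 copies column 0, the rest are random.
rng = np.random.default_rng(0)
toy_msa = rng.integers(0, 4, size=(500, 4))
toy_msa[:, 2] = toy_msa[:, 0]
scores = mean_field_dca_scores(toy_msa, q=4)
top_pair = np.unravel_index(np.argmax(scores), scores.shape)
print("highest-scoring column pair:", top_pair)
```

In this toy case the co-varying columns should receive the highest score, which mirrors how high-scoring pairs in a real alignment are read as candidate contacts.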

Availability and implementation: pydca can be obtained from https://github.com/KIT-MBS/pydca or from the Python Package Index under the MIT License.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Amino Acid Sequence
Proteins
RNA*
Sequence Alignment
Software*

Substances

Proteins
RNA