adabmDCA: adaptive Boltzmann machine learning for biological sequences

Anna Paola Muntoni; Andrea Pagnani; Martin Weigt; Francesco Zamponi

doi:10.1186/s12859-021-04441-9

BMC Bioinformatics. 2021 Oct 29;22(1):528. doi: 10.1186/s12859-021-04441-9.

Authors

Anna Paola Muntoni¹, Andrea Pagnani² ³ ⁴, Martin Weigt⁵, Francesco Zamponi⁶

Affiliations

¹ Statistical Inference and Biological Modeling Group, Italian Institute for Genomic Medicine, Candiolo, Italy. anna.muntoni@polito.it.
² Statistical Inference and Biological Modeling Group, Italian Institute for Genomic Medicine, Candiolo, Italy.
³ Department of Applied Science and Technology, Politecnico di Torino, Turin, Italy.
⁴ Sezione di Torino, Istituto Nazionale Fisica Nucleare, Turin, Italy.
⁵ Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, CNRS, Sorbonne Université, Paris, France.
⁶ Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, CNRS, Université PSL, Sorbonne Université, Université de Paris, Paris, France.

Abstract

Background: Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized in terms of local biases, which account for residue conservation, and pairwise terms, which model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has also been assessed in terms of their ability to predict mutational effects and to generate functional sequences in silico.
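As an illustration of the parametrization described above, the following is a minimal sketch of a Potts-type energy function with local biases and pairwise couplings, together with a Frobenius-norm score that is commonly used to rank residue pairs as contact candidates. The function names, array shapes, and sign convention are assumptions made for this example; they are not taken from the adabmDCA code.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Energy of an integer-encoded sequence under a Potts model.

    seq: (L,) integers in [0, q); h: (L, q) local biases;
    J: (L, L, q, q) pairwise couplings.
    Assumed convention: E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).
    """
    seq = np.asarray(seq)
    idx = np.arange(len(seq))
    # Bias of the residue actually present at each site.
    field_term = h[idx, seq].sum()
    # pair[i, j] = J[i, j, seq[i], seq[j]] for every pair of sites.
    pair = J[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    # Count each pair once (i < j).
    coupling_term = np.triu(pair, k=1).sum()
    return -(field_term + coupling_term)

def frobenius_scores(J):
    """Frobenius norm of each q x q coupling block, shape (L, L)."""
    return np.sqrt((J ** 2).sum(axis=(2, 3)))
```

Under this convention, low-energy sequences are the probable ones, and site pairs with a large coupling norm are the candidates for three-dimensional contacts.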

Results: Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be applied to both protein and RNA families and supports several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have trained three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain.
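For readers unfamiliar with Boltzmann machine learning, the sketch below shows one generic moment-matching update: the parameters are moved so that the one- and two-site frequencies of sequences sampled from the model approach those of the data. It is a simplified illustration under stated assumptions and not the adabmDCA implementation: the function names and the learning rate `lr` are hypothetical, the model samples are assumed to be supplied by an external Monte Carlo sampler, and sequence reweighting and regularization are omitted.

```python
import numpy as np

def frequencies(msa, q):
    """One-site (L, q) and two-site (L, L, q, q) frequencies of an alignment.

    msa: (M, L) integer-encoded sequences with symbols in [0, q).
    """
    x = np.eye(q)[np.asarray(msa)]  # one-hot encoding, shape (M, L, q)
    f1 = x.mean(axis=0)
    f2 = np.einsum("mia,mjb->ijab", x, x) / x.shape[0]
    return f1, f2

def gradient_step(h, J, data_msa, model_samples, q, lr=0.05):
    """One gradient-ascent step on the log-likelihood (hypothetical helper).

    The gradient with respect to each parameter is the difference between
    the data frequency and the model frequency of the matching observable.
    """
    f1_data, f2_data = frequencies(data_msa, q)
    f1_model, f2_model = frequencies(model_samples, q)
    h_new = h + lr * (f1_data - f1_model)
    J_new = J + lr * (f2_data - f2_model)
    # A site has no coupling with itself.
    sites = np.arange(h.shape[0])
    J_new[sites, sites] = 0.0
    return h_new, J_new
```

In a full training loop, this step would alternate with resampling from the updated model until the model and data frequencies agree within a chosen tolerance.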

Conclusions: The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of both the inferred contact map and the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning; the latter allows for accurate and lossless training when equilibrium learning is prohibitive in terms of computational time. The code also allows for pruning irrelevant parameters using an information-based criterion.

Keywords: Boltzmann machine learning; Protein modelling; RNA modelling; Statistical inference.
