ETLD: an encoder-transformation layer-decoder architecture for protein contact and mutation effects prediction

Brief Bioinform. 2023 Sep 20;24(5):bbad290. doi: 10.1093/bib/bbad290.

Abstract

The latent features extracted from the multiple sequence alignments (MSAs) of homologous protein families are useful for identifying residue-residue contacts, predicting mutation effects, characterizing protein evolution, and related tasks. Over the past three decades, a growing body of supervised and unsupervised machine learning methods has been applied to this field, yielding fruitful results. Here, we propose a novel self-supervised model, called the encoder-transformation layer-decoder (ETLD) architecture, capable of capturing protein sequence latent features directly from MSAs. Compared to a typical autoencoder, ETLD introduces a transformation layer that learns inter-site couplings, from which the two-dimensional residue-residue contact map can be parsed out by a simple mathematical derivation or an additional supervised neural network. ETLD retains the process of encoding and decoding sequences, and the predicted probabilities of amino acids at each site can further be used to construct mutation landscapes for mutation effects prediction, generally outperforming advanced models such as GEMME, DeepSequence and EVmutation. Overall, ETLD is a highly interpretable unsupervised model with great potential for improvement and can be combined with supervised methods for more extensive and accurate predictions.
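The pipeline described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all shapes, weight names and the contact-scoring step (Frobenius norm of the coupling blocks with average-product correction, a common post-processing choice in coupling-based contact prediction) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: N sequences, L sites, alphabet of 21
# (20 amino acids + gap), per-site latent size H.
N, L, A, H = 50, 10, 21, 16

# Toy one-hot-encoded MSA, shape (N, L, A).
msa = np.eye(A)[rng.integers(0, A, size=(N, L))]

# Untrained per-site encoder/decoder weights and a transformation
# matrix T that couples the latent vectors of all sites.
W_enc = rng.normal(scale=0.1, size=(A, H))
W_dec = rng.normal(scale=0.1, size=(H, A))
T = rng.normal(scale=0.1, size=(L * H, L * H))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def forward(x):
    """Encode each site, mix latents via the transformation layer,
    decode to per-site amino-acid probabilities."""
    z = x @ W_enc                                    # (n, L, H)
    z = (z.reshape(len(x), -1) @ T).reshape(len(x), L, H)  # inter-site coupling
    return softmax(z @ W_dec)                        # (n, L, A)

probs = forward(msa)                                 # (N, L, A)

# Contact scores: Frobenius norm of each H x H coupling block of T,
# symmetrized and average-product corrected (APC).
B = T.reshape(L, H, L, H)
F = np.linalg.norm(B, axis=(1, 3))                   # (L, L)
F = (F + F.T) / 2
contacts = F - np.outer(F.mean(1), F.mean(0)) / F.mean()

# Mutation landscape: log-ratio of the predicted probability of the
# mutant vs. the wildtype amino acid at each site (sketch only).
wt = msa[:1]                                         # treat first sequence as wildtype
wt_idx = wt[0].argmax(-1)
p = forward(wt)[0]                                   # (L, A)
effect = np.log(p) - np.log(p[np.arange(L), wt_idx])[:, None]
```

In a trained model, `W_enc`, `W_dec` and `T` would be fit by reconstructing the MSA sequences; here they are random, so only the shapes and the scoring logic are meaningful.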

Keywords: contact prediction; encoder-transformation layer-decoder (ETLD) model; multiple sequence alignments (MSAs); mutation effects prediction; transformation matrix.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / genetics
  • Mutation
  • Neural Networks, Computer*
  • Proteins* / chemistry
  • Proteins* / genetics
  • Unsupervised Machine Learning

Substances

  • Proteins
  • Amino Acids