MCWS-Transformers: Towards an Efficient Modeling of Protein Sequences via Multi Context-Window Based Scaled Self-Attention

IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1188-1199. doi: 10.1109/TCBB.2022.3173789. Epub 2023 Apr 3.

Abstract

This paper advances the self-attention mechanism of the standard transformer network specifically for modeling protein sequences. We introduce a novel context-window based scaled self-attention mechanism for processing protein sequences, built on the notions of (i) local context and (ii) larger contextual patterns. Both notions are essential to building a good representation for protein sequences. The proposed context-window based scaled self-attention mechanism is further used to build the multi context-window based scaled (MCWS) transformer network for the protein function prediction task at the protein sub-sequence level. Overall, the proposed MCWS transformer network produced improved predictive performance, outperforming existing state-of-the-art approaches by substantial margins. With respect to the standard transformer network, the proposed network produced improvements in F1-score of +2.30% and +2.08% on the biological process (BP) and molecular function (MF) datasets, respectively. The corresponding improvements over the state-of-the-art ProtVecGen-Plus+ProtVecGen-Ensemble approach are +3.38% (BP) and +2.86% (MF). Equally important, robust performance was obtained across protein sequences of different lengths.
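To make the mechanism concrete, the sketch below illustrates one way a context-window based scaled self-attention could be realized and combined over multiple window sizes, as described in the abstract. This is not the authors' implementation: the window sizes, model dimension, and concatenation-based fusion are assumptions made purely for illustration.

```python
# Minimal illustrative sketch (assumptions, not the authors' code):
# scaled dot-product self-attention restricted to a local context window,
# with several window sizes run in parallel and fused.
import torch
import torch.nn.functional as F
from torch import nn


class ContextWindowSelfAttention(nn.Module):
    """Scaled self-attention masked to a fixed local window around each position."""

    def __init__(self, d_model: int, window: int):
        super().__init__()
        self.window = window
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        # Mask out positions outside the local context window.
        idx = torch.arange(x.size(1), device=x.device)
        outside = (idx[None, :] - idx[:, None]).abs() > self.window
        scores = scores.masked_fill(outside, float("-inf"))
        return F.softmax(scores, dim=-1) @ v


class MultiContextWindowAttention(nn.Module):
    """Combine attention branches with different (assumed) window sizes."""

    def __init__(self, d_model: int, windows=(3, 7, 15)):  # window sizes are assumed
        super().__init__()
        self.branches = nn.ModuleList(
            ContextWindowSelfAttention(d_model, w) for w in windows
        )
        self.fuse = nn.Linear(d_model * len(windows), d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the per-window outputs and project back to d_model.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=-1))


if __name__ == "__main__":
    sub_sequences = torch.randn(2, 50, 64)      # embedded protein sub-sequences
    out = MultiContextWindowAttention(64)(sub_sequences)
    print(out.shape)                            # torch.Size([2, 50, 64])
```

In this reading, small windows capture the local context while larger windows capture broader contextual patterns; how the original MCWS network fuses the branches may differ from the concatenation used here.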

MeSH terms

  • Amino Acid Sequence*
  • Proteins* / chemistry
  • Software Design*

Substances

  • Proteins