Guide tree optimization with genetic algorithm to improve multiple protein 3D-structure alignment

Bioinformatics. 2022 Jan 27;38(4):985-989. doi: 10.1093/bioinformatics/btab798.

Abstract

Motivation: With the increasing availability of 3D-data, the focus of comparative bioinformatic analysis is shifting from protein sequence alignments toward more content-rich 3D-alignments. This raises the need for new ways to improve the accuracy of 3D-superimposition.

Results: We proposed guide tree optimization with a genetic algorithm (GA) as a universal tool to systematically improve the alignment quality of multiple protein 3D-structures. As a proof of concept, we implemented the suggested GA-based approach in the popular Matt and Caretta multiple protein 3D-structure alignment (M3DSA) algorithms, leading to a statistically significant improvement of the TM-score quality indicator by up to 220-1523% on the 'SABmark Superfamilies' (in 49-77% of cases) and 'SABmark Twilight' (in 59-80% of cases) datasets. The observed improvement in collections of distant homologies highlights the potential of GA to optimize 3D-alignments of diverse protein superfamilies as one plausible tool to study the structure-function relationship.
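To illustrate the general idea of searching over guide trees with a GA, a minimal Python sketch follows. It is not the gaMatt or gaCaretta implementation: the tree encoding (nested tuples), the mutation operator (swapping two leaves), the elitist selection scheme and all function names are assumptions made for illustration, and the sketch uses mutation only. In a real setting the fitness of a tree would be the TM-score of the alignment built by the M3DSA program along that tree; here a toy fitness stands in for it, treating the integer leaf labels as 1D coordinates and rewarding trees that join similar leaves first.

```python
import random

def random_tree(leaves, rng):
    """Build a random binary guide tree (nested tuples) by joining random pairs."""
    nodes = list(leaves)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((a, b))
    return nodes[0]

def leaves_of(tree):
    """Return the leaf labels of a tree in left-to-right order."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves_of(tree[0]) + leaves_of(tree[1])

def mutate(tree, rng):
    """Swap two randomly chosen leaves while keeping the topology (assumed operator)."""
    x, y = rng.sample(leaves_of(tree), 2)
    swap = {x: y, y: x}

    def rebuild(node):
        if isinstance(node, tuple):
            return (rebuild(node[0]), rebuild(node[1]))
        return swap.get(node, node)

    return rebuild(tree)

def toy_fitness(tree):
    """Toy stand-in for an alignment score: penalize the label spread under each
    internal node. A real fitness would run the aligner and return its TM-score."""
    if not isinstance(tree, tuple):
        return 0.0
    span = leaves_of(tree)
    return toy_fitness(tree[0]) + toy_fitness(tree[1]) - (max(span) - min(span))

def optimize(leaves, fitness, rng, pop_size=20, generations=30):
    """Elitist GA loop: keep the better half, refill with mutated survivors."""
    population = [random_tree(leaves, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = optimize(list(range(6)), toy_fitness, random.Random(0))
    print(best, toy_fitness(best))
```

Because the better half of each generation is carried over unchanged, the best fitness in the population never decreases. The expensive step in practice would be the fitness evaluation, since every candidate tree requires a full multiple structure alignment.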

Availability and implementation: The source code of the patched gaCaretta and gaMatt programs is available open-access at https://github.com/n-canter/gamaps.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Proteins* / chemistry
  • Sequence Alignment
  • Software*

Substances

  • Proteins