LPTD: a novel linear programming-based topology determination method for cryo-EM maps

Bioinformatics. 2022 May 13;38(10):2734-2741. doi: 10.1093/bioinformatics/btac170.

Abstract

Summary: Topology determination is one of the most important intermediate steps toward building the atomic structure of proteins from their medium-resolution cryo-electron microscopy (cryo-EM) map. The main goal in the topology determination is to identify correct matches (i.e. assignment and direction) between secondary structure elements (SSEs) (α-helices and β-sheets) detected in a protein sequence and cryo-EM density map. Despite many recent advances in molecular biology technologies, the problem remains a challenging issue. To overcome the problem, this article proposes a linear programming-based topology determination (LPTD) method to solve the secondary structure topology problem in three-dimensional geometrical space. Through modeling of the protein's sequence with the aid of extracting highly reliable features and a distance-based scoring function, the secondary structure matching problem is transformed into a complete weighted bipartite graph matching problem. Subsequently, an algorithm based on linear programming is developed as a decision-making strategy to extract the true topology (native topology) between all possible topologies. The proposed automatic framework is verified using 12 experimental and 15 simulated α-β proteins. Results demonstrate that LPTD is highly efficient and extremely fast in such a way that for 77% of cases in the dataset, the native topology has been detected in the first rank topology in <2 s. Besides, this method is able to successfully handle large complex proteins with as many as 65 SSEs. Such a large number of SSEs have never been solved with current tools/methods.

Availability and implementation: The LPTD package (source code and data) is publicly available at https://github.com/B-Behkamal/LPTD. Moreover, two test samples as well as the instruction of utilizing the graphical user interface have been provided in the shared readme file.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Cryoelectron Microscopy / methods
  • Models, Molecular
  • Programming, Linear*
  • Protein Conformation
  • Protein Structure, Secondary
  • Proteins* / chemistry

Substances

  • Proteins