A Novel In Silico Benchmarked Pipeline Capable of Complete Protein Analysis: A Possible Tool for Potential Drug Discovery

Biology (Basel). 2021 Oct 28;10(11):1113. doi: 10.3390/biology10111113.

Abstract

Current in silico proteomics require the trifecta analysis, namely, prediction, validation, and functional assessment of a modeled protein. The main drawback of this endeavor is the lack of a single protocol that utilizes a proper set of benchmarked open-source tools to predict a protein's structure and function accurately. The present study rectifies this drawback through the design and development of such a protocol. The protocol begins with the characterization of a novel coding sequence to identify the expressed protein. It then recognizes and isolates evolutionarily conserved sequence motifs through phylogenetics. The next step is to predict the protein's secondary structure, followed by the prediction, refinement, and validation of its three-dimensional tertiary structure. These steps enable the functional analysis of the macromolecule through protein docking, which facilitates the identification of the protein's active site. Each of these steps is crucial for the complete characterization of the protein under study. We have dubbed this process the trifecta analysis. In this study, we have proven the effectiveness of our protocol using the cystatin C and AChE proteins. Beginning with just their sequences, we have characterized both proteins' structures and functions, including identifying the cystatin C protein's seven-residue active site and the AChE protein's active-site gorge via protein-protein and protein-ligand docking, respectively. This process will greatly benefit new and experienced scientists alike in obtaining a strong understanding of the trifecta analysis, resulting in a domino effect that could expand drug development.

Keywords: protein modulation; therapeutic targets; trifecta analysis; virtual screening.