Essential proteins discovery based on dominance relationship and neighborhood similarity centrality

Gaoshi Li; Xinlong Luo; Zhipeng Hu; Jingli Wu; Wei Peng; Jiafei Liu; Xiaoshu Zhu

doi:10.1007/s13755-023-00252-9

Health Inf Sci Syst. 2023 Nov 16;11(1):55. doi: 10.1007/s13755-023-00252-9. eCollection 2023 Dec.

Authors

Gaoshi Li^{1,2,3}, Xinlong Luo^{1,2,3}, Zhipeng Hu^{1,2,3}, Jingli Wu^{1,2,3}, Wei Peng^{4}, Jiafei Liu^{1,2,3}, Xiaoshu Zhu^{1,2,3,5}

Affiliations

¹ Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China.
² Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China.
³ College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China.
⁴ Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China.
⁵ School of Computer and Information Security & School of Software Engineering, Guilin University of Electronic Science and Technology, Guilin, China.

PMID: 37981988
PMCID: PMC10654316 (available on 2024-12-01)
DOI: 10.1007/s13755-023-00252-9

Abstract

Essential proteins play a vital role in the development and reproduction of cells. Identifying them helps us understand what cells fundamentally require to survive. Because biological experimental methods for discovering essential proteins are time-consuming, costly and inefficient, computational methods have gained increasing attention. Early computational approaches identified essential proteins mainly through centralities computed on protein-protein interaction (PPI) networks, and the many false positives in PPI networks limit their identification rate. In this study, first, a purified PPI network is introduced to reduce the impact of these false positives. Second, by analyzing the similarity relationship between a protein and its neighbors in the PPI network, a new centrality called neighborhood similarity centrality (NSC) is proposed. Third, the protein subcellular localization score and the ortholog score are calculated from subcellular localization data and orthologous data, respectively. Fourth, an analysis of a large number of multi-feature fusion methods reveals a special relationship among features, termed the dominance relationship, and a novel model based on this relationship is proposed. Finally, NSC, the subcellular localization score and the ortholog score are fused by the dominance relationship model, yielding a new method called NSO. To verify the performance of NSO, it is compared with seven representative methods (ION, NCCO, E_POC, SON, JDC, PeC, WDC) on yeast datasets. The experimental results show that NSO achieves a higher identification rate than the other methods.
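The abstract does not give the formulas for the network purification step, NSC, or the dominance relationship model. The Python sketch below therefore only illustrates the general shape of such a scoring-and-fusion pipeline, and every part of it rests on an assumption. All function names are hypothetical. The "NSC" stand-in sums Jaccard similarities between a protein's closed neighborhood and those of its neighbors. "Dominance" is read as Pareto dominance over the three per-protein scores. The identification count is taken as the number of known essential proteins among the top-k ranked candidates. The subcellular localization and ortholog scores are supplied as invented toy inputs rather than computed. This is not the authors' NSO method.

```python
"""Toy sketch of an NSO-style pipeline (hypothetical; not the authors' formulas)."""


def build_adjacency(edges):
    """Build an undirected adjacency map {protein: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighborhood_similarity(adj):
    """Hypothetical NSC stand-in: for each protein v, sum the Jaccard similarity
    between v's closed neighborhood and the closed neighborhood of each neighbor u."""
    scores = {}
    for v, nbrs in adj.items():
        nv = nbrs | {v}
        total = 0.0
        for u in nbrs:
            nu = adj[u] | {u}
            total += len(nv & nu) / len(nv | nu)
        scores[v] = total
    return scores


def dominates(a, b):
    """Pareto dominance (assumed meaning): a >= b on every feature, > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_ranking(features):
    """Rank proteins by how many others they dominate; ties broken by feature sum."""
    counts = {p: 0 for p in features}
    for p in features:
        for q in features:
            if p != q and dominates(features[p], features[q]):
                counts[p] += 1
    return sorted(features, key=lambda p: (counts[p], sum(features[p])), reverse=True)


def identification_count(ranking, essential, k):
    """Number of known essential proteins among the top-k ranked candidates."""
    return sum(1 for p in ranking[:k] if p in essential)


# Toy data, invented purely for illustration (not from the paper's yeast datasets).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
subcellular = {"A": 0.9, "B": 0.5, "C": 0.8, "D": 0.3, "E": 0.2}
ortholog = {"A": 0.7, "B": 0.6, "C": 0.9, "D": 0.4, "E": 0.1}
essential = {"A", "C", "E"}

nsc = neighborhood_similarity(build_adjacency(edges))
features = {p: (nsc[p], subcellular[p], ortholog[p]) for p in nsc}
ranking = dominance_ranking(features)
print(ranking)                                      # ['C', 'A', 'B', 'D', 'E']
print(identification_count(ranking, essential, 2))  # 2
```

Counting how many proteins each candidate dominates is only one way to turn a dominance relation into a total order. The feature-sum tie-breaker mixes unnormalized scales and is a further simplifying assumption.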

Keywords: Dominance relationship; Essential proteins; Multi-feature fusion; Neighborhood similarity centrality; Protein–protein interaction.

© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.