Comprehensive Bioinformatics Analysis of the Biodiversity of Lsm Proteins in the Archaea Domain

Microorganisms. 2023 May 3;11(5):1196. doi: 10.3390/microorganisms11051196.

Abstract

The Sm protein superfamily includes Sm, Sm-like (Lsm), and Hfq proteins. Sm and Lsm proteins are found in the Eukarya and Archaea domains, respectively, while Hfq proteins exist in the Bacteria domain. Although Sm and Hfq proteins have been extensively studied, archaeal Lsm proteins still require further exploration. In this work, several bioinformatics tools are used to characterize the diversity and distribution of 168 Lsm proteins across 109 archaeal species. Each of the 109 archaeal species analyzed encodes one to three Lsm proteins in its genome. Lsm proteins can be classified into two groups based on molecular weight. Regarding gene environment, many lsm genes are located adjacent to genes encoding transcriptional regulators of the Lrp/AsnC and MarR families, RNA-binding proteins, and ribosomal protein L37e. Notably, only proteins from species of the class Halobacteria conserve the internal and external residues of the RNA-binding site identified in Pyrococcus abyssi, despite belonging to different taxonomic orders. In most species, the lsm genes show associations with 11 genes: rpl7ae, rpl37e, fusA, flpA, purF, rrp4, rrp41, hel308, rpoD, rpoH, and rpoN. We propose that most archaeal Lsm proteins are related to RNA metabolism, and that the larger Lsm proteins could perform different functions and/or act through other mechanisms of action.
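
The grouping by molecular weight mentioned above can be illustrated with a minimal Python sketch. It estimates the average molecular weight of a protein from its amino acid sequence and assigns each sequence to a "smaller" or "larger" group. The function names, the toy sequences, and the cutoff value are assumptions made for illustration only; the abstract does not state the cutoff or the tool the authors used.

```python
# Average residue masses in daltons (free amino acid minus one water molecule).
AVERAGE_RESIDUE_MASS_DA = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS_DA = 18.0153  # added once per chain for the free N- and C-termini


def molecular_weight_da(sequence: str) -> float:
    """Approximate average molecular weight of an unmodified protein chain."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS_DA)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS_DA[aa] for aa in seq) + WATER_MASS_DA


def classify_by_weight(sequences: dict, threshold_da: float) -> dict:
    """Label each named sequence 'smaller' or 'larger' relative to threshold_da.

    threshold_da is a hypothetical, caller-supplied cutoff; it is not a value
    taken from the paper.
    """
    return {
        name: "larger" if molecular_weight_da(seq) > threshold_da else "smaller"
        for name, seq in sequences.items()
    }


if __name__ == "__main__":
    # Toy sequences, not real Lsm proteins.
    toy = {"toy_short": "MKG", "toy_long": "W" * 10}
    print(classify_by_weight(toy, threshold_da=1000.0))
```

In practice, an established library routine for protein molecular weight would likely be preferable to a hand-written mass table; the table here only keeps the sketch self-contained.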

Keywords: Lsm; RNA metabolism; archaea; bioinformatics analysis.