Membrane Clustering of Coronavirus Variants Using Document Similarity

Genes (Basel). 2022 Oct 28;13(11):1966. doi: 10.3390/genes13111966.

Abstract

As a consequence of the COVID-19 pandemic, bioinformatics, genomics, and biological computation are receiving increased attention. Viral genomes can be represented as character strings over their nucleobases, so document similarity metrics can be applied to these strings to measure how similar the genomes are. Clustering algorithms can then group the genomes based on the resulting similarity values. P systems, or membrane systems, are computational models inspired by the flow of information across the membranes of living cells. They can be used for various purposes, one of which is data clustering. This paper studies a novel and versatile clustering method for genomes that applies membrane clustering models to document similarity metrics, a use of membrane clustering that has not yet been well studied.

Keywords: COVID; Doc2Vec; MinHash; P systems; bioinformatics; clustering; coronavirus; document similarity; genome; membrane computing.
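
The pipeline outlined in the abstract (genomes as strings, a document similarity metric, then clustering) can be illustrated with a minimal sketch. It is not the paper's method: it uses exact Jaccard similarity over k-mer shingles, which is the quantity that MinHash approximates, and a simple threshold-based grouping as a stand-in for the membrane clustering model. The function names, the toy sequences, the value k = 3, and the threshold 0.5 are all assumptions made for illustration.

```python
from itertools import combinations


def shingles(genome: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers (character shingles) of a genome string."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def threshold_clusters(genomes: dict[str, str], k: int = 3,
                       threshold: float = 0.5) -> list[set[str]]:
    """Group genomes whose pairwise Jaccard similarity reaches the threshold.

    This is plain connected-components grouping (single linkage) using a
    union-find structure. It is a stand-in, NOT a membrane clustering model.
    """
    parent = {name: name for name in genomes}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = {name: shingles(seq, k) for name, seq in genomes.items()}
    for a, b in combinations(genomes, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in genomes:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy sequences (hypothetical, far shorter than real viral genomes).
toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTGGGGCC"}
clusters = threshold_clusters(toy, k=3, threshold=0.5)
```

With these toy inputs, sequences "a" and "b" share four of their five distinct 3-mers and fall into one group, while "c" shares none and stays alone. Real genomes are tens of thousands of bases long, which is where an approximation such as MinHash or an embedding such as Doc2Vec becomes useful.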

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • COVID-19* / genetics
  • Cluster Analysis
  • Computational Biology / methods
  • Humans
  • Pandemics*

Grants and funding

This study was supported by the ÚNKP-21-3 New National Excellence Program of the Ministry for Innovation and Technology from the source of the National Research, Development and Innovation Fund. This research was also supported by grants of the “Application Domain Specific Highly Reliable IT Solutions” project that has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the Thematic Excellence Programme TKP2020-NKA-06 (National Challenges Subprogramme) funding scheme.