Whole-Genome k-mer Topic Modeling Associates Bacterial Families

Genes (Basel). 2020 Feb 14;11(2):197. doi: 10.3390/genes11020197.

Abstract

Alignment-free k-mer-based algorithms in whole-genome sequence comparisons remain an ongoing challenge. Here, we explore the possibility of using topic modeling for whole-genome comparisons between organisms. We analyzed 30 complete genomes from three bacterial families by topic modeling. For this, each genome was considered as a document and its 13-mer nucleotide representations as words. Latent Dirichlet allocation was used as the probabilistic model of the corpus. We were able to identify the topic distribution among the analyzed genomes, which is highly consistent with traditional hierarchical classification. It is possible that topic modeling may be applied to establish relationships between genome composition and biological phenomena.
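
The genome-as-document, k-mer-as-word representation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy sequences, and the choice to skip windows containing ambiguous bases are assumptions made for illustration, and the abstract does not state how strands or ambiguous bases were handled. The sketch only builds the document-term count matrix; a topic model such as latent Dirichlet allocation (for example, scikit-learn's LatentDirichletAllocation) could then be fitted to those counts.

```python
from collections import Counter


def kmer_counts(sequence, k=13):
    """Count overlapping k-mers ("words") in one genome ("document")."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption of this sketch: skip windows with ambiguous bases (e.g. N).
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def document_term_matrix(genomes, k=13):
    """Return (vocabulary, rows) with one row of k-mer counts per genome."""
    per_genome = [kmer_counts(seq, k) for seq in genomes.values()]
    # The vocabulary is every k-mer observed in at least one genome.
    vocabulary = sorted(set().union(*per_genome))
    rows = [[counts[kmer] for kmer in vocabulary] for counts in per_genome]
    return vocabulary, rows


# Hypothetical toy sequences, not real genomes; k=3 keeps the vocabulary small.
toy_genomes = {"genome_a": "ACGTACGTAC", "genome_b": "ACGTTTNACG"}
vocabulary, rows = document_term_matrix(toy_genomes, k=3)
```

With k = 13 there are up to 4^13 (about 67 million) possible words, so a real corpus would likely need a sparse matrix rather than the dense lists used here.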

Keywords: Alignment-Free; Bacteria Genome Comparison; Topic Model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bacteria / classification*
  • Bacteria / genetics
  • Computational Biology / methods*
  • Genome, Bacterial
  • Genomics
  • Machine Learning
  • Models, Statistical
  • Phylogeny
  • Sequence Alignment
  • Whole Genome Sequencing / methods*