Classification of Long Noncoding RNAs by k-mer Content

Methods Mol Biol. 2021:2254:41-60. doi: 10.1007/978-1-0716-1158-6_4.

Abstract

K-mer-based comparisons have emerged as powerful complements to BLAST-like alignment algorithms, particularly when the sequences being compared lack direct evolutionary relationships. In this chapter, we describe methods to compare k-mer content between groups of long noncoding RNAs (lncRNAs), to identify communities of lncRNAs with related k-mer content, to identify the enrichment of protein-binding motifs in lncRNAs, and to scan for domains of related k-mer content in lncRNAs. Our step-by-step instructions are complemented by Python code deposited on GitHub. Though our chapter focuses on lncRNAs, the methods we describe could be applied to any set of nucleic acid sequences.
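
To make the idea of an alignment-free k-mer comparison concrete, the following is a minimal sketch, not the code deposited with the chapter. It assumes each sequence is summarized as k-mer counts per kilobase and that two profiles are compared by Pearson correlation; the function names `kmer_profile` and `pearson`, the choice of k, and the toy sequences are illustrative assumptions, and the chapter's own pipeline may normalize and standardize counts differently.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3, alphabet="ACGT"):
    """Return k-mer counts per kilobase in a fixed lexicographic k-mer order.

    Hypothetical helper for illustration only.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    scale = 1000.0 / max(len(seq), 1)  # length-normalize to counts per kb
    return [counts[w] * scale for w in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy sequences invented for this example; they are not real lncRNAs.
    seq_a = "AUGGCUAGCUAGGAUCGAUCGGAUUAGCUAGCUAGG"
    seq_b = "GGAUCGAUUAGCUAGGCUAGCAUCGGAUCGAUUAGC"
    similarity = pearson(kmer_profile(seq_a), kmer_profile(seq_b))
    print(f"k-mer profile correlation: {similarity:.3f}")
```

A matrix of such pairwise correlations over a set of transcripts could then serve as the weighted adjacency matrix of a network, which is the kind of input a community-detection method such as the Louvain algorithm operates on.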

Keywords: Communities; Domain; LncRNA; Long noncoding RNA; Louvain algorithm; Networks; Protein-binding motif; Sequence alignment; Unsupervised clustering; k-mer.

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Computational Biology / methods*
  • Nucleotide Motifs / genetics
  • Protein Binding
  • RNA, Long Noncoding / classification*
  • RNA, Long Noncoding / genetics*

Substances

  • RNA, Long Noncoding