Persistent Homology for RNA Data Analysis

Methods Mol Biol. 2023:2627:211-229. doi: 10.1007/978-1-0716-2974-1_12.

Abstract

Molecular representations are of great importance for machine learning models in RNA data analysis. In particular, efficient molecular descriptors or fingerprints that characterize the intrinsic structural and interactional information of RNAs can significantly boost the performance of learning models. In this paper, we introduce two types of persistent models, persistent homology and persistent spectral models, for RNA structure and interaction representations, together with their applications in RNA data analysis. Unlike traditional geometric and graph representations, persistent homology is built on simplicial complexes, which generalize graph models to higher dimensions. Hypergraphs are a further generalization of simplicial complexes, and hypergraph-based embedded persistent homology has been proposed recently. Moreover, persistent spectral models, which combine a filtration process with spectral models (spectral graphs, spectral simplicial complexes, and spectral hypergraphs), have been proposed for molecular representation. The persistent attributes of RNAs obtained from these two types of persistent models can be combined with machine learning models for the analysis of RNA structure, flexibility, dynamics, and function.
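
As an illustration of the filtration idea behind persistent homology, the following minimal sketch computes the dimension-0 barcode (connected components) of a Vietoris-Rips filtration over a small point cloud. It is not the authors' implementation. The function name h0_barcode, the toy coordinates standing in for atom positions, and the convention that an edge enters the filtration at the Euclidean distance between its endpoints are all assumptions made for this example.

```python
from itertools import combinations
from math import dist


def h0_barcode(points):
    """Dimension-0 persistence bars of a Vietoris-Rips filtration.

    Assumption: an edge (i, j) appears when the filtration value reaches
    the Euclidean distance between points i and j. Every point is born
    at 0; a bar dies when its component merges into another one.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over the points

    def find(i):
        # Find the component representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges in order of increasing length (the filtration order).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj              # two components merge
            bars.append((0.0, length))   # one of them dies here
    bars.append((0.0, float("inf")))     # the last component never dies
    return bars


# Hypothetical toy coordinates, not real RNA atom positions.
toy_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 0.0, 0.0)]
barcode = h0_barcode(toy_atoms)
for birth, death in barcode:
    print(birth, death)
```

The bar lengths could serve as a simple persistent attribute vector for a learning model. Higher-dimensional bars (loops, cavities) and persistent spectral features would need a full simplicial complex library, which is outside the scope of this sketch.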

Keywords: Machine learning; Molecular representation; Persistent homology; Persistent spectral; RNA data analysis; Simplicial complex; Spectral simplicial complex.

MeSH terms

  • Data Analysis*
  • RNA* / genetics

Substances

  • RNA