A set-theoretic definition of cell types with an algebraic structure on gene regulatory networks and application in annotation of RNA-seq data

Stem Cell Reports. 2023 Jan 10;18(1):113-130. doi: 10.1016/j.stemcr.2022.10.015. Epub 2022 Nov 17.

Abstract

The emergence of single-cell RNA sequencing (RNA-seq) has radically changed how cellular diversity is observed. Although annotating RNA-seq data requires properties that are preserved among cells of the same identity, conventional annotation methods have not been able to capture the universal characteristics of a cell type. Analysis of expression levels alone cannot annotate cells accurately, because differences in transcription do not necessarily explain biological characteristics in terms of cellular functions, and because the data themselves carry no information about the correct mapping between cell types and genes. Hence, in this study, we developed a new representation of cellular identities that can be compared across different datasets while preserving nontrivial biological semantics. To generalize the notion of cell types, we developed a new framework that handles cellular identities in terms of set theory. We gained further insights into cells by introducing mathematical descriptions of cell biology. We also performed experiments that could correspond to practical applications in the annotation of RNA-seq data.

Keywords: annotation; cell type; cellular state; mathematical model; scRNA-seq; set theory; transcriptome.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Profiling / methods
  • Gene Regulatory Networks*
  • RNA* / genetics
  • RNA-Seq
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods

Substances

  • RNA