Emergent statistical laws in single-cell transcriptomic data

Silvia Lazzardi; Filippo Valle; Andrea Mazzolini; Antonio Scialdone; Michele Caselle; Matteo Osella

doi:10.1103/PhysRevE.107.044403


Phys Rev E. 2023 Apr;107(4-1):044403. doi: 10.1103/PhysRevE.107.044403.

Authors

Silvia Lazzardi¹, Filippo Valle¹, Andrea Mazzolini², Antonio Scialdone³, Michele Caselle¹, Matteo Osella¹

Affiliations

¹ Department of Physics, University of Turin and INFN, via P. Giuria 1, 10125 Turin, Italy.
² Laboratoire de Physique de l'École Normale Supérieure (PSL University), CNRS, Sorbonne Université and Université de Paris, 75005 Paris, France.
³ Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München, Feodor-Lynen-Straße 21, 81377 München, Germany and Institute of Functional Epigenetics and Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany.

PMID: 37198814

Abstract

Large-scale data on single-cell gene expression have the potential to unravel the specific transcriptional programs of different cell types. The structure of these expression datasets suggests a similarity with several other complex systems that can be analogously described through the statistics of their basic building blocks. Transcriptomes of single cells are collections of messenger RNA abundances transcribed from a common set of genes, just as books are different collections of words from a shared vocabulary, genomes of different species are specific compositions of genes belonging to evolutionary families, and ecological niches can be described by their species abundances. Following this analogy, we identify several emergent statistical laws in single-cell transcriptomic data that closely resemble regularities found in linguistics, ecology, or genomics. A simple mathematical framework can be used to analyze the relations between different laws and the possible mechanisms behind their ubiquity. Importantly, tractable statistical models can be useful tools in transcriptomics to disentangle the actual biological variability from general statistical effects present in most component systems and from the consequences of the sampling process inherent to the experimental technique.
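To make the component-system analogy concrete, the sketch below treats a cell-by-gene count matrix the way a corpus of books is treated in linguistics. The abstract does not name the specific laws, so the two computed here are illustrative assumptions: a Zipf-style rank-frequency curve (pooled gene abundances sorted by rank) and a Heaps-style relation (number of distinct detected genes versus total counts per cell). The `sampling_null` function is likewise an assumed, minimal null model, not necessarily the one used by the authors: each cell is redrawn as a multinomial sample of its own size from the pooled gene frequencies, so any difference between the observed and null curves cannot be attributed to sampling alone. All function names and the toy data are hypothetical.

```python
import numpy as np

def rank_frequency(counts):
    """Zipf-style rank-frequency curve: pooled relative abundance of each
    gene across all cells, sorted from most to least abundant."""
    totals = counts.sum(axis=0).astype(float)
    freqs = np.sort(totals)[::-1] / totals.sum()
    return freqs[freqs > 0]  # drop genes never observed in any cell

def heaps_points(counts):
    """Heaps-style relation: for each cell, its size M (total counts)
    and the number h of distinct genes detected (count > 0)."""
    sizes = counts.sum(axis=1)
    distinct = (counts > 0).sum(axis=1)
    return sizes, distinct

def sampling_null(counts, rng):
    """Assumed null model: each cell is an independent multinomial draw,
    of the same size as the observed cell, from the pooled gene
    frequencies, so cells differ only by sampling noise."""
    totals = counts.sum(axis=0).astype(float)
    p = totals / totals.sum()
    return np.vstack([rng.multinomial(int(m), p) for m in counts.sum(axis=1)])

def loglog_slope(x, y):
    """Least-squares slope in log-log space (a crude power-law exponent)."""
    return np.polyfit(np.log(x), np.log(y), 1)[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_genes, n_cells = 2000, 300
    # Hypothetical toy data: gene frequencies proportional to 1/rank,
    # cell sizes spread over an order of magnitude.
    p_true = 1.0 / np.arange(1, n_genes + 1)
    p_true /= p_true.sum()
    cell_sizes = rng.integers(500, 5000, size=n_cells)
    counts = np.vstack([rng.multinomial(int(m), p_true) for m in cell_sizes])

    freqs = rank_frequency(counts)
    ranks = np.arange(1, freqs.size + 1)
    print("Zipf slope (top 100 ranks):", round(loglog_slope(ranks[:100], freqs[:100]), 2))

    # Compare the Heaps-style exponent of the data with the sampling null.
    for name, data in [("observed", counts), ("sampling null", sampling_null(counts, rng))]:
        m, h = heaps_points(data)
        print(name, "Heaps slope:", round(loglog_slope(m, h), 2))
```

Because the toy matrix is itself generated by multinomial sampling, the observed and null curves agree here by construction; on real single-cell data, systematic departures from the null are what would point to biological structure beyond the sampling process.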

MeSH terms

Ecology
Ecosystem
Gene Expression Profiling*
Genomics / methods
Humans
Transcriptome*