Optimal entropic properties of SARS-CoV-2 RNA sequences

R Soc Open Sci. 2024 Jan 31;11(1):231369. doi: 10.1098/rsos.231369. eCollection 2024 Jan.

Abstract

The response of the scientific community to the COVID-19 pandemic has generated a huge (approx. 10^6 entries) dataset of genome sequences collected worldwide and spanning a relatively short time window. These unprecedented conditions, together with the unambiguous identification of the reference viral genome sequence, allow for an original statistical study of mutations in the virus genome. In this paper, we compute the Shannon entropy of every sequence in the dataset, as well as the relative entropy and the mutual information between the reference sequence and the mutated ones. These functions, originally developed in information theory, measure the information content of a sequence and allow us to study the random character of the mutation mechanism in terms of its entropy and its information gain or loss. We show that this approach allows us to recast known features of the SARS-CoV-2 mutation mechanism, such as the CT bias, in a new form, and also to discover new optimal entropic properties of the mutation process, in the sense that the virus mutation mechanism closely tracks theoretically computable lower bounds for the entropy decrease and the information transfer.
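
As an illustration of the three quantities named above, the following minimal Python sketch computes them for short toy sequences. It rests on assumptions that the abstract does not state: probabilities are estimated from single-nucleotide frequencies, the mutual information is taken from the joint distribution of aligned positions in two equal-length sequences, and logarithms are base 2. The function names are hypothetical, and the paper's own definitions and estimators may differ.

```python
from collections import Counter
from math import log2


def base_frequencies(seq):
    """Empirical single-symbol frequencies of a sequence (assumed estimator)."""
    counts = Counter(seq)
    total = len(seq)
    return {base: count / total for base, count in counts.items()}


def shannon_entropy(seq):
    """Shannon entropy in bits: H = -sum_b p(b) * log2 p(b)."""
    return -sum(p * log2(p) for p in base_frequencies(seq).values())


def relative_entropy(seq, ref):
    """Relative entropy D(p || q) in bits, with p from seq and q from ref."""
    p = base_frequencies(seq)
    q = base_frequencies(ref)
    total = 0.0
    for base, p_base in p.items():
        if base not in q:
            # A symbol absent from the reference makes the divergence infinite.
            return float("inf")
        total += p_base * log2(p_base / q[base])
    return total


def mutual_information(ref, seq):
    """Mutual information in bits between aligned positions of ref and seq.

    Assumes both sequences have the same length and are already aligned.
    """
    if len(ref) != len(seq):
        raise ValueError("sequences must be aligned and of equal length")
    n = len(ref)
    joint = Counter(zip(ref, seq))  # counts of (reference base, mutated base)
    ref_counts = Counter(ref)
    seq_counts = Counter(seq)
    # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y).
    return sum(
        (count / n) * log2(count * n / (ref_counts[x] * seq_counts[y]))
        for (x, y), count in joint.items()
    )


if __name__ == "__main__":
    reference = "ACGT"  # toy reference, not real viral data
    mutated = "ATGT"    # same toy sequence with a single C-to-T substitution
    print(shannon_entropy(reference), shannon_entropy(mutated))
    print(relative_entropy(mutated, reference))
    print(mutual_information(reference, mutated))
```

In this toy case the single C-to-T substitution lowers the entropy of the sequence relative to the uniform reference, which is the kind of entropy decrease the abstract refers to; real genomes would require an alignment step before the mutual information could be computed this way.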

Keywords: CT mutation bias; Shannon entropy; mutual information.

Associated data

  • figshare/10.6084/m9.figshare.c.7041519
  • Dryad/10.5061/dryad.9s4mw6mp2