Systematic annotation of conservation states provides insights into regulatory regions in rice

J Genet Genomics. 2022 Dec;49(12):1127-1137. doi: 10.1016/j.jgg.2022.04.003. Epub 2022 Apr 22.

Abstract

Plant genomes contain a large fraction of noncoding sequences. The discovery and annotation of conserved noncoding sequences (CNSs) in plants remain an ongoing challenge. Here we report the application of comparative genomics to systematically identify CNSs in 50 well-annotated Gramineae genomes using rice (Oryza sativa) as the reference. We perform multi-way whole-genome alignments against the rice genome. The rice genome is annotated with 20 conservation states (CSs) at single-nucleotide resolution using a multivariate hidden Markov model (ConsHMM) based on the multiple-genome alignments. Different states show distinct enrichments for various genomic features, and the conservation scores of CSs are highly correlated with the level of associated chromatin accessibility. We find that at least 33.5% of the rice genome is under strong selection, with more than 70% of this sequence lying outside of coding regions. A catalog of 855,366 regulatory CNSs is generated, and these overlap significantly with putative active regulatory elements such as promoters, enhancers, and transcription factor binding sites. Collectively, our study provides a resource for elucidating functional noncoding regions of the rice genome and an evolutionary perspective on regulatory sequences in higher plants.
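
To make the input to a multivariate HMM of this kind concrete, the sketch below encodes each reference position as a vector with one observation per non-reference species: aligned with the same base, aligned with a different base, or not aligned. This is an illustration only, not the authors' pipeline. The function name `encode_column`, the integer codes, the species names, the toy alignment, and the choice to treat `N` as unaligned are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical observation codes for one species at one reference position.
MATCH, MISMATCH, UNALIGNED = 0, 1, 2


def encode_column(ref_base: str, column: Dict[str, str], species: List[str]) -> List[int]:
    """Encode one alignment column as a per-species observation vector.

    A species missing from the column, or showing a gap, is treated as
    unaligned. Treating 'N' as unaligned is an assumption of this sketch.
    """
    obs = []
    for name in species:
        base = column.get(name, "-").upper()
        if base in ("-", "N"):
            obs.append(UNALIGNED)
        elif base == ref_base.upper():
            obs.append(MATCH)
        else:
            obs.append(MISMATCH)
    return obs


# Toy data: three reference bases and three illustrative species.
species = ["sorghum", "maize", "brachypodium"]
ref = "ACG"
columns = [
    {"sorghum": "A", "maize": "A", "brachypodium": "a"},
    {"sorghum": "C", "maize": "T"},
    {"sorghum": "-", "maize": "G", "brachypodium": "N"},
]

# One observation vector per reference position.
observations = [encode_column(r, c, species) for r, c in zip(ref, columns)]
print(observations)
```

An HMM trained on vectors like these can then assign every reference position to one of a fixed number of states, which is how a per-nucleotide annotation can be produced.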

Keywords: Comparative genomics; Conservation states (CSs); Conserved noncoding sequences (CNSs); Rice.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Conserved Sequence / genetics
  • Genome, Plant / genetics
  • Genomics
  • Oryza* / genetics
  • Regulatory Sequences, Nucleic Acid / genetics