Identification and characterization of constrained non-exonic bases lacking predictive epigenomic and transcription factor binding annotations

Nat Commun. 2020 Dec 2;11(1):6168. doi: 10.1038/s41467-020-19962-9.

Abstract

Annotations of evolutionary sequence constraint based on multi-species genome alignments and genome-wide maps of epigenomic marks and transcription factor binding provide important complementary information for understanding the human genome and genetic variation. Here we developed the Constrained Non-Exonic Predictor (CNEP) to quantify the evidence of each base in the genome being in an evolutionarily constrained non-exonic element from an input of over 60,000 epigenomic and transcription factor binding features. We find that the CNEP score outperforms baseline and related existing scores at predicting evolutionarily constrained non-exonic bases from such data. However, a subset of constrained non-exonic bases is still not well predicted by CNEP. We therefore developed a complementary Conservation Signature Score by CNEP (CSS-CNEP) that is predictive of those bases. We further characterize the nature of constrained non-exonic bases with low CNEP scores using additional types of information. CNEP and CSS-CNEP are resources for analyzing constrained non-exonic bases in the genome.
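
The idea of scoring each base from binary annotation features can be illustrated with a small sketch. The abstract does not state which model underlies CNEP, so the code below assumes a plain logistic regression purely for illustration. The function names (`train_logistic`, `score_base`), the three-feature layout, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, epochs=200, lr=0.5, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent.

    features: list of 0/1 vectors (1 = the base overlaps that annotation).
    labels:   list of 0/1 values (1 = base lies in a constrained element).
    """
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def score_base(weights, bias, x):
    """Return a probability-like score in (0, 1) for one base."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Synthetic toy data (an assumption, not real annotations):
# feature 0 is enriched at "constrained" bases, features 1 and 2 are noise.
rng = random.Random(0)
features, labels = [], []
for _ in range(400):
    y = 1 if rng.random() < 0.5 else 0
    informative = 1 if rng.random() < (0.8 if y else 0.1) else 0
    noise = [1 if rng.random() < 0.3 else 0 for _ in range(2)]
    features.append([informative] + noise)
    labels.append(y)

weights, bias = train_logistic(features, labels)
with_mark = score_base(weights, bias, [1, 0, 0])
without_mark = score_base(weights, bias, [0, 0, 0])
print(with_mark > without_mark)
```

In this toy setting, a base overlapping the informative annotation receives a higher score than one that does not. A real application would involve tens of thousands of features, genome-scale training data, and held-out evaluation, none of which this sketch attempts.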

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Base Sequence
  • Epigenesis, Genetic
  • Evolution, Molecular
  • Exons
  • Gene Ontology
  • Genome*
  • Humans
  • Introns*
  • Invertebrates / genetics*
  • Molecular Sequence Annotation
  • Protein Binding
  • Sequence Alignment
  • Sequence Homology, Nucleic Acid
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*
  • Vertebrates / genetics*

Substances

  • Transcription Factors