Prioritizing cardiovascular disease-associated variants altering NKX2-5 and TBX5 binding through an integrative computational approach

J Biol Chem. 2023 Dec;299(12):105423. doi: 10.1016/j.jbc.2023.105423. Epub 2023 Nov 4.

Abstract

Cardiovascular diseases (CVDs) are the leading cause of death worldwide and are heavily influenced by genetic factors. Genome-wide association studies have mapped >90% of CVD-associated variants to the noncoding genome, where they can alter the function of regulatory proteins, such as transcription factors (TFs). However, due to the overwhelming number of single-nucleotide polymorphisms (SNPs) (>500,000) in genome-wide association studies, prioritizing variants for in vitro analysis remains challenging. In this work, we implemented a computational approach that considers support vector machine (SVM)-based TF binding site classification and cardiac expression quantitative trait loci (eQTL) analysis to identify and prioritize potential CVD-causing SNPs. We identified 1535 CVD-associated SNPs within TF footprints and putative cardiac enhancers plus 14,218 variants in linkage disequilibrium with genotype-dependent gene expression in cardiac tissues. Using ChIP-seq data from two cardiac TFs (NKX2-5 and TBX5) in human induced pluripotent stem cell-derived cardiomyocytes, we trained a large-scale gapped k-mer SVM model to identify CVD-associated SNPs that altered NKX2-5 and TBX5 binding. The model was tested by scoring human heart TF genomic footprints within putative enhancers and measuring in vitro binding through electrophoretic mobility shift assay. Five variants predicted to alter NKX2-5 (rs59310144, rs6715570, and rs61872084) and TBX5 (rs7612445 and rs7790964) binding were prioritized for in vitro validation based on the magnitude of the predicted change in binding and their presence in cardiac tissue eQTLs. All five variants altered NKX2-5 and TBX5 DNA binding. We present a bioinformatic approach that considers tissue-specific eQTL analysis and SVM-based TF binding site classification to prioritize CVD-associated variants for in vitro analysis.
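The prioritization logic described above (score each allele with a sequence model, take the predicted change in binding, keep variants in cardiac eQTLs, rank by magnitude) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a simplified scorer that sums weights over contiguous k-mers, whereas the study used a gapped k-mer SVM. The weights, sequences, variant identifiers, and function names below are all hypothetical toy values chosen for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Variant(NamedTuple):
    snp_id: str            # hypothetical identifier, not a real rsID
    ref_seq: str           # sequence window carrying the reference allele
    alt_seq: str           # same window carrying the alternate allele
    in_cardiac_eqtl: bool  # assumed to come from a separate eQTL lookup


def kmers(seq: str, k: int) -> List[str]:
    """Return all contiguous k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def score_sequence(seq: str, weights: Dict[str, float], k: int) -> float:
    """Sum per-k-mer weights; k-mers without a weight contribute 0."""
    return sum(weights.get(km, 0.0) for km in kmers(seq, k))


def delta_score(v: Variant, weights: Dict[str, float], k: int) -> float:
    """Predicted change in binding score: alternate minus reference."""
    return score_sequence(v.alt_seq, weights, k) - score_sequence(v.ref_seq, weights, k)


def prioritize(variants: List[Variant], weights: Dict[str, float],
               k: int, top_n: int) -> List[Tuple[str, float]]:
    """Keep eQTL-supported variants and rank by absolute predicted change."""
    scored = [(v.snp_id, delta_score(v, weights, k))
              for v in variants if v.in_cardiac_eqtl]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top_n]


# Toy weights standing in for a trained model (assumption, k = 3).
WEIGHTS = {"TGA": 1.0, "GAC": 0.5, "ACC": 0.25}

# Toy variants; each alt_seq differs from ref_seq at a single position.
VARIANTS = [
    Variant("snp_A", "AATGACA", "AATCACA", True),   # disrupts TGA and GAC
    Variant("snp_B", "CCTGTCC", "CCTGACC", True),   # creates TGA, GAC, ACC
    Variant("snp_C", "AATGACA", "AATCACA", False),  # large change, no eQTL
    Variant("snp_D", "GGGACGG", "GGGACCG", True),   # small change
]

if __name__ == "__main__":
    for snp_id, delta in prioritize(VARIANTS, WEIGHTS, k=3, top_n=2):
        print(snp_id, delta)
```

Ranking by absolute value treats predicted gains and losses of binding alike, which matches the abstract's phrase "magnitude of the predicted change in binding"; whether the original analysis applied additional thresholds is not stated in the abstract.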

Keywords: DNA-binding protein; cardiovascular disease; computational biology; gene regulation; genomics; single-nucleotide polymorphism (SNP); transcription factors.

MeSH terms

  • Cardiovascular Diseases* / genetics
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study
  • Homeobox Protein Nkx-2.5 / genetics
  • Homeobox Protein Nkx-2.5 / metabolism
  • Humans
  • Myocytes, Cardiac / metabolism
  • Polymorphism, Single Nucleotide
  • Regulatory Sequences, Nucleic Acid
  • Transcription Factors / genetics
  • Transcription Factors / metabolism

Substances

  • Homeobox Protein Nkx-2.5
  • NKX2-5 protein, human
  • Transcription Factors
  • T-box transcription factor 5