Stochastic Epigenetic Mutations: Reliable Detection and Associations with Cardiovascular Aging

bioRxiv [Preprint]. 2023 Dec 13:2023.12.12.571149. doi: 10.1101/2023.12.12.571149.

Abstract

Stochastic Epigenetic Mutations (SEMs) have been proposed as novel aging biomarkers with the potential to capture heterogeneity in age-related DNA methylation (DNAme) changes. SEMs are defined as outlier methylation patterns at cytosine-guanine dinucleotide (CpG) sites, categorized as hypermethylated (hyperSEM) or hypomethylated (hypoSEM) relative to a reference. While individual SEMs are rarely consistent across subjects, the SEM load, the total number of SEMs, increases with age. However, given the poor technical reliability of measurement at many DNA methylation sites, we posited that many outliers might represent technical noise. Our study of whole blood samples from 36 individuals, each measured twice, found that 23.3% of hypoSEMs and 45.6% of hyperSEMs are not shared between replicates. This diminishes the reliability of SEM loads, where intraclass correlation coefficients are 0.96 for hypoSEM and 0.90 for hyperSEM. We linked SEM reliability to multiple factors, including blood cell type composition, probe beta-value statistics, and the presence of SNPs. A machine learning approach leveraging these factors filtered out unreliable SEMs, enhancing reliability in a separate dataset of technical replicates from 128 individuals. Analysis of the Framingham Heart Study confirmed the previously reported association of SEMs with mortality and revealed novel connections to cardiovascular disease. We found that associations with aging outcomes are primarily driven by hypoSEMs at baseline methylated probes and hyperSEMs at baseline unmethylated probes, the same subsets that demonstrate the highest technical reliability. These aging associations are preserved after filtering out unreliable SEMs and are enhanced after adjusting for blood cell composition. Finally, we use these insights to formulate best practices for SEM detection and introduce a novel R package, SEMdetectR, which uses parallel programming for efficient SEM detection and offers comprehensive options for detection, filtering, and analysis.
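
To make the definitions above concrete, the following Python sketch flags outlier beta values per probe, counts the SEM load per sample, and measures how many SEMs two replicate runs fail to share. It is an illustration under stated assumptions, not the SEMdetectR implementation: the abstract does not give the outlier rule, so the sketch assumes a commonly used convention (a beta value more than k = 3 interquartile ranges above the third quartile or below the first quartile across samples), and the "not shared" fraction is assumed to be computed over the union of SEMs from both replicates. The function names `detect_sems`, `sem_load`, and `unshared_fraction` are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def detect_sems(betas, k=3.0):
    """Flag outlier beta values per probe.

    betas: dict mapping probe ID -> list of beta values, one per sample,
           with the same sample order for every probe.
    k:     IQR multiplier (assumed convention; not stated in the abstract).
    Returns (hyper, hypo): sets of (probe, sample_index) pairs.
    """
    hyper, hypo = set(), set()
    for probe, values in betas.items():
        # Quartiles of this probe across all samples (the "reference").
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        upper, lower = q3 + k * iqr, q1 - k * iqr
        for sample, beta in enumerate(values):
            if beta > upper:
                hyper.add((probe, sample))   # hypermethylated outlier
            elif beta < lower:
                hypo.add((probe, sample))    # hypomethylated outlier
    return hyper, hypo


def sem_load(sems):
    """SEM load: number of SEMs per sample index."""
    return Counter(sample for _, sample in sems)


def unshared_fraction(sems_rep1, sems_rep2):
    """Fraction of SEMs seen in only one of two replicate runs.

    Assumption: the denominator is the union of both replicates' SEMs.
    """
    union = sems_rep1 | sems_rep2
    if not union:
        return 0.0
    return len(union - (sems_rep1 & sems_rep2)) / len(union)


# Toy data: 2 probes x 8 samples; sample 7 is an outlier at both probes.
toy_betas = {
    "cg_a": [0.10, 0.11, 0.12, 0.10, 0.11, 0.12, 0.11, 0.90],
    "cg_b": [0.80, 0.82, 0.81, 0.79, 0.80, 0.81, 0.80, 0.10],
}
hyper, hypo = detect_sems(toy_betas)
print("hyperSEMs:", hyper)
print("hypoSEMs:", hypo)
print("SEM load per sample:", dict(sem_load(hyper | hypo)))
```

Running `detect_sems` separately on each replicate run and passing the two resulting sets to `unshared_fraction` gives a replicate-discordance measure in the spirit of the percentages reported above, although the authors' exact calculation may differ.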

Keywords: DNA methylation; aging; biomarker; heterogeneity; reliability; software.

Publication types

  • Preprint