A novel approach toward optimal workflow selection for DNA methylation biomarker discovery

BMC Bioinformatics. 2024 Jan 23;25(1):37. doi: 10.1186/s12859-024-05658-0.

Abstract

DNA methylation is a major epigenetic modification involved in many physiological processes. Normal methylation patterns are disrupted in many diseases, and methylation-based biomarkers have shown promise in several contexts. Marker discovery typically involves the analysis of publicly available DNA methylation data from high-throughput assays. Numerous methods for identifying differentially methylated biomarkers have been developed, creating a pressing need for best-practice guidelines and context-specific analysis workflows. To this end, we propose TASA, a novel method for simulating methylation array data in various scenarios. We then comprehensively assess different data analysis workflows using real and simulated data and suggest optimal start-to-finish analysis workflows. Our study demonstrates that the choice of analysis pipeline for DNA methylation-based marker discovery is crucial and that the optimal choice differs across contexts.

Keywords: DNA methylation marker discovery; Data analysis pipeline optimization; Simulation of DNA methylation array data.

MeSH terms

  • Biomedical Research*
  • DNA Methylation*
  • Data Analysis
  • Epigenesis, Genetic
  • Workflow