Recall DNA methylation levels at low coverage sites using a CNN model in WGBS

PLoS Comput Biol. 2023 Jun 14;19(6):e1011205. doi: 10.1371/journal.pcbi.1011205. eCollection 2023 Jun.

Abstract

DNA methylation is an important regulator of gene transcription. Whole-genome bisulfite sequencing (WGBS) is the gold-standard approach for quantifying DNA methylation at base-pair resolution, but it requires high sequencing depth. Many CpG sites in WGBS data have insufficient coverage, which makes the DNA methylation levels of individual sites inaccurate. Many state-of-the-art computational methods have been proposed to predict the missing values. However, many of them require either other omics datasets or cross-sample data, and most predict only the DNA methylation state rather than the level. In this study, we propose RcWGBS, which imputes missing (or low-coverage) values from the DNA methylation levels on the adjacent sides of each site. Deep learning techniques were employed for accurate prediction. The WGBS datasets of H1-hESC and GM12878 were down-sampled. The average difference between the DNA methylation level predicted by RcWGBS at 12× depth and that measured at >50× depth was less than 0.03 in H1-hESC cells and less than 0.01 in GM12878 cells. RcWGBS performed better than METHimpute even when the sequencing depth was as low as 12×. Our work helps in processing methylation data of low sequencing depth, allowing researchers to save sequencing costs and improve data utilization through computational methods.
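
The evaluation described in the abstract (down-sampling a deep dataset and comparing per-site methylation levels against the high-depth values) can be illustrated with a minimal sketch. This is not the RcWGBS implementation and does not include its CNN. The function names, the toy read counts, and the read-thinning scheme are assumptions made for illustration only, and the comparison here uses the raw low-depth levels rather than model predictions.

```python
import random

def methylation_level(meth_reads, total_reads):
    """Fraction of reads reporting a methylated cytosine; None if uncovered."""
    return meth_reads / total_reads if total_reads > 0 else None

def downsample(meth_reads, total_reads, keep_fraction, rng):
    """Keep each read independently with probability keep_fraction (assumed scheme)."""
    unmeth_reads = total_reads - meth_reads
    kept_meth = sum(rng.random() < keep_fraction for _ in range(meth_reads))
    kept_unmeth = sum(rng.random() < keep_fraction for _ in range(unmeth_reads))
    return kept_meth, kept_meth + kept_unmeth

def mean_abs_difference(predicted, reference):
    """Average |predicted - reference| over sites where both values are defined."""
    pairs = [(p, r) for p, r in zip(predicted, reference)
             if p is not None and r is not None]
    if not pairs:
        raise ValueError("no sites with both values defined")
    return sum(abs(p - r) for p, r in pairs) / len(pairs)

# Hypothetical toy data: (methylated reads, total reads) per CpG at about 50x depth.
high_depth = [(45, 50), (50, 55), (5, 60), (30, 52)]
rng = random.Random(0)

# Thin the reads to roughly 12x depth.
low_depth = [downsample(m, t, 12 / 50, rng) for m, t in high_depth]

reference = [methylation_level(m, t) for m, t in high_depth]
observed = [methylation_level(m, t) for m, t in low_depth]
print(mean_abs_difference(observed, reference))
```

In an evaluation of an imputation model, the `observed` list would be replaced by the model's predicted levels at the low-depth sites, so the metric measures how closely the predictions recover the high-depth values.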

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Methylation* / genetics
  • Human Embryonic Stem Cells*
  • Humans
  • Mental Recall
  • Protein Processing, Post-Translational
  • Research Personnel

Grants and funding

The work was supported in part by the National Natural Science Foundation of China (62250028, 62131004, to Q.Z.; 62202315 to X.L.), the Sichuan Provincial Science Fund for Distinguished Young Scholars (2021JDJQ0025 to Q.Z.), the Municipal Government of Quzhou (2022D040 to Q.Z.), the China Postdoctoral Science Foundation (2022M720662 to X.L.), the Foundation Project of Shenzhen Polytechnic (6022330002K to X.L.) and the Special Project in Key Field of Department of Education of Guangdong Province (2022ZDZX2082 to L.X.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.