Study on detection of CNVs using human whole genome bisulfite sequencing data

Yi Chuan. 2023 Apr 20;45(4):324-340. doi: 10.16288/j.yczz.22-385.

Abstract

Aberrant DNA methylation has been reported to give rise to copy number variations (CNVs), and CNVs may in turn alter DNA methylation levels. Whole genome bisulfite sequencing (WGBS) produces DNA-level sequencing data and therefore has potential for CNV detection; however, how well CNVs can be detected from WGBS data remains unclear. In this study, five tools that use different CNV-detection strategies, i.e., BreakDancer, cn.mops, CNVnator, DELLY and Pindel, were selected to benchmark CNV detection on WGBS data. Using real (2.62 billion reads) and simulated (12.35 billion reads) human WGBS data, we performed 150 CNV-detection runs, measured the number of CNVs detected, precision, recall, relative detection ability, memory usage and running time, and sought the optimal strategy for CNV detection with WGBS data. On the real WGBS data, Pindel detected the most deletions and duplications; CNVnator detected deletions with the highest precision and cn.mops detected duplications with the highest precision; Pindel had the highest recall for deletions and cn.mops the highest recall for duplications. On the simulated WGBS data, BreakDancer detected the most deletions and cn.mops the most duplications, while CNVnator showed the highest precision and recall for both deletions and duplications. On both real and simulated WGBS data, CNVnator's ability to detect CNVs appeared likely to exceed its ability on whole genome sequencing (WGS) data. In addition, DELLY and BreakDancer had the lowest peak memory usage and the shortest CPU runtime, whereas CNVnator had the highest peak memory usage and the longest CPU runtime. Taken together, CNVnator and cn.mops performed best for CNV detection with WGBS data.
These results suggest that detecting CNVs from WGBS data is feasible, and they provide useful information for investigating both CNVs and DNA methylation using WGBS data alone.
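The abstract reports precision and recall for each tool but does not state how a called CNV is matched to a known one. The Python sketch below shows one common convention, assuming a call counts as correct when it has the same type (deletion or duplication) as a truth-set CNV and at least 50% reciprocal overlap with it. The `CNV` record, the function names and the 0.5 threshold are illustrative assumptions, not the study's published matching criteria.

```python
from typing import List, NamedTuple, Tuple


class CNV(NamedTuple):
    """A CNV interval; coordinates are half-open [start, end)."""
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" (deletion) or "DUP" (duplication)


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Overlap as a fraction of the longer interval.

    This equals the smaller of overlap/len(a) and overlap/len(b), so
    a value >= t means each interval is covered by at least t.
    """
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def matches(call: CNV, truth: CNV, min_ro: float = 0.5) -> bool:
    # Assumed rule: same CNV type and reciprocal overlap >= min_ro.
    return call.kind == truth.kind and reciprocal_overlap(call, truth) >= min_ro


def precision_recall(calls: List[CNV], truths: List[CNV],
                     min_ro: float = 0.5) -> Tuple[float, float]:
    # Precision: fraction of calls that match at least one true CNV.
    tp_calls = sum(1 for c in calls if any(matches(c, t, min_ro) for t in truths))
    # Recall: fraction of true CNVs recovered by at least one call.
    tp_truths = sum(1 for t in truths if any(matches(c, t, min_ro) for c in calls))
    precision = tp_calls / len(calls) if calls else 0.0
    recall = tp_truths / len(truths) if truths else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical truth set and call set, for illustration only.
    truths = [
        CNV("chr1", 1000, 2000, "DEL"),
        CNV("chr1", 5000, 7000, "DUP"),
        CNV("chr2", 100, 600, "DEL"),
    ]
    calls = [
        CNV("chr1", 1100, 2050, "DEL"),  # 90% reciprocal overlap: match
        CNV("chr1", 5200, 6000, "DUP"),  # 40% reciprocal overlap: no match
        CNV("chr3", 0, 500, "DEL"),      # no true CNV on chr3: false positive
        CNV("chr2", 150, 650, "DEL"),    # 90% reciprocal overlap: match
    ]
    p, r = precision_recall(calls, truths)
    print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.500 recall=0.667
```

Matching is done separately in each direction so that one long call spanning several true CNVs, or several fragmented calls covering one true CNV, affects precision and recall differently, which is how read-depth and breakpoint-based callers tend to differ.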

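Aberrant DNA methylation may lead to copy number variants (CNVs), and CNVs may in turn alter DNA methylation levels. Whole genome bisulfite sequencing (WGBS) yields DNA-level sequencing data and therefore has potential advantages for mining CNVs, but how effectively CNVs can be detected from WGBS data remains unclear. This study selected five tools with different CNV-detection strategies (BreakDancer, cn.mops, CNVnator, DELLY, Pindel) and, using real (2.62 billion reads) and simulated (12.35 billion reads) human sequencing data, performed 150 CNV-detection runs, evaluating the number of CNVs detected, precision, recall, relative detection ability, memory usage and running time, with the aim of identifying the best approach for detecting CNVs with WGBS data. On the real WGBS data, Pindel detected the largest numbers of deletion and duplication CNVs, CNVnator had the highest precision for deletions, cn.mops had the highest precision for duplications, Pindel had the highest recall for deletions, and cn.mops had the highest recall for duplications. On the simulated WGBS data, BreakDancer detected the most deletions, cn.mops detected the most duplications, and CNVnator had the highest precision and recall for both deletions and duplications. CNVnator's ability to detect CNVs in real and simulated WGBS data was comparable to its ability on whole genome sequencing data. In addition, DELLY and BreakDancer had the lowest peak memory usage and CPU runtime, whereas CNVnator had the highest. The results indicate that detecting CNVs with WGBS data is feasible and that CNVnator and cn.mops detect CNVs in WGBS data with relatively high accuracy; this work provides a reference for further study of the relationship between CNVs and DNA methylation using WGBS data.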

Keywords: copy number variation; software evaluation; whole genome bisulfite sequencing.

MeSH terms

  • DNA Copy Number Variations*
  • Genome, Human*
  • Humans
  • Whole Genome Sequencing

Substances

  • hydrogen sulfite