A Deep-Learning Pipeline for TSS Coverage Imputation From Shallow Cell-Free DNA Sequencing

Front Med (Lausanne). 2021 Dec 3:8:684238. doi: 10.3389/fmed.2021.684238. eCollection 2021.

Abstract

Cell-free DNA (cfDNA) carries a footprint of nucleosome occupancy at transcription start sites (TSSs) and has been widely developed for noninvasive health monitoring and disease detection. However, the requirement for high sequencing depth limits its clinical use. Here, we introduce the Autoencoder of cfDNA TSS (AECT) coverage profile, a deep-learning pipeline designed for TSS coverage profiles generated from shallow cfDNA sequencing. AECT outperformed existing single-cell sequencing imputation algorithms both in improving TSS coverage accuracy and in capturing latent biological features that distinguish sex or tumor status. We built classifiers for the detection of breast and rectal cancer using AECT-imputed shallow sequencing data, and their performance was close to that achieved with high-depth sequencing, suggesting that AECT could provide a broadly applicable, accurate noninvasive screening approach at moderate cost.
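
The abstract does not specify the AECT architecture or training procedure. As a rough illustration of the general idea only (an autoencoder trained to map noisy shallow-depth TSS coverage profiles toward their high-depth counterparts), a minimal NumPy sketch might look like the following. Everything in it is an assumption made for illustration: the synthetic low-rank data, the single tanh hidden layer, the layer sizes, the learning rate, and the function names `simulate_profiles`, `train_autoencoder`, and `impute` are hypothetical and are not taken from the paper.

```python
import numpy as np


def simulate_profiles(n_samples=200, n_tss=20, noise=1.0, seed=0):
    """Synthetic stand-in data (assumption): low-rank 'deep' profiles plus noise."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 3))    # hidden biological factors
    loadings = rng.normal(size=(3, n_tss))      # factor-to-TSS weights
    deep = latent @ loadings                    # high-depth coverage (normalized)
    shallow = deep + rng.normal(scale=noise, size=deep.shape)  # noisy shallow view
    return shallow, deep


def train_autoencoder(shallow, deep, hidden=8, lr=0.02, epochs=1000, seed=0):
    """Full-batch gradient descent on the MSE between imputed and deep profiles."""
    rng = np.random.default_rng(seed)
    n_tss = shallow.shape[1]
    params = {
        "W1": rng.normal(scale=0.1, size=(n_tss, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, n_tss)),
        "b2": np.zeros(n_tss),
    }
    losses = []
    for _ in range(epochs):
        # Forward pass: encode to the bottleneck, then decode to TSS space.
        h = np.tanh(shallow @ params["W1"] + params["b1"])
        out = h @ params["W2"] + params["b2"]
        err = out - deep
        losses.append(float(np.mean(err ** 2)))

        # Backward pass for the mean-squared-error loss.
        g_out = 2.0 * err / err.size
        g_h = (g_out @ params["W2"].T) * (1.0 - h ** 2)  # tanh derivative
        grads = {
            "W1": shallow.T @ g_h,
            "b1": g_h.sum(axis=0),
            "W2": h.T @ g_out,
            "b2": g_out.sum(axis=0),
        }
        for name in params:
            params[name] -= lr * grads[name]
    return params, losses


def impute(params, shallow):
    """Apply the trained encoder and decoder to shallow profiles."""
    h = np.tanh(shallow @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]
```

In this toy setting, calling `train_autoencoder(*simulate_profiles())` and then `impute(params, shallow)` yields denoised profiles with the same shape as the input. The bottleneck layer is what would play the role of the "latent biological features" mentioned above, although the real pipeline may differ substantially.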

Keywords: autoencoder; cell-free DNA; deep learning; nucleosome footprint; whole-genome sequencing.