scATAC-seq preprocessing and imputation evaluation system for visualization, clustering and digital footprinting

Brief Bioinform. 2023 Nov 22;25(1):bbad447. doi: 10.1093/bib/bbad447.

Abstract

Single-cell ATAC-seq (scATAC-seq) is a recently developed approach that provides means to investigate open chromatin at single cell level, to assess epigenetic regulation and transcription factors binding landscapes. The sparsity of the scATAC-seq data calls for imputation. Similarly, preprocessing (filtering) may be required to reduce computational load due to the large number of open regions. However, optimal strategies for both imputation and preprocessing have not been yet evaluated together. We present SAPIEnS (scATAC-seq Preprocessing and Imputation Evaluation System), a benchmark for scATAC-seq imputation frameworks, a combination of state-of-the-art imputation methods with commonly used preprocessing techniques. We assess different types of scATAC-seq analysis, i.e. clustering, visualization and digital genomic footprinting, and attain optimal preprocessing-imputation strategies. We discuss the benefits of the imputation framework depending on the task and the number of the dataset features (peaks). We conclude that the preprocessing with the Boruta method is beneficial for the majority of tasks, while imputation is helpful mostly for small datasets. We also implement a SAPIEnS database with pre-computed transcription factor footprints based on imputed data with their activity scores in a specific cell type. SAPIEnS is published at: https://github.com/lab-medvedeva/SAPIEnS. SAPIEnS database is available at: https://sapiensdb.com.

Keywords: digital footprinting; imputation; preprocessing; scATAC-seq; single cell.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Epigenesis, Genetic*
  • Gene Expression Regulation
  • Genomics* / methods
  • Transcription Factors / genetics
  • Transcription Factors / metabolism

Substances

  • Transcription Factors