scAuto as a comprehensive framework for single-cell chromatin accessibility data analysis

Comput Biol Med. 2024 Mar:171:108230. doi: 10.1016/j.compbiomed.2024.108230. Epub 2024 Feb 29.

Abstract

Interpreting single-cell chromatin accessibility data is crucial for understanding how intercellular heterogeneity is regulated. Despite progress in computational methods for analyzing these data, a comprehensive analytical framework and a user-friendly online analysis tool are still lacking. To fill this gap, we developed single-cell auto-correlation transformers (scAuto), a pre-trained deep learning-based framework. Following DNABERT's methodology of pre-training and fine-tuning, scAuto first learns a general understanding of DNA sequence grammar through self-supervised pre-training on the unlabeled human genome; it is then transferred to single-cell chromatin accessibility analysis of scATAC-seq data through supervised fine-tuning. We extensively validated scAuto on the Buenrostro2018 dataset, demonstrating its superior performance in chromatin accessibility prediction, single-cell clustering, and data denoising. Based on scAuto, we further developed an interactive web server for single-cell chromatin accessibility data analysis, which provides tutorial-style interfaces for users with limited programming skills. The platform is accessible at http://zhanglab.icaup.cn. We expect this work to help researchers analyze single-cell chromatin accessibility data and to facilitate the development of precision medicine.
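
The abstract does not give implementation details. As a rough illustration of the DNABERT-style self-supervised setup it refers to, the sketch below splits a DNA sequence into overlapping k-mer tokens and masks a contiguous span of them, so that a model could be trained to recover the hidden tokens. The function names, the choice of k = 3, the span length, and the "[MASK]" string are assumptions made for this example and are not taken from scAuto.

```python
def kmer_tokenize(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_span(tokens, start, span, mask_token="[MASK]"):
    """Mask `span` consecutive tokens beginning at index `start`.

    Returns (masked_tokens, labels). `labels` holds the original token
    at each masked position and None elsewhere, which is the target a
    masked-language-model objective would be trained to predict.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i in range(start, min(start + span, len(tokens))):
        labels[i] = tokens[i]
        masked[i] = mask_token
    return masked, labels


# Hypothetical toy sequence, for illustration only.
tokens = kmer_tokenize("ACGTACGTAC", k=3)
masked, labels = mask_span(tokens, start=2, span=3)
print(tokens)
print(masked)
print(labels)
```

Masking a contiguous span rather than isolated tokens matters here because neighbouring k-mers overlap: a single hidden k-mer could otherwise be read off directly from its neighbours. Supervised fine-tuning would then replace the token-recovery objective with a task-specific one, such as predicting accessibility labels for each input sequence.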

Keywords: Chromatin accessibility; Data analysis tools; Deep learning; Single-cell genomics; Web server.

MeSH terms

  • Chromatin*
  • DNA*
  • Data Analysis
  • Genome, Human
  • Humans
  • Sequence Analysis, DNA

Substances

  • Chromatin
  • DNA