CoT: a transformer-based method for inferring tumor clonal copy number substructure from scDNA-seq data

Brief Bioinform. 2024 Mar 27;25(3):bbae187. doi: 10.1093/bib/bbae187.

Abstract

Single-cell DNA sequencing (scDNA-seq) has been an effective means of unscrambling intra-tumor heterogeneity, but jointly inferring tumor clones and their respective copy number profiles remains challenging because scDNA-seq data are noisy. We introduce a new bioinformatics method, CoT, for deciphering clonal copy number substructure. The backbone of CoT is a Copy number Transformer autoencoder that uses a multi-head attention mechanism to explore correlations between different genomic regions, thereby capturing global features from which it creates latent embeddings of the cells. CoT conveniently first infers cell subpopulations from the learned embeddings and then estimates single-cell copy numbers by jointly analyzing the read counts of cells that belong to the same cluster. Exploiting clonal substructure in this way helps alleviate the effect of read-count non-uniformity and yields robust estimates of tumor copy numbers. Performance evaluation on synthetic and real datasets shows that CoT outperforms state-of-the-art methods and is highly useful for deciphering clonal copy number substructure.
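To make the two-stage idea in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions and not CoT's actual implementation. The first part runs one untrained multi-head self-attention layer over groups of genomic bins and mean-pools the result into a per-cell embedding. The tokenization into groups of `d_model` consecutive bins, the random weights, and all function names (`init_attention_weights`, `multi_head_self_attention`, `pooled_copy_number`) are hypothetical. The second part shows why pooling read counts within a cluster helps: summing counts over cluster members before median-normalizing averages out per-cell Poisson noise. The cluster labels are taken as known here, whereas CoT derives them from the learned embeddings. The diploid baseline (`ploidy=2`) is also an assumption.

```python
import numpy as np


def init_attention_weights(d_model, rng):
    """Hypothetical untrained Q, K, V, O projections (illustration only)."""
    return tuple(rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(4))


def multi_head_self_attention(x, weights, n_heads):
    """Scaled dot-product self-attention over tokens x of shape (n_tokens, d_model)."""
    n_tokens, d_model = x.shape
    assert d_model % n_heads == 0
    d_head = d_model // n_heads
    w_q, w_k, w_v, w_o = weights

    def split(m):  # (n_tokens, d_model) -> (n_heads, n_tokens, d_head)
        return m.reshape(n_tokens, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # token-token affinities
    scores -= scores.max(axis=-1, keepdims=True)          # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    heads = attn @ v                                       # (n_heads, n_tokens, d_head)
    out = heads.transpose(1, 0, 2).reshape(n_tokens, d_model) @ w_o
    return out, attn


def pooled_copy_number(counts, labels, ploidy=2):
    """Estimate integer copy numbers per cell from read counts pooled within each cluster."""
    cn = np.zeros(counts.shape, dtype=int)
    for c in np.unique(labels):
        members = labels == c
        pooled = counts[members].sum(axis=0).astype(float)  # sum reads across cluster cells
        ratio = pooled / np.median(pooled)                   # assume median bin sits at baseline ploidy
        cn[members] = np.rint(ploidy * ratio).astype(int)    # every member gets the cluster profile
    return cn


# Synthetic data: two clones of 10 cells each, 20 bins, Poisson reads (50 per copy).
rng = np.random.default_rng(0)
truth_a = np.full(20, 2)
truth_a[15:] = 4                  # clone A: gain in bins 15-19
truth_b = np.full(20, 2)
truth_b[:5] = 1                   # clone B: loss in bins 0-4
truth = np.vstack([np.tile(truth_a, (10, 1)), np.tile(truth_b, (10, 1))])
counts = rng.poisson(50 * truth)
labels = np.repeat([0, 1], 10)    # assumed known here; CoT clusters the embeddings

# Stage 1 (illustrative): one shared attention layer, tokens = groups of 4 bins.
d_model = 4
weights = init_attention_weights(d_model, np.random.default_rng(1))
cell_tokens = np.log1p(counts).reshape(len(counts), -1, d_model)  # (cells, 5 tokens, 4)
embeddings = []
for tokens in cell_tokens:
    out, attn = multi_head_self_attention(tokens, weights, n_heads=2)
    embeddings.append(out.mean(axis=0))  # mean-pool tokens into a cell embedding
embeddings = np.array(embeddings)

# Stage 2: cluster-level pooling recovers the integer copy number profiles.
est = pooled_copy_number(counts, labels)
```

Because every cell in a cluster shares one pooled profile, a single cell's noisy bin cannot push its own estimate across a rounding boundary. This is the intuition behind the abstract's claim that clonal substructure alleviates read-count non-uniformity.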

Keywords: copy number alteration; deep learning; intra-tumor heterogeneity; single-cell sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology* / methods
  • DNA Copy Number Variations*
  • Humans
  • Neoplasms* / genetics
  • Sequence Analysis, DNA / methods
  • Single-Cell Analysis* / methods