An integrated single-cell transcriptomic dataset for non-small cell lung cancer

Sci Data. 2023 Mar 27;10(1):167. doi: 10.1038/s41597-023-02074-6.

Abstract

As single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for studying cellular heterogeneity over the past decade, the number of available scRNA-seq datasets has also increased rapidly. However, reuse of such data is often problematic because of small cohort sizes, limited cell types, and insufficient information on cell type classification. Here, we present a large integrated scRNA-seq dataset containing 224,611 cells from human primary non-small cell lung cancer (NSCLC) tumors. Using publicly available resources, we pre-processed and integrated seven independent scRNA-seq datasets using an anchor-based approach, with five datasets used as reference and the remaining two as validation. We created two levels of annotation based on cell type-specific markers conserved across the datasets. To demonstrate the usability of the integrated dataset, we created annotation predictions for the two validation datasets using our integrated reference. Additionally, we conducted a trajectory analysis on subsets of T cells and lung cancer cells. This integrated dataset may serve as a resource for studying the NSCLC transcriptome at the single-cell level.
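
The abstract names only "an anchor-based approach" and annotation prediction from a reference, without giving implementation details. As a rough illustration of the general idea, and not the authors' pipeline, the toy sketch below treats mutual nearest neighbors between a reference and a query set as anchors, then transfers reference labels to query cells by distance-weighted voting over those anchors. The function names (`find_anchors`, `transfer_labels`), the two-dimensional coordinates, and the labels are all hypothetical and invented for the example; real analyses work on high-dimensional expression embeddings with dedicated single-cell toolkits.

```python
import math
from collections import Counter


def nearest(source, target, k):
    """For each point in source, return the set of indices of its k nearest points in target."""
    result = []
    for vec in source:
        order = sorted(range(len(target)), key=lambda j: math.dist(vec, target[j]))
        result.append(set(order[:k]))
    return result


def find_anchors(reference, query, k=1):
    """Return (ref_index, query_index) pairs that are mutual k-nearest neighbors."""
    ref_to_query = nearest(reference, query, k)
    query_to_ref = nearest(query, reference, k)
    return [
        (r, q)
        for r in range(len(reference))
        for q in sorted(ref_to_query[r])
        if r in query_to_ref[q]
    ]


def transfer_labels(reference, ref_labels, query, k=1):
    """Predict a label for every query cell by weighted voting over the anchors."""
    anchors = find_anchors(reference, query, k)
    predictions = []
    for q_vec in query:
        votes = Counter()
        for r, q in anchors:
            # Anchors whose query-side cell lies close to this cell count more.
            weight = 1.0 / (1.0 + math.dist(q_vec, query[q]))
            votes[ref_labels[r]] += weight
        predictions.append(votes.most_common(1)[0][0] if votes else None)
    return predictions


# Invented toy coordinates standing in for a low-dimensional embedding.
reference = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
ref_labels = ["T cell", "T cell", "Cancer cell", "Cancer cell"]
query = [(0.2, 0.1), (4.8, 5.1), (0.0, 0.4)]

anchors = find_anchors(reference, query, k=1)
predictions = transfer_labels(reference, ref_labels, query, k=1)
print(anchors)      # [(1, 0), (2, 1)]
print(predictions)  # ['T cell', 'Cancer cell', 'T cell']
```

Requiring the neighbor relation to hold in both directions is what makes an anchor conservative: a query cell with no close counterpart in the reference contributes no anchor, rather than being forced onto the nearest reference population.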

Publication types

  • Dataset

MeSH terms

  • Carcinoma, Non-Small-Cell Lung* / genetics
  • Gene Expression Profiling
  • Humans
  • Lung Neoplasms* / genetics
  • Sequence Analysis, RNA
  • Single-Cell Gene Expression Analysis*
  • Software
  • Transcriptome