A robust and scalable graph neural network for accurate single-cell classification

Brief Bioinform. 2022 Mar 10;23(2):bbab570. doi: 10.1093/bib/bbab570.

Abstract

Single-cell RNA sequencing (scRNA-seq) techniques provide high-resolution data on cellular heterogeneity in diverse tissues, and a critical step in the data analysis is cell type identification. Traditional methods usually cluster the cells and manually identify cell clusters through marker genes, which is time-consuming and subjective. With the launch of several large-scale single-cell projects, millions of sequenced cells have been annotated, making it promising to transfer labels from the annotated datasets to newly generated datasets. One powerful approach to this transfer is to learn cell relations through a graph neural network (GNN), but traditional GNNs struggle to process millions of cells because of the expensive message-passing procedure at each training epoch. Here, we have developed a robust and scalable GNN-based method for accurate single-cell classification (GraphCS), in which a graph is constructed to connect similar cells within and between labelled and unlabelled scRNA-seq datasets so that shared information can propagate. To overcome the slow information propagation of GNNs at each training epoch, the diffused information is pre-calculated via the approximate Generalized PageRank algorithm, enabling sublinear complexity in the number of cells. Compared with existing methods, GraphCS demonstrates better performance on simulated, cross-platform, cross-species and cross-omics scRNA-seq datasets. More importantly, our model provides high speed and scalability on large datasets, and can achieve superior performance for 1 million cells within 50 min.
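The idea of pre-calculating diffused information can be illustrated with a minimal sketch. It is not the GraphCS implementation: it computes an exact, dense Generalized PageRank sum Z = sum_k w_k P^k X on a toy k-nearest-neighbour graph, whereas the abstract describes an approximate algorithm with sublinear complexity. The function names (`knn_adjacency`, `gpr_diffuse`), the symmetric normalisation P = D^{-1/2} A D^{-1/2}, the Euclidean kNN graph and the personalised-PageRank weights w_k = alpha(1 - alpha)^k are all assumptions made for illustration.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric k-nearest-neighbour adjacency (Euclidean), no self-loops.
    Hypothetical helper; the paper's graph construction may differ."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    np.fill_diagonal(d2, np.inf)                    # exclude self as neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]            # k closest cells per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)                       # symmetrise

def gpr_diffuse(A, X, weights):
    """Exact dense Generalized PageRank: sum_k weights[k] * P^k X,
    with P = D^{-1/2} A D^{-1/2} (assumed normalisation)."""
    deg = A.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    P = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    H = X.astype(float)
    Z = weights[0] * H
    for w in weights[1:]:
        H = P @ H          # one propagation step
        Z = Z + w * H      # accumulate the weighted k-hop term
    return Z

# Toy data: 60 "cells" with 5 features each (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
A = knn_adjacency(X, k=5)

alpha = 0.15                                     # assumed teleport probability
weights = alpha * (1 - alpha) ** np.arange(11)   # assumed PPR-style weights
Z = gpr_diffuse(A, X, weights)                   # computed once, before training
```

Because `Z` is computed once up front, a downstream classifier can be trained on its rows in mini-batches without any message passing per epoch, which is the property the abstract attributes to pre-calculation.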

Keywords: batch effects; scalable graph neural network; single-cell RNA sequencing; single-cell classification; virtual adversarial training.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Exome Sequencing
  • Learning
  • Neural Networks, Computer*
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods