A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data

BMC Genomics. 2016 Dec 22;17(Suppl 13):1025. doi: 10.1186/s12864-016-3317-7.

Abstract

Background: The ability to sequence the transcriptomes of single cells using single-cell RNA-sequencing (scRNA-seq) technologies represents a paradigm shift: scientists are now able to investigate, in parallel, the complex biology of a heterogeneous population of cells at single-cell resolution. However, to date, there has been no suitable computational methodology for analysing such an intricate deluge of data, in particular techniques that aid the identification of differences in transcriptomic profiles between cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)).
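The basic classification setup can be sketched in Python with scikit-learn. This is a minimal illustration, not the authors' pipeline. The expression matrix is simulated (Poisson counts, log1p-transformed), and the labels 0/1 merely stand in for neural progenitor and neocortical cells. The helper names `simulate` and `evaluate`, the 20 hypothetical marker genes, and all hyperparameters are assumptions chosen for illustration.

```python
"""Baseline SVM and Random Forest classifiers on a synthetic cells x genes matrix (sketch)."""
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def simulate(n_cells=200, n_genes=500, n_markers=20, seed=0):
    """Hypothetical data: Poisson counts, log1p-transformed.

    Class 0 stands in for neural progenitor cells and class 1 for neocortical cells;
    the first n_markers genes have a 3-fold higher mean in class 1.
    """
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_cells // 2)
    base = rng.gamma(shape=2.0, scale=1.0, size=n_genes)  # per-gene mean expression
    means = np.tile(base, (n_cells, 1))
    means[y == 1, :n_markers] *= 3.0  # up-regulate the hypothetical markers in class 1
    X = np.log1p(rng.poisson(means))  # rows = cells, columns = genes
    return X, y


def evaluate(X, y, folds=5):
    """Mean k-fold cross-validated accuracy for a linear SVM and a Random Forest."""
    models = {
        # SVMs are scale-sensitive, so each gene is standardised inside the pipeline
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)),
        # Tree ensembles are insensitive to monotone rescaling, so no scaler is used
        "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    return {name: cross_val_score(m, X, y, cv=folds).mean() for name, m in models.items()}


X, y = simulate()
for name, acc in evaluate(X, y).items():
    print(f"{name}: mean 5-fold accuracy = {acc:.3f}")
```

A linear kernel is used here because its weights map directly onto individual genes, which is the property SVM-based feature ranking relies on. The kernel and parameters actually used in the study are not stated in this abstract.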

Results: Using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, thirty-eight key transcripts were identified that best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. These genes also possessed higher discriminative power (higher prediction accuracy) than genes selected by commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to uncover hidden regulatory networks; the novel interactions found could be validated in wet-lab experiments and may be useful candidate targets for the treatment of neurodevelopmental diseases.
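The feature-selection comparison can be sketched as follows, again on simulated data and under stated assumptions. SVM-RFE is implemented here with scikit-learn's `RFE` wrapped around a linear `SVC`. It repeatedly refits the SVM and discards the genes with the smallest absolute weights until 38 remain. As a stand-in for "commonly used statistical techniques", the sketch ranks genes with `SelectKBest(f_classif)`. For two groups, the ANOVA F statistic is the square of the equal-variance t statistic, so this amounts to a t-test ranking. Geneset-based approaches are not reproduced. The helper names, the simulated markers, `step=0.1` and the other settings are hypothetical.

```python
"""SVM-RFE vs univariate ranking on a synthetic single-cell matrix (illustrative sketch)."""
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_SELECT = 38   # genes to keep, matching the count reported in the abstract
N_MARKERS = 20  # hypothetical number of truly differential genes in the simulation


def simulate(n_cells=200, n_genes=500, n_markers=N_MARKERS, seed=0):
    """Synthetic log1p counts; genes 0..n_markers-1 are 3-fold up in class 1 (hypothetical)."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_cells // 2)
    means = np.tile(rng.gamma(2.0, 1.0, size=n_genes), (n_cells, 1))
    means[y == 1, :n_markers] *= 3.0
    return np.log1p(rng.poisson(means)), y


def svm_rfe(n_select=N_SELECT):
    # Linear SVM as the ranking model; step=0.1 drops 10% of the original gene count per round
    return RFE(SVC(kernel="linear", C=1.0), n_features_to_select=n_select, step=0.1)


def univariate(n_select=N_SELECT):
    # For two classes the ANOVA F statistic equals the squared equal-variance t statistic
    return SelectKBest(f_classif, k=n_select)


def compare(X, y):
    """5-fold CV accuracy per (selector, classifier) pair; selection is refit in each fold."""
    classifiers = {
        "SVM": lambda: SVC(kernel="linear", C=1.0),
        "RF": lambda: RandomForestClassifier(n_estimators=200, random_state=0),
    }
    selectors = {"SVM-RFE": svm_rfe, "t-test/F": univariate}
    scores = {}
    for s_name, make_sel in selectors.items():
        for c_name, make_clf in classifiers.items():
            pipe = make_pipeline(StandardScaler(), make_sel(), make_clf())
            scores[(s_name, c_name)] = cross_val_score(pipe, X, y, cv=5).mean()
    return scores


X, y = simulate()
scores = compare(X, y)
for (s_name, c_name), acc in scores.items():
    print(f"{s_name:>9} + {c_name:<3}: {acc:.3f}")

# Fit SVM-RFE once on all cells to list the selected genes (for inspection, not evaluation)
selector = make_pipeline(StandardScaler(), svm_rfe()).fit(X, y)[-1]
selected = np.flatnonzero(selector.support_)
print("selected genes:", selected)
print("simulated markers recovered:", int(np.sum(selected < N_MARKERS)))
```

Selection sits inside the cross-validation pipeline so that genes are chosen from the training folds only. Selecting genes on all cells first and then cross-validating would inflate the accuracy estimates.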

Conclusion: The novel approach reported here identifies transcripts, with reported neuronal involvement, that optimally differentiate neocortical cells from neural progenitor cells. We believe it is extensible and applicable to other single-cell RNA-seq expression profiles, such as those from studies of cancer progression and treatment within highly heterogeneous tumours.

Keywords: Machine learning; Network reconstruction; Single-cell RNA-seq; Systems biology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biomarkers
  • Brain / embryology
  • Brain / growth & development
  • Brain / metabolism*
  • Gene Expression Profiling*
  • Machine Learning*
  • Models, Statistical
  • Neurogenesis / genetics
  • Organ Specificity
  • Organogenesis / genetics*
  • Reproducibility of Results
  • Single-Cell Analysis* / methods
  • Support Vector Machine
  • Transcriptome*

Substances

  • Biomarkers